syspremed / exploBATCH

A package for discovering and correcting for batch effects using the approach of Nyamundanda et al. (2017).

Batch effect not detected but shows being significant in SPLS-DA and PLS-DA models #7

Open ekopylova opened 3 years ago

ekopylova commented 3 years ago

Hello!

SPLS-DA and PLS-DA models built on our data, with a possible known batch effect as the response, can accurately classify samples into one of two batches (pR2Y = 0.05 and pQ2 = 0.05, based on 20 random permutations of the response labels to estimate the significance of R2Y and Q2Y). However, expBATCH finished after findBATCH, and I assume this is because the findBATCH function reported the batch as not significant. Could you comment on whether we should still remove the batch, even though findBATCH does not classify it as significant?

Thanks! Jenya

gnyamundanda commented 3 years ago

Hi Jenya,

Your assumption is correct: if correctBatch has no files, then no potential batch effects were found. You can check this using the CI plot from the FindBatch folder.

Hari (cc'd) could help you further.

Can you share a snapshot of FindBatch results for Hari to help?

Thanks for your interest,

Anguraj


ekopylova commented 3 years ago

Thanks @gnyamundanda for the quick response!

Running PLS-DA with the batch as the response (through the ropls package), I obtain the following results:

PLS-DA
60 samples x 700 variables and 1 response
standard scaling of predictors and response(s)
2 excluded variables (near zero variance)
      R2X(cum) R2Y(cum) Q2(cum) RMSEE pre ort pR2Y  pQ2
Total   0.0858    0.948   0.565 0.102   2   0 0.05 0.05
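
One point worth keeping in mind when reading pR2Y = 0.05 and pQ2 = 0.05: with only 20 permutations, a permutation p-value cannot get much smaller than about 1/20, so 0.05 may simply be the floor of the test. The sketch below illustrates this on toy data. It is a generic permutation test on a difference in group means, not the ropls implementation, and the convention `(1 + hits) / (1 + n_perm)` is an assumption; ropls may count slightly differently.

```python
import random


def permutation_p_value(values, labels, n_perm=20, seed=0):
    """Permutation p-value for the absolute difference in group means.

    Generic illustration only; this is NOT how ropls computes pR2Y / pQ2.
    """
    def stat(lbls):
        a = [v for v, l in zip(values, lbls) if l == 0]
        b = [v for v, l in zip(values, lbls) if l == 1]
        return abs(sum(a) / len(a) - sum(b) / len(b))

    rng = random.Random(seed)
    observed = stat(labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the link between values and labels
        if stat(shuffled) >= observed:
            hits += 1
    # Assumed convention: the observed labelling counts as one permutation.
    return (1 + hits) / (1 + n_perm)


def smallest_p_value(n_perm):
    """Smallest p-value reachable under the convention used above."""
    return 1 / (1 + n_perm)


# Toy data (hypothetical): two clearly separated groups of 10 samples.
values = [0.0] * 10 + [10.0] * 10
labels = [0] * 10 + [1] * 10

print(permutation_p_value(values, labels, n_perm=20))
print(smallest_p_value(20), smallest_p_value(1000))
```

Increasing the number of permutations (for example to 1000) lowers this floor and shows whether the batch signal is strongly or only marginally significant.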

However, when the same data are run through expBATCH, there appears to be no significant batch effect:

batchEffect.txt

        Effect               LowerCI              UpperCI
pPC-1   -0.076615559930736   -1.09629147519899    0.943060355337517
pPC-2   -3.14523049034484    -7.1707748451274     0.880313864437718

BioEFFECT.txt

        Effect               LowerCI              UpperCI
pPC-1   -3.30722887887888    -5.19626149735946    -1.41819626039831
pPC-2   0.961300863108799    -0.0199433882563451  1.94254511447394

ekopylova commented 3 years ago

To add: does it matter if the feature table we are working with is sparse? This is microbiome shotgun data rather than gene expression data, and it includes many zeros. Thanks.
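
For reference, the batchEffect.txt and BioEFFECT.txt values posted in this thread can be read against the CI plot the maintainer mentions. The sketch below assumes an effect is flagged only when its confidence interval excludes zero. That rule is an assumption about how findBATCH decides, not something confirmed in this thread, and `ci_excludes_zero` and `flag_effects` are hypothetical helper names.

```python
def ci_excludes_zero(lower, upper):
    """True when the interval [lower, upper] lies entirely on one side of 0."""
    return lower > 0 or upper < 0


def flag_effects(table):
    """Map each pPC to True if its CI excludes zero (assumed decision rule)."""
    return {pc: ci_excludes_zero(lo, hi) for pc, (_, lo, hi) in table.items()}


# Values copied from the files posted above: (Effect, LowerCI, UpperCI).
batch_effect = {
    "pPC-1": (-0.076615559930736, -1.09629147519899, 0.943060355337517),
    "pPC-2": (-3.14523049034484, -7.1707748451274, 0.880313864437718),
}
bio_effect = {
    "pPC-1": (-3.30722887887888, -5.19626149735946, -1.41819626039831),
    "pPC-2": (0.961300863108799, -0.0199433882563451, 1.94254511447394),
}

print("batch:", flag_effects(batch_effect))
print("bio:  ", flag_effects(bio_effect))
```

Under this reading, both batch intervals contain zero, while the biological effect on pPC-1 does not, which is consistent with expBATCH stopping after findBATCH.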