zhouhj1994 / LinDA

Covariate categories disappear in output #1

Closed marastadler closed 1 year ago

marastadler commented 2 years ago

Hi, I just started using linda and it is really a cool alternative to ancombc! When adding a covariate (to be adjusted) with 20 categories, some categories ("Cocaging17", "Cocaging19", "Cocaging20") disappear in the output. Do you have any idea why this is happening?

This is the output that I get from linda.obj$variables with formula = '~Genotype+TC+Experiment+Cocaging':

[1] "Genotype2Mut" "TC1"          "Experiment2"  "Experiment3" 
[5] "Cocaging02"   "Cocaging03"   "Cocaging04"   "Cocaging05"  
[9] "Cocaging06"   "Cocaging07"   "Cocaging08"   "Cocaging09"  
[13] "Cocaging10"   "Cocaging11"   "Cocaging12"   "Cocaging13"  
[17] "Cocaging14"   "Cocaging15"   "Cocaging16"   "Cocaging18"

Best wishes Mara

zhouhj1994 commented 2 years ago

Hi Mara,

Thanks for your interest in our package. I think this output is the result of one or more of the following:

  1. There are NAs in the variables (Genotype+TC+Experiment+Cocaging). Samples with an NA are removed, and if that removes every sample in the categories ("Cocaging17", "Cocaging19", "Cocaging20"), those categories disappear.

  2. If the argument "lib.cut" was set to a value other than 0, some samples might have been removed.

  3. The missing categories are perfectly collinear with other variables, so the estimates for them are NA and they are not included in the final result.

You can check the OTU table and metadata table that were actually used in the model, linda.obj$otu.tab.use and linda.obj$meta.use, to confirm which samples and which categories were actually involved.
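The three checks above can be sketched in base R. This is only a rough illustration on a made-up metadata table: the data frame `meta` and its values are hypothetical, and in practice you would substitute linda.obj$meta.use. The rank comparison uses base R's model.matrix() and qr(), not anything from LinDA itself.

```r
# Hypothetical metadata standing in for linda.obj$meta.use
# (invented values, not the data from this issue).
# meta <- linda.obj$meta.use
meta <- data.frame(
  Experiment = factor(c(1, 1, 2, 2, 3, 3, 3, 3)),
  Cocaging   = factor(c("01", "01", "02", "02", "03", "03", "04", "04"))
)

# Reason 1: count NAs per covariate.
na.counts <- colSums(is.na(meta))

# Reasons 1 and 2: factor levels that are left without any sample.
empty.levels <- names(which(table(meta$Cocaging) == 0))

# Reason 3: if the design matrix has more columns than its rank,
# some coefficients cannot be estimated (perfect collinearity).
X <- model.matrix(~ Experiment + Cocaging, data = meta)
n.aliased <- ncol(X) - qr(X)$rank

print(na.counts)
print(empty.levels)
print(n.aliased)
```

A non-zero `n.aliased` points to reason 3; non-zero NA counts or empty levels point to reasons 1 or 2.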

Please let me know whether any of the above reasons explains the output. Thanks!

Best, Huijuan

marastadler commented 2 years ago

Hi Huijuan,

Many thanks for your answer! I had a look at my data again and it actually seems to be reason 3, since Cocaging and the Experiment number are always the same for the three categories.
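For anyone hitting the same thing, here is a toy illustration of how such collinearity makes coefficients vanish. It assumes cages nested within experiments, uses invented data (not the data from this issue), and fits with base R's lm() as a stand-in, so it shows the general mechanism rather than LinDA's internals.

```r
# Toy data: every cage belongs to exactly one experiment (hypothetical layout).
set.seed(1)
d <- data.frame(
  Experiment = factor(rep(c(1, 2, 3), times = c(4, 4, 8))),
  Cocaging   = factor(rep(c("01", "02", "03", "04", "05"),
                          times = c(4, 4, 4, 2, 2)))
)
d$y <- rnorm(nrow(d))  # arbitrary response, only needed to fit a model

# Cage indicators within an experiment add up to that experiment's indicator,
# so the design matrix is rank deficient and lm() returns NA for some terms.
fit <- lm(y ~ Experiment + Cocaging, data = d)
dropped <- names(coef(fit))[is.na(coef(fit))]
print(dropped)
```

Which of the collinear terms ends up as NA depends on the order of the terms in the formula; the terms entered later are the ones that cannot be estimated.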

Best wishes Mara