treangenlab / emu

MIT License

Would you recommend subsampling reads across samples? #30

Open · ayamakawa opened this issue 1 month ago

ayamakawa commented 1 month ago

Hello! Thank you for the tool and all the support!

I'm trying to compare ONT full-length 16S sequencing with Illumina V4 sequencing using EMU and different databases. I was wondering about subsampling and how it could interfere with EMU and downstream analyses. Our samples have quite different numbers of reads: the minimum is 10k, but some samples have more than 30k (for both full-length 16S and the V4 region). Would it be more reliable to subsample all samples to the minimum of 10k reads before comparing abundance and diversity?

Thank you! Ana
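For reference, subsampling every sample to a common depth (often called rarefying) can be sketched as follows. This is a minimal Python sketch, not part of EMU, and it assumes you already have integer read counts per taxon for each sample; the `rarefy` function and the example counts are hypothetical. Reads are drawn without replacement, so no taxon can end up with more reads than it originally had, and a fixed seed makes the draw reproducible.

```python
import random
from collections import Counter


def rarefy(counts: dict[str, int], depth: int, seed: int = 0) -> dict[str, int]:
    """Hypothetical helper: subsample per-taxon read counts to `depth` reads.

    Assumes `counts` maps taxon name -> integer read count for one sample.
    """
    total = sum(counts.values())
    if depth > total:
        raise ValueError(f"depth {depth} exceeds sample total {total}")
    # Expand the table into one label per read, then draw `depth` reads
    # without replacement using a seeded RNG for reproducibility.
    pool = [taxon for taxon, n in counts.items() for _ in range(n)]
    rng = random.Random(seed)
    drawn = rng.sample(pool, depth)
    # Taxa that received zero reads in the draw are dropped from the result.
    return dict(Counter(drawn))


# Illustrative counts for a ~30k-read sample, rarefied to the 10k minimum.
sample = {"taxonA": 18000, "taxonB": 9000, "taxonC": 3000}
print(rarefy(sample, 10000, seed=42))
```

Because the result depends on the random draw, rare taxa can disappear from a rarefied sample; repeating the draw with several seeds gives a sense of how much that matters for your data.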

kdc10 commented 3 weeks ago

This is up to you and your study design. Personally, I am generally opposed to downsampling, as it removes potentially informative data. You could do a quick diversity analysis comparing the small and large samples to see whether the number of reads seems to skew the results.
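A quick check along those lines could compare mean diversity between low-depth and high-depth samples. The sketch below is a minimal Python example, assuming per-taxon integer read counts as input; the function names, the 20k-read threshold, and the sample data are illustrative assumptions rather than anything EMU provides. It uses observed richness (number of taxa with at least one read) and the Shannon index, H' = -Σ p_i ln p_i.

```python
import math
from statistics import mean


def shannon(counts: dict[str, int]) -> float:
    """Shannon diversity H' = -sum(p_i * ln p_i) over taxa with nonzero counts."""
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total) for n in counts.values() if n > 0)


def richness(counts: dict[str, int]) -> int:
    """Observed richness: number of taxa with at least one read."""
    return sum(1 for n in counts.values() if n > 0)


def compare_by_depth(samples: dict[str, dict[str, int]], threshold: int) -> dict:
    """Hypothetical helper: mean diversity for samples below vs. at/above a read-depth threshold."""
    groups: dict[str, list[dict[str, int]]] = {"small": [], "large": []}
    for counts in samples.values():
        key = "small" if sum(counts.values()) < threshold else "large"
        groups[key].append(counts)
    # Skip empty groups so mean() is never called on an empty sequence.
    return {
        key: {
            "n": len(members),
            "mean_shannon": mean(shannon(c) for c in members),
            "mean_richness": mean(richness(c) for c in members),
        }
        for key, members in groups.items()
        if members
    }


# Illustrative data: one ~10k-read sample and one ~32k-read sample.
samples = {
    "S1": {"taxonA": 5000, "taxonB": 5000},
    "S2": {"taxonA": 15000, "taxonB": 15000, "taxonC": 2000},
}
print(compare_by_depth(samples, threshold=20000))
```

If the groups differ noticeably, and especially if richness rises with depth, that suggests read count is affecting the comparison. Richness is usually more sensitive to depth than Shannon, since rare taxa tend to be detected mainly in deeper samples.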