
raw-lab / mercat2

MerCat2: Python code for versatile k-mer counting and diversity estimation for database-independent property analysis of metaome data

https://github.com/raw-lab/mercat2/

BSD 3-Clause "New" or "Revised" License

11 stars, 1 fork

Recommendations #2

Closed: wchow closed this issue 2 years ago

wchow commented 2 years ago

Hi @raw-lab @raw937

Thanks for your previous answer regarding the difference between mercat and mercat2. I was wondering if you have any recommendations on:

Number of input reads to use (I have Illumina reads and have tried downsampling to 100k, 1M, 5M, etc.)
k-mer length (I've used k=21, which, depending on the input, can affect running speed)
I was thinking of using mercat2 to quickly calculate the Shannon/Simpson index as a way to compare diversity between samples (running mercat individually on each sample). Can it be used that way, to compare samples?
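For illustration only, here is a minimal sketch of what "Shannon/Simpson index on k-mers" means: count every k-mer in a sample's reads, turn the counts into relative frequencies p_i, then compute Shannon H = -sum(p_i ln p_i) and the Gini-Simpson index 1 - sum(p_i^2). The function names (`count_kmers`, `shannon_index`, `simpson_index`) are hypothetical and are not MerCat2's API; as noted later in this thread, MerCat2 currently does not provide diversity estimation. The sketch assumes forward-strand k-mers only and skips any k-mer containing non-ACGT characters. Because k-mer diversity also grows with read depth and with sequencing errors, values like these are at best a relative metric between samples processed with the same k and a comparable number of reads.

```python
import math
from collections import Counter
from typing import Iterable


def count_kmers(reads: Iterable[str], k: int) -> Counter:
    """Count every length-k substring of each read (forward strand only).

    Assumption for this sketch: k-mers containing anything other than
    A, C, G, T (e.g. N) are skipped rather than counted.
    """
    counts: Counter = Counter()
    valid = set("ACGT")
    for read in reads:
        seq = read.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= valid:
                counts[kmer] += 1
    return counts


def shannon_index(counts: Counter) -> float:
    """Shannon entropy H = -sum(p_i * ln p_i) over k-mer relative frequencies."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts.values() if c > 0)


def simpson_index(counts: Counter) -> float:
    """Gini-Simpson index 1 - sum(p_i^2): chance two random k-mers differ."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


if __name__ == "__main__":
    # Two toy "samples"; real input would be reads parsed from FASTQ files.
    samples = {
        "A": ["ACGTACGTAC", "ACGTTTGCAA"],
        "B": ["AAAAAAAAAA", "AAAAAAAAAT"],
    }
    for name, reads in samples.items():
        kc = count_kmers(reads, k=4)
        print(name, len(kc), round(shannon_index(kc), 3), round(simpson_index(kc), 3))
```

With k=21 on millions of reads the dictionary gets large, which is one reason the choice of k and read count affects speed and memory.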

Thanks again for your help!

Will

raw-lab commented 2 years ago

Hello Will,

We have currently removed diversity estimation from mercat2. We are working to improve the indexing so we can get a better species count from unique k-mers.

If you run mercat2, you can run many samples at once and build a global comparison; it will also plot a PCA for you at the end. This way you can use all of your reads; you don't have to downsample. Let us know if you are lacking CPU power, and we can help.
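MerCat2 builds this global comparison and the PCA plot itself. Purely to illustrate the idea, the sketch below merges per-sample k-mer counts into one samples × k-mers matrix and computes each sample's score on the first principal component. The names `build_matrix` and `first_pc_scores` are hypothetical, and the choices here (relative-abundance normalization, PC1 via power iteration on the sample Gram matrix) are assumptions for the sketch, not necessarily how MerCat2 normalizes or computes its PCA.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def build_matrix(samples: Dict[str, Counter]) -> Tuple[List[str], List[str], List[List[float]]]:
    """Merge per-sample k-mer counts into a samples x k-mers matrix.

    Each row is normalized to relative abundance (assumption for this sketch)
    so samples sequenced to different depths are on the same scale.
    """
    names = sorted(samples)
    kmers = sorted(set().union(*samples.values()))
    rows = []
    for name in names:
        counts = samples[name]
        total = sum(counts.values()) or 1
        rows.append([counts.get(km, 0) / total for km in kmers])
    return names, kmers, rows


def first_pc_scores(rows: List[List[float]], iters: int = 200) -> List[float]:
    """Score of each sample on the first principal component.

    Centers each column, then runs power iteration on the n x n Gram matrix
    G = Xc Xc^T. If u is G's top unit eigenvector with eigenvalue lam,
    the PC1 scores are u * sqrt(lam) (from the SVD of Xc).
    """
    if not rows:
        return []
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    xc = [[r[j] - means[j] for j in range(m)] for r in rows]
    gram = [[sum(a * b for a, b in zip(xc[i], xc[k])) for k in range(n)] for i in range(n)]
    # Non-uniform start: the all-ones vector lies in G's null space after centering.
    u = [float(i + 1) for i in range(n)]
    lam = 0.0
    for _ in range(iters):
        v = [sum(gram[i][k] * u[k] for k in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm == 0.0:
            return [0.0] * n  # all samples identical: no variance to explain
        lam = norm  # ||G u|| for unit u converges to the top eigenvalue
        u = [x / norm for x in v]
    return [x * math.sqrt(lam) for x in u]


if __name__ == "__main__":
    per_sample = {
        "sample1": Counter({"AAAA": 30, "ACGT": 10}),
        "sample2": Counter({"AAAA": 10, "ACGT": 30}),
        "sample3": Counter({"AAAA": 20, "CCCC": 20}),
    }
    names, kmers, rows = build_matrix(per_sample)
    for name, score in zip(names, first_pc_scores(rows)):
        print(name, round(score, 4))
```

The sign of a principal component is arbitrary, so only the relative positions of samples along PC1 are meaningful.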

What question are you asking? Is it just differences between samples?

raw-lab commented 2 years ago

If you run in protein mode, it's much faster.

wchow commented 2 years ago

Thanks @raw-lab,

I just want to see whether I can estimate and compare diversity across samples from the reads before actually trying to assemble them. In practice, I want some relative metric so I have a heads-up before running more downstream analysis, and also an idea of how much compute I need to allocate.

Thanks again for your help.

raw-lab commented 2 years ago

We have the new version, MerCat2, if you would like to try it.