Questions about using cayman

Xinpeng021001 commented 2 months ago

Hi,

Thank you for the excellent tool! I'm trying to use it based on the bioRxiv paper, but I have some questions:

1. For the bwa index, I noticed that the paper says you used the non-human-gut dataset, but on Zenodo I also found the gut dataset.
2. Creating the index with bwa is quite slow. Should we create an individual index for each dataset, or combine them into one total index?
3. Could you please provide the code for the plots in the paper? The link https://git.embl.de/grp-zeller/cazy_gut_microbiome/ can't be opened.

Thank you for your time and help!

Best Regards, Xinpeng

cschu commented 2 months ago

Dear Xinpeng,

Thank you for your interest in cayman!

1. Indeed, the non-human-gut catalogues were annotated in addition to the human gut one.
2. Both ways work, but the original idea is to create an individual index for each catalogue. In theory, one could profile against the complete GMGC, but that would not perform very well. If you find the bwa indexing slow, you could use a very large value for the -K parameter (as discussed here); however, that is usually not necessary for smaller catalogues.
3. Unfortunately, the link in the preprint is out of date. The repo can be found here.
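
As a rough illustration of the one-index-per-catalogue idea, here is a minimal Python sketch that only assembles `bwa index` commands, one per catalogue FASTA. The helper names (`bwa_index_command`, `build_indices`), the `bwa_indices` output directory, and the way the prefix is derived from the file name are assumptions for illustration and not part of cayman; `-p` is the standard `bwa index` option for the index prefix. By default nothing is executed (`dry_run=True`).

```python
import subprocess
from pathlib import Path


def bwa_index_command(catalogue_fasta, index_dir="bwa_indices"):
    """Return the `bwa index` argument list for one catalogue FASTA.

    The index prefix is the FASTA file name without a trailing
    .gz and/or .fna suffix (a naming choice made for this sketch).
    """
    fasta = Path(catalogue_fasta)
    name = fasta.name
    for suffix in (".gz", ".fna"):
        if name.endswith(suffix):
            name = name[: -len(suffix)]
    prefix = Path(index_dir) / name
    return ["bwa", "index", "-p", str(prefix), str(fasta)]


def build_indices(catalogue_fastas, index_dir="bwa_indices", dry_run=True):
    """Assemble one `bwa index` command per catalogue.

    With dry_run=False the commands are also executed, which
    requires bwa to be installed and on the PATH.
    """
    commands = [bwa_index_command(f, index_dir) for f in catalogue_fastas]
    if not dry_run:
        Path(index_dir).mkdir(parents=True, exist_ok=True)
        for cmd in commands:
            subprocess.run(cmd, check=True)
    return commands
```

Calling `build_indices(["catalogue_a.fna.gz", "catalogue_b.fna.gz"])` with placeholder file names returns two separate commands, so each catalogue gets its own index rather than one combined index.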

Best, Christian

Xinpeng021001 commented 2 months ago

Dear Christian,

Thank you for your reply! So should we use the non-human-gut catalogue to make the index, or do you recommend different catalogues for different environments? For example, if I’m trying to annotate a human gut environment, should I follow the paper and use the non-human-gut catalogue, or just use the annotated human gut catalogue? I have the same question for other environments.

Best Regards, Xinpeng

cschu commented 2 months ago

Dear Xinpeng,

For a human gut environment, you'd use a bwa index created from GMGC10.human-gut.95nr.0.5.percent.prevalence.fna.gz (gene_catalogues.zip) and the CAZy annotations in GMGC10.human-gut.95nr.no-rare.0.5.percent.prevalence_all_v3_FINAL.csv (gene_catalogue_annotations.zip).

For, say, soil, you'd create a bwa index from GMGC10.soil.95nr.no-rare.0.5.percent.prevalence.fna.gz and use the annotations in GMGC10.soil.95nr.no-rare.0.5.percent.prevalence_all_v3_FINAL.csv.

And so on. Different environments should be profiled using the closest fitting catalogue.
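
The pairing described above can be written down as a small lookup table. This is only a sketch: the environment keys (`"human-gut"`, `"soil"`) and the function name `catalogue_for` are hypothetical, and only the two environments named in this thread are listed, with the file names exactly as given above.

```python
# Environment label -> (catalogue FASTA, annotation CSV).
# The labels are made up for this sketch; file names are quoted from the thread.
CATALOGUES = {
    "human-gut": (
        "GMGC10.human-gut.95nr.0.5.percent.prevalence.fna.gz",
        "GMGC10.human-gut.95nr.no-rare.0.5.percent.prevalence_all_v3_FINAL.csv",
    ),
    "soil": (
        "GMGC10.soil.95nr.no-rare.0.5.percent.prevalence.fna.gz",
        "GMGC10.soil.95nr.no-rare.0.5.percent.prevalence_all_v3_FINAL.csv",
    ),
}


def catalogue_for(environment):
    """Return the (catalogue FASTA, annotation CSV) pair for an environment.

    Raises KeyError for environments not listed in CATALOGUES, so a
    sample is never silently profiled against a mismatched catalogue.
    """
    key = environment.strip().lower()
    if key not in CATALOGUES:
        known = ", ".join(sorted(CATALOGUES))
        raise KeyError(f"no catalogue listed for {environment!r}; known: {known}")
    return CATALOGUES[key]
```

Keeping the catalogue and its annotation file together in one entry makes it harder to build an index from one environment and annotate with another.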

Best, Christian

Xinpeng021001 commented 2 months ago

Dear Christian,

Thank you for your reply! I’ll redo the index part. Thank you!

Best Regards, Xinpeng

zellerlab / cayman

Questions about using cayman #8