wejlab / MetaScope

An R-based approach for preprocessing and aligning 16S, metagenomic, and metatranscriptomic data (PathoScope version 3.0)
GNU General Public License v3.0

Using another reference database #16

Status: Closed (ecastron closed this issue 10 months ago)

ecastron commented 2 years ago

Dear dev team,

Thanks for putting together an expanded version of PathoScope! I wanted to ask whether it is possible to use a database other than RefSeq. I usually work on environmental microbiomes and end up mapping against collections of metagenome-assembled genomes and/or the Genome Taxonomy Database (its taxonomy is the best; https://gtdb.ecogenomic.org).

Best,

Eduardo

aubreyodom commented 2 years ago

Hi Eduardo!

Yes, it is possible. If you have FASTA files from those databases containing the genomes you're aligning against (compressed and uncompressed both work), you can use mk_bowtie_index() to build a Bowtie 2 index from the folder of FASTA files. Alternatively, you can build Rsubread indices with mk_subread_index() if you want to use the Subread aligner (though Bowtie 2 is generally more accurate). If you have any trouble getting the databases to be compatible, let me know and I can take a look.

Brie
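
For anyone following along, the suggestion above could be sketched roughly as follows. This is a minimal sketch, not the maintainers' own recipe: the paths, the index name "gtdb", and the helper find_fasta() are hypothetical, and the argument names passed to mk_bowtie_index() are assumed from the package documentation, so please confirm them with ?MetaScope::mk_bowtie_index.

```r
# Hypothetical helper (not part of MetaScope): list FASTA files,
# plain or gzipped, in a folder.
find_fasta <- function(ref_dir) {
  list.files(ref_dir,
             pattern = "\\.(fa|fasta|fna)(\\.gz)?$",
             full.names = TRUE, ignore.case = TRUE)
}

# Hypothetical locations; replace with your own paths.
ref_dir <- "path/to/gtdb_fasta"  # folder holding the reference genomes
lib_dir <- "path/to/indices"     # where the index files should be written

fasta_files <- find_fasta(ref_dir)

# Only build the index when there is something to index and
# MetaScope is installed.
if (length(fasta_files) > 0 &&
    requireNamespace("MetaScope", quietly = TRUE)) {
  dir.create(lib_dir, recursive = TRUE, showWarnings = FALSE)
  # Argument names are an assumption; check ?MetaScope::mk_bowtie_index.
  MetaScope::mk_bowtie_index(
    ref_dir   = ref_dir,
    lib_dir   = lib_dir,
    lib_name  = "gtdb",  # hypothetical index name
    threads   = 1,
    overwrite = FALSE
  )
}
```

The guard around the call means the snippet does nothing when the folder is empty or missing, which makes it easy to check the file listing before committing to a long index build.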

ecastron commented 2 years ago

Hi Brie,

Thanks for the quick answer. That sounds great, I'll definitely give it a shot.

Eduardo

bheimbu commented 1 year ago

Hi @aubreyodom,

Do you have any experience with using sequence data from EnsemblBacteria? I've tried to use this database, but without any luck.

Cheers Bastian

aubreyodom commented 1 year ago

Hi Bastian,

Sadly, I do not have experience with EnsemblBacteria. In the past, we've found that Silva is a good 16S database whose performance is comparable to RefSeq's:

https://www.biorxiv.org/content/10.1101/2022.07.27.501757v1

The indices we used are linked on this page: https://github.com/aubreyodom/16SBenchmarking

bheimbu commented 1 year ago

Hi @aubreyodom,

Thanks for letting me know. I'll try Silva.

Cheers Bastian

aubreyodom commented 10 months ago

Just an update for folks wanting to use another reference database: we are actively working on this issue and should have an update for metascope_id() in the coming months (if not sooner). I'm particularly interested in trying out Greengenes2 and Silva myself. Stay tuned.

susheelbhanu commented 10 months ago

Thanks @aubreyodom. Looking forward to trying it out!

aubreyodom commented 10 months ago

@susheelbhanu A fix is now live on GitHub (and should go live with the next Bioconductor release at the end of the month). Check out the db and db_feature_table parameters in the help docs, and let me know if you have any questions or specific cases. I'm going to go ahead and close this issue, but if something pops up, feel free to reopen it.
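
For readers landing here later, a rough sketch of how the db and db_feature_table parameters might be used with a custom database is below. Everything specific in it is an assumption rather than confirmed usage: the value db = "other", the two-column layout with a "Feature ID" column, the other argument names, the file path, and the example IDs and taxon labels are all illustrative, so please check ?MetaScope::metascope_id for the actual requirements.

```r
# Hypothetical feature table: one row per reference sequence in the
# custom index. The column names and layout are assumptions; see
# ?MetaScope::metascope_id for what the function actually expects.
feature_table <- data.frame(
  "Feature ID" = c("genome_001", "genome_002"),           # made-up IDs
  "Taxon"      = c("Genus_a species_a", "Genus_b species_b"),
  check.names = FALSE,       # keep the space in "Feature ID"
  stringsAsFactors = FALSE
)

# Hypothetical input: an alignment file produced earlier in the pipeline.
input_file <- "path/to/sample.csv.gz"

# Only run when the input exists and MetaScope is installed.
if (file.exists(input_file) &&
    requireNamespace("MetaScope", quietly = TRUE)) {
  # Argument names and the "other" value are assumptions.
  MetaScope::metascope_id(
    input_file       = input_file,
    input_type       = "csv.gz",
    aligner          = "bowtie2",
    db               = "other",
    db_feature_table = feature_table,
    out_dir          = dirname(input_file)
  )
}
```

The IDs in the feature table would need to match the sequence names used when the index was built, otherwise reads cannot be linked to taxa.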

susheelbhanu commented 10 months ago

Awesome, thank you 🙏!