transcript / samsa2

SAMSA pipeline, version 2.0. An open-source metatranscriptomics pipeline for analyzing microbiome data, built around DIAMOND and customizable reference databases.
GNU General Public License v3.0

Information regarding New available databases - Refseq and SEED database #84

Closed meghahs closed 5 months ago

meghahs commented 5 months ago

Dear Samsa2 team,

I wanted to ask about functional classification using the SEED subsystems database: the link for the automatic full database download was last updated in 2017. Can it still be used, or is there an updated version available? Also, for NCBI RefSeq there is a newer reference available: RefSeq Release 222 (January 16, 2024) is available via FTP. Can these be updated for new analyses? Will the old SEED database be compatible with the new NCBI RefSeq database? Kindly clarify. Thank you.

Regards, Megha

transcript commented 5 months ago

Hi Megha,

SEED: Unfortunately, the SEED Consortium is no longer pushing any updates. You can read more about the group, which is part of the RAST group (https://rast.nmpdr.org/), here: https://www.theseed.org/wiki/Home_of_the_SEED

If you need a more up-to-date pipeline, I might suggest taking a look at MEGAN6, which also offers functional analysis pulled from InterPro2GO, eggNOG or KEGG.

RefSeq: Yes, you should be able to update to the latest RefSeq version. You can pull the FAA files from NCBI's FTP site, https://ftp.ncbi.nlm.nih.gov/genomes/refseq/bacteria/ , and then merge them together and run the DIAMOND makedb command to create a DIAMOND database from the FAA.
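The merge-and-build step described above could be sketched roughly as follows in Python. This is only an illustration, not part of SAMSA2: it assumes the per-organism `*protein.faa.gz` files have already been downloaded into a local directory, that the `diamond` executable is on the PATH, and the function names, directory name, and output names are all hypothetical.

```python
import gzip
import subprocess
from pathlib import Path


def merge_faa(faa_gz_paths, merged_path):
    """Decompress each gzipped FASTA file and append it to one merged file."""
    with open(merged_path, "wb") as out:
        for path in faa_gz_paths:
            last = b"\n"
            with gzip.open(path, "rb") as src:
                # Copy in 1 MiB chunks so large files never sit fully in memory.
                for chunk in iter(lambda: src.read(1 << 20), b""):
                    out.write(chunk)
                    last = chunk[-1:]
            # Guard against a file that lacks a trailing newline, which would
            # otherwise glue its last sequence line to the next header.
            if last != b"\n":
                out.write(b"\n")
    return merged_path


def makedb_command(merged_faa, db_prefix):
    """Return the DIAMOND makedb invocation as an argument list."""
    return ["diamond", "makedb", "--in", str(merged_faa), "--db", str(db_prefix)]


def build_database(download_dir, db_prefix):
    """Merge every *protein.faa.gz under download_dir, then run DIAMOND."""
    faa_files = sorted(Path(download_dir).rglob("*protein.faa.gz"))
    merged = merge_faa(faa_files, f"{db_prefix}.faa")
    subprocess.run(makedb_command(merged, db_prefix), check=True)
```

Calling `build_database("refseq_bacteria", "refseq_bacteria")` (hypothetical names) would write `refseq_bacteria.faa` and then ask DIAMOND to build the database from it. The merged FASTA is uncompressed, so it needs a correspondingly large amount of free disk space.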

Best, Sam

meghahs commented 5 months ago

Hello Sam,

Thank you for clarifying. I will look into the options you suggested. I ran the analysis on the DAA files generated by SAMSA and then analyzed them in MEGAN6, and I was wondering whether that counts as the latest version. I am about to communicate my results and wanted to make sure this is acceptable, since the analysis was done last year. Thank you for responding.

Best, Megha

On Wed, Feb 7, 2024 at 11:10 AM Sam Westreich @.***> wrote:

Hi Megha,

SEED: Unfortunately, the SEED Consortium is no longer pushing any updates. You can read more about the group, which is part of the RAST group (https://rast.nmpdr.org/), here: https://www.theseed.org/wiki/Home_of_the_SEED

If you need a more up-to-date pipeline, I might suggest taking a look at MEGAN6 https://uni-tuebingen.de/en/fakultaeten/mathematisch-naturwissenschaftliche-fakultaet/fachbereiche/informatik/lehrstuehle/algorithms-in-bioinformatics/software/megan6/, which also offers functional analysis pulled from InterPro2GO, eggNOG or KEGG.

RefSeq: Yes, you should be able to update to the latest RefSeq version. You can pull the FAA files from NCBI's FTP site, https://ftp.ncbi.nlm.nih.gov/genomes/refseq/bacteria/ , and then merge them together and run the DIAMOND makedb command to create a DIAMOND database from the FAA.

  • Note 1: the full database will be well over 100 GB, so make sure you're doing this on an instance or location with plenty of disk space.
  • Note 2: DIAMOND annotates against a protein database, so you'd need the *protein.faa.gz file for each organism.

Best, Sam


-- Regards,

Megha H.Sampangi-Ramaiah, Ph.D Postdoctoral Fellow Department of Biology, Myer's Hall Indiana University Bloomington, USA