Closed by mradz19 2 years ago
Hi Michael,
When you download from NCBI, are you downloading the non-redundant proteins database, or the full bacteria one? The one provided with SAMSA2 is the non-redundant proteins.
Best, Sam
On Sat, May 9, 2020 at 12:26 AM, Michael wrote:
I have run the pipeline with two RefSeq databases. One is the database downloaded by the SAMSA2 scripts: https://bioshare.bioinformatics.ucdavis.edu/bioshare/download/2c8s521xj9907hn/RefSeq_bac.fa
The other I downloaded from: ftp://ftp.ncbi.nlm.nih.gov/refseq/release/bacteria/
I get different results with the two databases, and I noticed that the one from NCBI is twice the size. What is the difference between the two?
-- Sam Westreich Microbiome Scientist, DNAnexus, http://www.mosaicbiome.com
Hi @transcript
I downloaded the entire database but only used the non-redundant .faa files to make the DIAMOND database.
Just curious: how old is the one provided with SAMSA2, and is it updated regularly?
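For anyone reproducing this step, here is a minimal Python sketch of picking only the non-redundant protein files out of a directory listing and assembling the DIAMOND build command. The filename pattern is an assumption about how the RefSeq bacteria release names its files, and the helper names and output names are hypothetical; check the pattern against the actual FTP listing before relying on it.

```python
import fnmatch
import shlex

# ASSUMPTION: naming pattern of the non-redundant protein FASTA files in the
# RefSeq bacteria release directory. Verify against the real listing.
NONREDUNDANT_PATTERN = "bacteria.nonredundant_protein.*.protein.faa.gz"


def select_nonredundant(filenames, pattern=NONREDUNDANT_PATTERN):
    """Return only the filenames matching the non-redundant pattern, sorted."""
    return sorted(name for name in filenames if fnmatch.fnmatch(name, pattern))


def makedb_command(fasta_path, db_name):
    """Build the argument list for `diamond makedb --in <fasta> --db <name>`."""
    return ["diamond", "makedb", "--in", fasta_path, "--db", db_name]


if __name__ == "__main__":
    # Hypothetical listing, for illustration only.
    listing = [
        "bacteria.1.protein.faa.gz",
        "bacteria.nonredundant_protein.1.protein.faa.gz",
        "bacteria.nonredundant_protein.1.protein.gpff.gz",
    ]
    print(select_nonredundant(listing))
    # Hypothetical file and database names; pass the list to subprocess.run
    # to actually execute it once the selected files have been concatenated.
    print(shlex.join(makedb_command("RefSeq_bac_nr.faa", "RefSeq_bac_nr")))
```

Printing the selected list before building makes it easy to confirm that no redundant per-genome protein files slipped into the database.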
Hi Michael,
Interesting - I know that the version distributed with SAMSA2 is from 2017 (as that's when we released it), but I suspect it's not being updated regularly. I should probably write up instructions for users to do so, in case I can't consistently update.
Hi Sam,
It's weird, I definitely only used the non-redundant files. What's odd is that the SAMSA2 .fa file is 28 GB, whereas the downloaded file is 58 GB.
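File size alone does not say whether the larger download holds more proteins or just repeated ones. One way to check, sketched below in Python, is to count records, total residues, and distinct sequences in each FASTA file. The function names are hypothetical and this is not part of SAMSA2. Note that the set of sequence digests is held in memory, so a file of this size may need several GB of RAM.

```python
import gzip
import hashlib


def open_fasta(path):
    """Open a plain or gzip-compressed FASTA file as text."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def summarize_fasta(lines):
    """Count records, total residues, and distinct sequences in FASTA lines."""
    records = 0
    residues = 0
    seen = set()   # 16-byte digests of every sequence seen so far
    chunks = []    # sequence lines of the record currently being read

    def flush():
        nonlocal residues
        if chunks:
            seq = "".join(chunks)
            residues += len(seq)
            seen.add(hashlib.blake2b(seq.encode(), digest_size=16).digest())
            chunks.clear()

    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            flush()            # finish the previous record
            records += 1
        else:
            chunks.append(line.upper())
    flush()                    # finish the last record
    return {"records": records, "residues": residues, "unique_sequences": len(seen)}


if __name__ == "__main__":
    import sys
    # Usage: python summarize_fasta.py RefSeq_bac.fa other_database.faa
    for path in sys.argv[1:]:
        with open_fasta(path) as handle:
            print(path, summarize_fasta(handle))
```

If `records` is much larger than `unique_sequences` in one file, that file contains duplicated sequences. If both counts are simply higher in the newer download, the difference is more likely growth of RefSeq since 2017.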