qunfengdong / BLCA


BLCA with Silva 23S LSU database? #15

Closed: wolfgangrumpf closed this issue 5 years ago

wolfgangrumpf commented 5 years ago

I'd be interested in knowing how I could create a BLCA-compatible database from the SILVA 23S LSU data - is this possible?

qunfengdong commented 5 years ago

We have never tried that. The only challenge is whether you can obtain NCBI-format taxonomic information for the SILVA 23S LSU sequences. Once you can do that, BLCA should work.
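To illustrate what "NCBI-format taxonomic information" could look like for BLCA, here is a minimal Python sketch that writes one taxonomy line per sequence. It assumes a tab-separated layout of `sequence ID<TAB>species:...;genus:...;...;superkingdom:...;`, which should be checked against the BLCA README; the function name `to_blca_line`, the rank order and the `Not Available` placeholder are all assumptions.

```python
# Rank order (most to least specific) is an assumption; confirm it against
# the taxonomy file format described in the BLCA README.
BLCA_RANKS = ("species", "genus", "family", "order", "class", "phylum", "superkingdom")


def to_blca_line(seq_id, lineage, missing="Not Available"):
    """Return one taxonomy line: the ID, a tab, then 'rank:name;' pairs.

    `lineage` maps rank names (e.g. "genus") to taxon names. Ranks absent
    from `lineage` are filled with `missing`, a hypothetical placeholder.
    """
    fields = "".join(f"{rank}:{lineage.get(rank, missing)};" for rank in BLCA_RANKS)
    return f"{seq_id}\t{fields}"
```

For example, `to_blca_line("AB001234.1.1500", {"species": "Escherichia coli", "genus": "Escherichia"})` fills the remaining five ranks with the placeholder.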

(In reply to Wolfgang Rumpf, Mon, Mar 11, 2019 at 1:44 PM.)

dswan commented 5 years ago

It is possible to do this. SILVA preserves the original sequence ID, so you can remap each entry back to the NCBI taxonomy using that ID, or simply use the species name with something like the ETE toolkit.
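The species-name route could be sketched roughly as follows. The `ncbi` argument is expected to behave like ete3's `NCBITaxa` (its `get_name_translator`, `get_lineage`, `get_rank` and `get_taxid_translator` methods); the function name, the set of kept ranks and the "take the first taxid" rule for ambiguous names are assumptions, not necessarily what was done in this thread.

```python
# Ranks to keep from the full NCBI lineage (an assumption; adjust as needed).
WANTED_RANKS = {"species", "genus", "family", "order", "class", "phylum", "superkingdom"}


def lineage_for_species(species_name, ncbi):
    """Return {rank: name} for `species_name`, or None if NCBI has no match.

    `ncbi` should expose the ete3 NCBITaxa methods used below.
    """
    hits = ncbi.get_name_translator([species_name])  # {name: [taxid, ...]}
    if species_name not in hits:
        return None  # name not present in the NCBI taxonomy
    taxid = hits[species_name][0]  # a name can map to several taxids; take the first
    lineage = ncbi.get_lineage(taxid)  # taxids from the root down to `taxid`
    ranks = ncbi.get_rank(lineage)  # {taxid: rank}
    names = ncbi.get_taxid_translator(lineage)  # {taxid: scientific name}
    return {ranks[t]: names[t] for t in lineage if ranks.get(t) in WANTED_RANKS}
```

In practice `ncbi` would be `NCBITaxa()` from `ete3`, which downloads and caches the NCBI taxonomy dump on first use; passing it in as an argument keeps the function easy to test without that download.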

qunfengdong commented 5 years ago

That's an excellent suggestion. We will get it done. Any further guidance will be much appreciated.

(In reply to Dr. Daniel Swan, Tue, May 28, 2019 at 11:56 AM.)

dswan commented 5 years ago

> I'd be interested in knowing how I could create a BLCA-compatible database from the SILVA 23S LSU data - is this possible?

Very quick and dirty:

https://drive.google.com/drive/folders/1t0TzC08y7_LyglsdihaXu27oWr7PiKLe

I took the LSU database (SILVA_132_LSURef_tax_silva.fasta), parsed the FASTA headers for species names, remapped those names against the NCBI taxonomy using the ete3 toolkit, wrote the result out in BLCA format, and back-transcribed the RNA entries to DNA for the FASTA file. I kept the SILVA sequence ID as the FASTA header.
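The header parsing and back-transcription steps might look roughly like this minimal sketch. It assumes SILVA's header layout `>ID taxon;taxon;...;Organism name`, takes the last `;`-separated field as the species name, and treats back-transcription as a plain U-to-T substitution; all function names are hypothetical.

```python
def parse_silva_header(header):
    """Split a SILVA header into (sequence ID, organism name).

    Assumes '>ID taxon;taxon;...;Organism name', where the last
    ';'-separated field is the organism (species) name.
    """
    seq_id, _, taxonomy = header.lstrip(">").partition(" ")
    organism = taxonomy.rsplit(";", 1)[-1].strip()
    return seq_id, organism


def back_transcribe(seq):
    """Convert RNA to DNA by replacing U with T, preserving case."""
    return seq.replace("U", "T").replace("u", "t")


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def convert(lines):
    """Yield (seq_id, organism, dna_sequence) for each SILVA record."""
    for header, seq in read_fasta(lines):
        seq_id, organism = parse_silva_header(header)
        yield seq_id, organism, back_transcribe(seq)
```

Each `seq_id` would then become the header of the DNA FASTA output (written as `f">{seq_id}\n{dna}\n"`), while `organism` is the name looked up in the NCBI taxonomy.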

You'll then need to build the BLAST database:

makeblastdb -in SILVA_132_LSURef_tax_silva_BLCAparsed.fasta -dbtype nucl -parse_seqids -out SILVA_132_LSURef_tax_silva_BLCAparsed.fasta

There are lots of caveats: the species-name matching is imperfect and misses chunks of entries that the NCBI taxonomy can't handle, and this is entirely untested, so use it at your own risk!
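One way to gauge how much the imperfect name matching drops is to count unmatched names before writing anything out. This hypothetical sketch trims SILVA organism strings to their first two words (a crude "Genus species" heuristic, not necessarily what was done here) and accepts any `is_known(name)` predicate; with ete3 that could be `lambda n: bool(ncbi.get_name_translator([n]))`.

```python
from collections import Counter


def binomial(organism):
    """Heuristically reduce a SILVA organism string to 'Genus species'.

    Keeps only the first two words, dropping strain suffixes such as
    'str. K-12'. Names like 'uncultured bacterium' pass through unchanged
    and will usually still fail to match.
    """
    return " ".join(organism.split()[:2])


def match_report(organisms, is_known):
    """Return (matched_count, Counter of unmatched original names)."""
    matched = 0
    unmatched = Counter()
    for org in organisms:
        if is_known(binomial(org)):
            matched += 1
        else:
            unmatched[org] += 1  # keep the original string for inspection
    return matched, unmatched
```

Printing `unmatched.most_common(20)` would show which groups of entries are being lost.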

qunfengdong commented 5 years ago

Very helpful! Thanks so much! We will test it out and update the GitHub instructions accordingly.

(In reply to Dr. Daniel Swan, Wed, May 29, 2019 at 8:07 AM.)