jcmcnch / eASV-pipeline-for-515Y-926R

This is a collection of scripts for analyzing mixed 16S/18S amplicon sequences using tools such as qiime2, DADA2, deblur, and bbtools
GNU General Public License v3.0

About making new SILVA_PROK.cdhit95pc and SILVA_and_PR2_EUK.cdhit95pc database for bbsplit #6

Closed shanexuuu closed 4 months ago

shanexuuu commented 5 months ago

Hi Jesse,

Thanks for your pipeline! I am still learning how to use it. I would like to use the newly released Silva 138.1 and PR2 5.0.0 for the bbsplit step. I was wondering how you created these files: SILVA_132_and_PR2_EUK.cdhit95pc.fasta, SILVA_132_PROK.cdhit95pc.fasta, SILVA_132_BACT.cdhit95pc.fasta, SILVA_132_ARCH.cdhit95pc.fasta, SILVA_132_BACT-NON-CYANO.cdhit95pc.fasta, SILVA_132_BACT-CYANO.cdhit95pc.fasta

Also, do you think it is worth using the newly released Silva 138.1 and PR2 5.0.0 for the bbsplit step?

Many thanks!

MDHDZ91 commented 5 months ago

Hi Shanexuuu,

I am working on this same analysis and messaged Jesse just this morning. My code is 90% done, and when I finish I'll do a pull request and share. But here are the steps for now:

  1. Make the Silva138 and PR2 databases using the repo Jesse shared
  2. From there you (we) need to make SILVA_132_and_PR2_EUK.cdhit95pc.fasta and SILVA_132_PROK.cdhit95pc.fasta, and then combine them to make the EUK_PRO database
  3. To do this, clean the Silva and PR2 fasta files (from your classifier.qza) so that you have 3 files: Silva 138 Euks, Silva 138 Prok, and PR2 Euks
  4. use readlength.sh to look at your reads and, based on the histogram output, determine cut-offs to exclude sequences that are too long or too short
  5. filter based on length with reformat.sh
  6. use cd-hit to cluster each of your 3 files at 95% identity
  7. combine the two new EUK files together: you now have 2 files, one for Euks and one for Prok
  8. run bbsplit.sh using the Silva_PR2_EUKS and Silva_PROK files to make the database
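
In case it helps to see the logic, here is a rough pure-Python sketch of what steps 4, 5 and 7 do: build a length histogram, drop sequences outside the chosen cut-offs, and combine the two EUK sets. This is only an illustration, not the code from my branch. The toy sequences, the helper function names, and the 1000/2000 cut-offs are all made up for the example, and the cd-hit clustering (step 6) and bbsplit.sh indexing (step 8) are left out.

```python
from collections import Counter
from typing import Dict, Iterator, List, Tuple

Record = Tuple[str, str]  # (header, sequence)


def parse_fasta(text: str) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def length_histogram(records: List[Record], bin_size: int = 100) -> Dict[int, int]:
    """Count sequences per length bin, keyed by the bin's lower bound (step 4)."""
    counts = Counter((len(seq) // bin_size) * bin_size for _, seq in records)
    return dict(sorted(counts.items()))


def filter_by_length(records: List[Record], min_len: int, max_len: int) -> List[Record]:
    """Keep only records with min_len <= length <= max_len (step 5)."""
    return [(h, s) for h, s in records if min_len <= len(s) <= max_len]


def to_fasta(records: List[Record]) -> str:
    """Serialize records back to FASTA text, one sequence per line."""
    return "".join(f">{h}\n{s}\n" for h, s in records)


# Toy stand-ins for the cleaned Silva and PR2 eukaryote files (hypothetical data).
silva_euk = list(parse_fasta(">silva_a\n" + "A" * 1500 + "\n>silva_b\n" + "C" * 300 + "\n"))
pr2_euk = list(parse_fasta(">pr2_a\n" + "G" * 1700 + "\n>pr2_b\n" + "T" * 4000 + "\n"))

# Hypothetical cut-offs; in practice pick them from the histogram of your own files.
MIN_LEN, MAX_LEN = 1000, 2000

print(length_histogram(silva_euk + pr2_euk, bin_size=1000))

# Filter each file on length, then combine the two EUK sets (step 7).
kept = filter_by_length(silva_euk, MIN_LEN, MAX_LEN) + filter_by_length(pr2_euk, MIN_LEN, MAX_LEN)
combined_euk = to_fasta(kept)
print(combined_euk.count(">"), "sequences kept")
```

On the real databases readlength.sh and reformat.sh do this work on the fasta files directly, and the clustering with cd-hit happens on each file before the EUK files are combined.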

Hope that helps!

shanexuuu commented 5 months ago

Hi María,

Thanks for the reply! It is quite helpful for me. I will try to do this. Also looking forward to your result.

Cheers, Shane

MDHDZ91 commented 5 months ago

I have my updated code in my branch of the database repo (database construction), in case you want to take a look. I made a pull request and hopefully Jesse can merge it soon.

shanexuuu commented 4 months ago

Hi Maria,

Thanks for this. I made a new version of the bbsplit db as you described. Now I get ~3 times more 18S reads assigned than before. THX A LOT!