Closed sh4nth closed 2 years ago
I couldn't find the source for your original 16S data. Is this from your own research group or some other public data? I have retained all the taxa that were present in your original dataset but not there in the rrnDB release, but I'm curious where that is from.
Thanks for creating and maintaining this project!
Hi @sh4nth, thank you for contributing to this project. For the 16S rRNA database, we originally planned to use one of the existing databases, such as SILVA, GreenGenes, RDP or Eztaxon, but we found that each has its own limitations. When we benchmarked our tool against any single one of them, only a small proportion of the query sequences mapped to it. Hence, we built one consolidated database by merging and clustering all the sequences from these databases.
For the 16S rRNA gene copy numbers, we simply counted the number of 16S rRNA genes present in each genome of our reference sets.
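As a rough illustration of that counting step (not the actual code used for the reference sets), a minimal Python sketch could look like the one below. The input format, the function name `count_16s_copies`, and the product label `"16S ribosomal RNA"` are all assumptions made for the example.

```python
from collections import Counter


def count_16s_copies(records, product_label="16S ribosomal RNA"):
    """Count 16S rRNA genes per genome.

    `records` is assumed to be an iterable of (genome_id, product) pairs,
    one pair per annotated gene. Both the input shape and the default
    product label are hypothetical and chosen only for illustration.
    """
    counts = Counter()
    for genome_id, product in records:
        if product == product_label:
            counts[genome_id] += 1
    return dict(counts)


# Hypothetical annotation records for two genomes.
example_records = [
    ("genome_A", "16S ribosomal RNA"),
    ("genome_A", "23S ribosomal RNA"),
    ("genome_A", "16S ribosomal RNA"),
    ("genome_B", "16S ribosomal RNA"),
]
copies_per_genome = count_16s_copies(example_records)
```

Genomes without any matching annotation are simply absent from the result, so a caller would need to decide how to treat them.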
Thank you for adding data from rrnDB. This will expand the copy number data as well. If you want any other information, please feel free to reply.
Could you add "_" to the organism names in the new rrnDB file? In the previous db files, I followed the pattern "genus_species_subspecies_strain". To compute genus-level copy numbers, the main MicFunPred script splits the organism name on "_".
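To make the request concrete, here is a hedged Python sketch of the naming convention and the genus-level split. The function names, the example organisms and copy numbers, and the choice of averaging per genus are assumptions for illustration; the real MicFunPred script may aggregate differently.

```python
from collections import defaultdict
from statistics import mean


def normalize_name(organism):
    """Join the whitespace-separated parts of an organism name with "_"."""
    return "_".join(organism.split())


def genus_copy_numbers(copy_numbers):
    """Aggregate per-organism copy numbers to genus level.

    The genus is taken as the first "_"-separated field of the normalized
    name. Averaging within a genus is an assumption made for this sketch.
    """
    by_genus = defaultdict(list)
    for organism, copies in copy_numbers.items():
        genus = normalize_name(organism).split("_")[0]
        by_genus[genus].append(copies)
    return {genus: mean(values) for genus, values in by_genus.items()}


# Hypothetical entries, written with spaces as they might appear before renaming.
example = {
    "Escherichia coli K-12": 7,
    "Escherichia fergusonii": 7,
    "Bacillus subtilis 168": 10,
}
genus_level = genus_copy_numbers(example)
```

With names stored in the underscore form, the genus is always the first field, which is why a space-separated name in the new file would break the split.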