metagenlab / zAMP

zAMP is a bioinformatic pipeline designed for convenient, reproducible and scalable amplicon-based metagenomics
https://zamp.readthedocs.io/en/latest/
MIT License

Database processing #6

Open valscherz opened 5 years ago

valscherz commented 5 years ago

Regarding the database, "my" version of the EzBioCloud database was prepared as indicated here. In the future, we would need a script capable of:

valscherz commented 5 years ago

Hi @cbertell,

Could you please confirm that these files are meant to help us do what we want here, i.e. generate a non-redundant reference database? meta_suez_process_otus_November2017.txt meta_suez_process_rdp_tax_November2017.txt
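To make the goal concrete, here is a minimal Python sketch of what "non-redundant" could mean in practice: group the reference IDs that share an identical sequence, then flag the groups whose members disagree on taxonomy. It is only an illustration under stated assumptions, not the method used in the files above. The function names, the example IDs and the taxonomy strings are all hypothetical, and real data would be read from the FASTA and taxonomy files rather than from inline strings.

```python
from collections import defaultdict


def parse_fasta(text):
    """Yield (seq_id, sequence) pairs from FASTA-formatted text."""
    seq_id, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            # Keep only the first token of the header as the ID.
            seq_id, chunks = line[1:].split()[0], []
        else:
            # Upper-case so that 'acgt' and 'ACGT' count as identical.
            chunks.append(line.upper())
    if seq_id is not None:
        yield seq_id, "".join(chunks)


def dereplicate(records):
    """Group sequence IDs by identical sequence string."""
    groups = defaultdict(list)
    for seq_id, seq in records:
        groups[seq].append(seq_id)
    return dict(groups)


def conflicting_groups(groups, taxonomy):
    """Return the sequences whose member IDs carry more than one taxonomy."""
    conflicts = {}
    for seq, ids in groups.items():
        taxa = {taxonomy[i] for i in ids}
        if len(taxa) > 1:
            conflicts[seq] = sorted(taxa)
    return conflicts


# Hypothetical toy data, for illustration only.
example_fasta = """>id1
ACGT
>id2
acgt
>id3
GGCC
"""
example_tax = {"id1": "Bacteria;A", "id2": "Bacteria;B", "id3": "Bacteria;A"}

groups = dereplicate(parse_fasta(example_fasta))
conflicts = conflicting_groups(groups, example_tax)
```

Groups with a single taxonomy can be collapsed to one representative directly. The conflicting groups are the ones that need a decision, for example keeping the lowest common rank.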

@JULIETCharlotte, we can also build a solution from scratch. To put you on track, here is one idea I came up with:

cbertell commented 5 years ago

Those were not the appropriate files. All correct files are now in the R&D folder project number 100.

valscherz commented 5 years ago

Together with @JULIETCharlotte, we have observed that the current version of the EzBioCloud DB contains no sequences assigned as eukaryotic, in contrast to the previous iteration of the database, which had some. Therefore, we have decided to write to the authors:

[...] Here are our observations:

In a version of the database downloaded from your website in 2018/06 (in copy), there were several sequences tagged as "Eukarya". Looking now at the version currently downloadable from your website, there is not a single Eukarya in the taxonomy file. It also looks as if identical numbers in the database are now identified as Bacteria.

Could you please explain the rationale behind these changes? We have searched all release notes available on your website for any indication regarding the disappearance of these Eukarya, without success. [...]

Here is their answer:

[...] Your observations are correct: We used to provide our QIIME and mothur 16S DB with Eukaryotic information. However, with our more recent updates, we've consciously retracted Eukaryotic information.

The reason for this is because EzBioCloud's original intention was to serve prokaryotic taxonomies. And in order to improve our service, as well as maintain an updated prokaryotic DB, we have removed Eukaryotic information to focus our efforts on only prokaryotic taxonomies.

If you have any further questions, please do not hesitate to contact us.

Best, The EzBioCloud Team

This choice seems a bit odd to me. We would need to confirm this with @JULIETCharlotte but, in our first check, we found identical sequences that were once identified as Eukarya and became Bacteria in the latest version of the DB. I wonder whether eukaryotic sequences amplified by the V3V4 PCR will now be mixed in with the rest of the sequences under a bacterial classification. That is now a significant difference from the Silva database (https://www.arb-silva.de/download/arb-files/).
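One way to confirm this check systematically is to match sequences between the two releases by exact sequence identity and list those whose domain label differs. The Python sketch below is only an illustration under stated assumptions: the taxonomy file is assumed to be QIIME-style (`id<TAB>Domain;Phylum;...`), the sequences are assumed to be already loaded as `{id: sequence}` dictionaries, and all function names and example IDs are hypothetical.

```python
def parse_taxonomy(text):
    """Parse 'id<TAB>Domain;Phylum;...' lines into {id: domain}.

    The tab-separated, semicolon-delimited layout is an assumption.
    """
    domains = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        seq_id, lineage = line.split("\t", 1)
        domains[seq_id] = lineage.split(";")[0].strip()
    return domains


def domain_changes(old_seqs, old_domains, new_seqs, new_domains):
    """Return (new_id, old_domain, new_domain) for every sequence that is
    identical in both releases but carries a different domain label."""
    # Index the old release by sequence string.
    old_by_seq = {}
    for seq_id, seq in old_seqs.items():
        old_by_seq.setdefault(seq, set()).add(old_domains[seq_id])

    changes = []
    for seq_id, seq in new_seqs.items():
        new_domain = new_domains[seq_id]
        for old_domain in sorted(old_by_seq.get(seq, ())):
            if old_domain != new_domain:
                changes.append((seq_id, old_domain, new_domain))
    return changes


# Hypothetical toy data standing in for the 2018/06 and current releases.
old_seqs = {"a1": "ACGT", "a2": "GGCC"}
old_domains = parse_taxonomy("a1\tEukarya;X\na2\tBacteria;Y\n")
new_seqs = {"b1": "ACGT", "b2": "GGCC"}
new_domains = parse_taxonomy("b1\tBacteria;Z\nb2\tBacteria;Y\n")

changes = domain_changes(old_seqs, old_domains, new_seqs, new_domains)
```

Matching on the sequence string rather than on the ID avoids assuming that identifiers are stable between releases. Any sequence reported here with "Eukarya" as the old domain is a candidate for a eukaryotic amplicon that would now be classified as Bacteria.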