roseedwin / Benchmarking-taxonomic-classifiers-for-soil-shotgun-data

Benchmarking various taxonomic classifiers for profiling soil metagenomic data, with scripts and tools tailored for fungal, bacterial, and archaeal genome identification.

Availability of the custom kraken DB #1

Open EorgeKit opened 4 months ago

EorgeKit commented 4 months ago

Hi @Meghana9854 @roseedwin, I just came across your amazing paper, and I wanted to implement the approach you proposed for my soil shotgun data. Looking at your methodology, it seems you relied on a custom database that you created. However, I cannot seem to find a link pointing to the database itself. Please advise.

roseedwin commented 4 months ago

Hi George,

Thank you for your interest in implementing our approach. Unfortunately, we did not upload the database itself to a public platform. However, creating the Kraken2 database on your own is quite straightforward, though it does require approximately 350 GB of RAM to both create and use the database effectively.

Additionally, since the publication of our work, there has been a new release of the GTDB that you might consider integrating when creating your database. This could provide a more up-to-date set of genomes and enhance the applicability of the methodology to your soil shotgun data.

Please let us know if you need further information or assistance.
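For orientation, here is a minimal sketch of what a custom Kraken2 build typically involves, written as a small Python helper that only assembles the `kraken2-build` command lines and does not run them. It is not necessarily how the database in the paper was built. The database directory, genome path, and thread count are hypothetical, and the sketch assumes the `taxonomy/` folder of the database already holds `names.dmp` and `nodes.dmp` files that match the `kraken:taxid` tags in the FASTA headers, since GTDB taxonomy differs from NCBI taxonomy and needs that preparation first.

```python
import shlex


def kraken2_build_commands(db_dir, fasta_files, threads=16):
    """Return kraken2-build invocations as argument lists (nothing is executed).

    Assumption: db_dir/taxonomy already contains names.dmp and nodes.dmp, and
    every FASTA header carries a matching kraken:taxid tag.
    """
    # One --add-to-library call per genome file.
    cmds = [
        ["kraken2-build", "--add-to-library", str(fasta), "--db", str(db_dir)]
        for fasta in fasta_files
    ]
    # Final step: build the database from the library and taxonomy.
    cmds.append(
        ["kraken2-build", "--build", "--db", str(db_dir), "--threads", str(threads)]
    )
    return cmds


# Hypothetical paths, for illustration only.
commands = kraken2_build_commands(
    "gtdb_kraken2_db", ["genomes/example_genome.fna"], threads=32
)
for cmd in commands:
    print(shlex.join(cmd))
```

The printed lines can be reviewed and then run in a shell on a machine with enough memory for the build step.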

EorgeKit commented 4 months ago

Hi @roseedwin, thanks for writing back. I have finished downloading the latest version of the GTDB and I'll try to follow along in creating the custom Kraken DB. I will let you know if anything comes up.

Additionally, I was wondering if you could provide some insight into my case. I am using data from two collaborators who are studying the soil profile of three different sites, so there are only three samples in total. Because of a slight miscommunication, one collaborator did targeted sequencing (16S) for the first two samples. The other collaborator did shotgun tru nanoseq sequencing of the third sample. Thus I have two different datasets to analyse in order to compare the three sites.

I have analysed the two targeted datasets with QIIME2, and I am planning to use your approach to analyse the shotgun dataset. The trouble is, how do I combine the taxonomic results to inquire about the beta rarefaction of all three samples? I believe alpha is per sample, so that is easy. Please advise.

anw-sh commented 3 months ago

Hi @roseedwin This is indeed a good paper, comparing the various classifiers for soil microbiome datasets. By any chance, have you performed any comparison using the recent MetaPhlAn database i.e. vJun23? As per the documentation, they say, more MAGs from different environments, including soil, have been added.

@EorgeKit The better option is to perform 16S amplicon sequencing on the 3rd sample. However, you should be aware of sequencing biases, since you will be processing this sample at a different facility.

Or try using Metaxa2 on the 3rd sample, which I believe uses an SSU and LSU database, instead of a marker gene combination as is in the case of MetaPhlAn. Then probably try to merge the OTU tables and perform downstream analysis. TBH, I never tried this way of merging, this is just a thought. Maybe, format the output of Metaxa, import it into QIIME2 and merge it with the existing feature table generated for the other 2 samples.

Points to note:

Hope this helps.

roseedwin commented 3 months ago

Hi @EorgeKit,

I agree with @anw-sh on the approach to extract 16S rRNA from the shotgun metagenomic data using tools like Metaxa2. This strategy will allow you to standardize the taxonomic analysis across all samples by aligning the methodologies as closely as possible.

I am not saying this is foolproof, but it might be the best approach given the circumstances. There are also previous studies showing that 16S rRNA results extracted from shotgun data were similar to the actual 16S rRNA results in soil studies, which might help with this rationale.

Here is a step-by-step approach you might consider:

1. Extract 16S rRNA Sequences: Use Metaxa2 to extract 16S rRNA sequences from your shotgun dataset.
2. Classification: Once you have the 16S sequences, classify them using the same database and methodology used for your first two samples (e.g., using QIIME2 with the SILVA database). This maintains consistency in how taxonomic data is processed and interpreted.
3. Merging Data: After classifying the 16S sequences from your third sample, you can merge this data with the OTU tables generated from the first two samples. Ensure that all data is normalized or processed in a manner that allows for accurate comparative analysis.
4. Downstream Analysis: With all three samples processed through a uniform pipeline, you can proceed with your beta diversity analysis. As @anw-sh noted, if you have only one sample per site, this might limit some of the statistical inferences you can make from beta diversity metrics.
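Steps 3 and 4 above can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the sample names, genera, and counts are made up, the tables are assumed to be already collapsed to the same taxonomic rank, relative abundance is used as one possible normalization, and Bray-Curtis is used as one common beta diversity metric. In practice QIIME2 would handle the merge and the diversity calculation.

```python
from itertools import combinations


def to_relative(counts):
    """Convert raw counts for one sample into relative abundances summing to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no counts")
    return {taxon: n / total for taxon, n in counts.items()}


def merge_tables(samples):
    """Align per-sample {taxon: abundance} dicts on a shared taxon list.

    Taxa missing from a sample are filled with 0.0.
    """
    taxa = sorted(set().union(*samples.values()))
    table = {
        name: [abund.get(taxon, 0.0) for taxon in taxa]
        for name, abund in samples.items()
    }
    return table, taxa


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two aligned abundance vectors."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(a + b for a, b in zip(u, v))
    return num / den if den else 0.0


# Hypothetical genus-level counts; the names and numbers are invented.
raw = {
    "site1_16S": {"Bacillus": 50, "Streptomyces": 30, "Pseudomonas": 20},
    "site2_16S": {"Bacillus": 10, "Streptomyces": 60, "Bradyrhizobium": 30},
    "site3_metaxa2": {"Bacillus": 400, "Pseudomonas": 400, "Bradyrhizobium": 200},
}

# Normalize each sample, then align all samples on the same taxa.
relative = {name: to_relative(counts) for name, counts in raw.items()}
table, taxa = merge_tables(relative)

# Pairwise dissimilarities between the three sites.
distances = {
    (a, b): bray_curtis(table[a], table[b]) for a, b in combinations(table, 2)
}
for pair, d in distances.items():
    print(pair, round(d, 3))
```

Normalizing before merging matters here because the amplicon and shotgun-derived samples will have very different sequencing depths. With one sample per site the result is just three pairwise distances, so no group-level statistics can be drawn from it.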