spholmes / F1000_workflow


question about the abundant unidentified ASVs #30

Open xnus opened 5 years ago

xnus commented 5 years ago

Hi, all,

I am following the Workflow for Microbiome Data Analysis (https://bioconductor.org/help/course-materials/2017/BioC2017/Day1/Workshops/Microbiome/MicrobiomeWorkflowII.html) to analyze ITS1 NGS data from fungal communities. We use ASVs (amplicon sequence variants) rather than OTUs in our analysis. However, this creates a problem: many ASVs still remain after taxonomic agglomeration at the species level.

Because ASVs with similar sequences may belong to the same fungal species, and too many ASVs complicate downstream analysis and interpretation, we want to reduce the number of ASVs.

Here is what I am doing now. First, I assign taxonomy to all ASVs against the UNITE reference with a bootstrap threshold of 75, and select the ASVs that could not be identified at the species level. These ASVs are then clustered into OTUs at 97% similarity with the “kmer” package in R (truncating all sequences to the minimum length when necessary, since kmer::otu only handles sequences of the same length), and the resulting OTUs are identified again. Finally, the ASV table of ASVs with species-level assignments and the OTU table of re-identified OTUs are combined for the subsequent tax_glom.
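
To make the clustering step concrete, here is a minimal Python sketch of the same idea (rather than the R used in this thread) under simplifying assumptions: sequences are truncated to the shortest length, similarity is the fraction of identical positions with no alignment, and clustering is greedy in order of decreasing abundance with a 97% cutoff. It is not the algorithm implemented by kmer::otu, and the function names and toy sequences are hypothetical.

```python
from typing import Dict, Iterable, List


def truncate_to_min_length(seqs: Dict[str, str]) -> Dict[str, str]:
    """Trim every sequence to the length of the shortest one."""
    min_len = min(len(s) for s in seqs.values())
    return {name: s[:min_len] for name, s in seqs.items()}


def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length sequences (no alignment)."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(seqs: Dict[str, str], abundance: Dict[str, int],
                   threshold: float = 0.97) -> Dict[str, str]:
    """Map each sequence name to the name of its cluster centroid.

    Sequences are visited from most to least abundant; each joins the first
    existing centroid it matches at >= threshold identity, otherwise it
    becomes a new centroid.
    """
    order = sorted(seqs, key=lambda n: (-abundance[n], n))
    centroids: List[str] = []
    membership: Dict[str, str] = {}
    for name in order:
        for c in centroids:
            if identity(seqs[name], seqs[c]) >= threshold:
                membership[name] = c
                break
        else:  # no existing centroid was close enough
            centroids.append(name)
            membership[name] = name
    return membership


def mutate(seq: str, positions: Iterable[int]) -> str:
    """Toy helper: substitute one base at each given position."""
    swap = {"A": "C", "C": "G", "G": "T", "T": "A"}
    chars = list(seq)
    for p in positions:
        chars[p] = swap[chars[p]]
    return "".join(chars)


# Hypothetical toy data: four unassigned ASVs and their total read counts.
base = "ACGT" * 25  # 100 bp
asvs = {
    "ASV1": base,
    "ASV2": mutate(base, [5, 50]),            # 98% identical to ASV1
    "ASV3": mutate(base, range(0, 100, 10)),  # 90% identical to ASV1
    "ASV4": base + "TTGA",                    # longer; identical to ASV1 after truncation
}
counts = {"ASV1": 500, "ASV2": 40, "ASV3": 300, "ASV4": 10}

clusters = greedy_cluster(truncate_to_min_length(asvs), counts)
print(clusters)  # {'ASV1': 'ASV1', 'ASV3': 'ASV3', 'ASV2': 'ASV1', 'ASV4': 'ASV1'}
```

Comparing truncated sequences position by position is a simplification: a single indel shifts every downstream position, so length-variable ITS1 data would normally call for alignment or k-mer distances. The example also shows how the result hinges on the single cutoff passed as `threshold`.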

This method does dramatically reduce the number of “operational taxa” in the final dataset, and ASVs with similar sequences are merged into one OTU. But I am not sure whether other researchers would accept this method, since I have not seen an analogous approach in other publications. Would anybody like to comment on it?

Thanks!

spholmes commented 5 years ago

This is not something we do, since 97% is an arbitrary threshold. We keep the filtered, unannotated, denoised ASVs, labelled just with numbers, until the end of the analysis, and then annotate the interesting ones.

(Email reply to Xiang Sun's post of Wed, Apr 10, 2019, 05:46.)
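
The bookkeeping described in this reply can be sketched in Python, assuming the count table has already been summed to one total per denoised sequence; the function names are hypothetical and this is not code from the workflow. Sequences get short numeric labels, and the sequence behind any label stays recoverable, so only the ASVs that turn out to matter need to be sent to a taxonomic classifier.

```python
from typing import Dict, List, Tuple


def number_asvs(totals: Dict[str, int]) -> Tuple[Dict[str, int], Dict[str, str]]:
    """Relabel sequence-keyed counts as ASV1, ASV2, ... in decreasing abundance.

    Returns the relabelled counts and an ID -> sequence lookup table.
    """
    ordered = sorted(totals, key=lambda seq: (-totals[seq], seq))
    lookup = {f"ASV{i}": seq for i, seq in enumerate(ordered, start=1)}
    relabelled = {asv_id: totals[seq] for asv_id, seq in lookup.items()}
    return relabelled, lookup


def sequences_to_annotate(lookup: Dict[str, str], interesting: List[str]) -> Dict[str, str]:
    """Fetch the sequences of the ASVs singled out at the end of the analysis."""
    return {asv_id: lookup[asv_id] for asv_id in interesting}


# Hypothetical toy data: total reads per denoised sequence.
totals = {"ACGTTG": 12, "ACGTAA": 340, "TTGACA": 57}
relabelled, lookup = number_asvs(totals)
print(relabelled)  # {'ASV1': 340, 'ASV2': 57, 'ASV3': 12}

# Suppose the analysis flags ASV2 as interesting; only it is annotated.
to_annotate = sequences_to_annotate(lookup, ["ASV2"])
print(to_annotate)  # {'ASV2': 'TTGACA'}
```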

xnus commented 5 years ago

@spholmes Thanks, Susan. That is also what we are doing now. But it is sometimes uncomfortable to see a bunch of ASVs from the same genus, such as Alternaria or Cladosporium. They probably belong to only a few species, but for now we have to keep them all.