rvicedomini / strainberry

Automated strain separation of low-complexity metagenomes
MIT License

New PacBio mock community dataset available #3

Open dportik opened 3 years ago

dportik commented 3 years ago

I am very interested in the development of strainberry, especially as it applies to metagenomics with PacBio HiFi data. I wanted to draw your attention to a new PacBio dataset I recently uploaded to NCBI SRA.

The sample is ZymoBIOMICS gut microbiome standard D6331, a community containing 21 species that mimic the human gut microbiome. This mock community contains 5 strains of E. coli, and may be particularly useful as a test dataset for strainberry. We sequenced this sample three times using different library prep methods. Additional information and the three HiFi fastq files are available on the NCBI Project page: http://www.ncbi.nlm.nih.gov/bioproject/680590

The SRA accessions are as follows:

Standard Input: SRX9569057

Low Input: SRX9569058

Ultra-Low Input: SRX9569059

I've performed metagenomic assembly with HiCanu 2.1 and am unable to obtain strain assemblies. However, I can align the reads to the individual strains, identify matched reads, and assemble them to completion individually with HiCanu. Strainberry seems like a really useful alternative to such approaches (especially when references are unavailable).
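For readers who want to picture the reference-based workaround described above (align reads to each strain, pick out the matched reads, assemble each set separately), the read-matching step could be sketched roughly as follows. This is only an illustration under stated assumptions, not the procedure actually used in this thread: it assumes the alignments are available as PAF records (the tab-separated format written by aligners such as minimap2), that each reference sequence name identifies exactly one strain, and that a read belongs to the strain where it has the most matching bases. The function name `bin_reads_by_strain` and the names `strainA` and `strainB` are hypothetical.

```python
from collections import defaultdict


def bin_reads_by_strain(paf_lines, min_mapq=0):
    """Group read names by the reference sequence they match best.

    Assumes standard PAF columns: 1 = query name, 6 = target name,
    10 = number of matching bases, 12 = mapping quality.
    """
    best = {}  # read name -> (matching bases, target name)
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip malformed or truncated records
        read, target = fields[0], fields[5]
        n_match, mapq = int(fields[9]), int(fields[11])
        if mapq < min_mapq:
            continue
        # Keep the alignment with the most matching bases;
        # on a tie, the first alignment seen is kept.
        if read not in best or n_match > best[read][0]:
            best[read] = (n_match, target)

    bins = defaultdict(list)
    for read, (_, target) in best.items():
        bins[target].append(read)
    return dict(bins)


# Hypothetical alignments: read1 hits both strains, read2 hits only strainB.
example = [
    "read1\t1000\t0\t1000\t+\tstrainA\t5000\t0\t1000\t990\t1000\t60",
    "read1\t1000\t0\t1000\t+\tstrainB\t5000\t0\t1000\t950\t1000\t60",
    "read2\t1200\t0\t1200\t-\tstrainB\t5000\t100\t1300\t1195\t1200\t60",
]
bins = bin_reads_by_strain(example)
print(bins)
```

Each resulting list of read names would then be extracted and assembled on its own. Reads from conserved regions can match several closely related strains equally well, so a simple best-hit rule like this one is only a rough approximation, and it depends entirely on having the strain references in hand.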

rvicedomini commented 3 years ago

Thank you very much for drawing my attention to this! Really appreciated! :) It looks like an interesting dataset for testing Strainberry. I will definitely consider it, particularly while developing the improvements we have in mind for the tool.