Finished product has just over 1000 scaffolds

rvicedomini / strainberry

Automated strain separation of low-complexity metagenomes

MIT License

Finished product has just over 1000 scaffolds #7

Open ctparson opened 3 years ago

ctparson commented 3 years ago

I know my input metagenome contained only two strains of a given species, and it had been heavily filtered so that only reads from that species went into the metagenome assembly. However, when I run the analysis with Strainberry, the resulting assembly.scaffolds.fa has just over 1000 scaffolds. Are there any thoughts or suggestions on how to correct this?

cifuj commented 2 years ago

I have a similar problem. Using PacBio HiFi reads, I filtered the reads of the genome I'm working with and assembled them with Flye. After that, I mapped the reads back to the assembled genome, following the suggested pipeline, and ran Strainberry iteratively with up to 5 strains. I have >400 scaffolds in the assembly.scaffolds.fa file. Is there a way to know which strain each scaffold belongs to just from the scaffold's name? Or to know which contigs were generated from which phased set of reads?

zonghao-LI commented 2 years ago

I have a similar problem, too. Using Nanopore reads, I combined the reads of the three known species and assembled them with Flye. After that, I mapped the reads back to the assembled genome, following the suggested pipeline, and ran Strainberry iteratively with the preset values. I have >500 scaffolds in the assembly.scaffolds.fa file. Is there a way to know which strain each scaffold belongs to just from the scaffold's name? Or to know which contigs were generated from which phased set of reads? Or to find where the SNPs of the three strains are in the gene (e.g., in VCF format)?
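
For anyone comparing runs, it may help to look beyond the raw scaffold count: a file with 1000 scaffolds can still hold most of its bases in a few long sequences. Below is a minimal sketch in plain Python that summarizes a FASTA file such as assembly.scaffolds.fa (count, total length, longest scaffold, N50). It is not part of Strainberry; the function names `fasta_lengths`, `n50`, and `summarize` are hypothetical helpers made up for this illustration, and it assumes an uncompressed FASTA file.

```python
from typing import Dict, Iterable, List


def fasta_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Return {sequence name: length} for FASTA-formatted lines."""
    lengths: Dict[str, int] = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the name.
            parts = line[1:].split()
            name = parts[0] if parts else f"unnamed_{len(lengths)}"
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def n50(lengths: List[int]) -> int:
    """Length L such that sequences of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


def summarize(lengths: Dict[str, int]) -> Dict[str, int]:
    """Collect a few fragmentation statistics for an assembly."""
    values = list(lengths.values())
    return {
        "scaffolds": len(values),
        "total_bp": sum(values),
        "longest": max(values, default=0),
        "n50": n50(values),
    }


# Example usage (the path is whatever your run produced):
# with open("assembly.scaffolds.fa") as handle:
#     print(summarize(fasta_lengths(handle)))
```

Running the same summary on the Flye assembly used as input and on the Strainberry output gives a rough before/after comparison. It does not say which strain a scaffold belongs to; that question still needs an answer from the tool's own output or its author.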