Closed johnsonj161 closed 2 years ago
The fact that your trees look different to mine isn't a cause for concern, and I can think of a few reasons why this is the case:
[...] trycycler subsample command after making these demo datasets, so your read subsampling would be different from mine.

Regarding the plasmid, yes, they are often troublesome to resolve! I'd say that cluster_2 in your good dataset tree isn't too bad, though one contig is incomplete and two are doubled/tripled and will need some manual repair. The mediocre dataset doesn't have anything useable for the plasmid.
If you encounter a mediocre case like this in the real world, you would ideally resequence to get better reads. If that's not an option, you can try fiddling with the assembly parameters to see if you can get cleaner clusters.
Hope that helps! Ryan
I am hoping to get more insight into how you created your assemblies for the demo read sets. I am trying to replicate the contig clusters you generated for these datasets using assemblies generated from Flye, Miniasm+Minipolish, and Raven but am running into issues with the plasmid assembly in each case. My general process is shown below:
Step 1. Process reads with Filtlong

    filtlong --min_length 1000 --keep_percent 90 reads.fastq.gz | gzip > filtered_reads.fastq.gz

Step 2. Create read subsets

    trycycler subsample --reads filtered_reads.fastq.gz --out_dir subsets --count 12
Step 3. Create sub-assemblies using Flye, Miniasm+Minipolish, and Raven

    # run Flye for read subsets sample_01 to sample_04
    flye --nano-raw subsets/sample_01.fastq --out-dir flye/sample_01

    # run Miniasm+Minipolish for read subsets sample_05 to sample_08
    minimap2 -x ava-ont subsets/sample_05.fastq subsets/sample_05.fastq > miniasm/sample_05/overlaps.paf
    miniasm -f subsets/sample_05.fastq miniasm/sample_05/overlaps.paf > miniasm/sample_05/assembly.gfa
    minipolish subsets/sample_05.fastq miniasm/sample_05/assembly.gfa

    # run Raven for read subsets sample_09 to sample_12
    raven subsets/sample_09.fastq > assemblies/sample_09.fasta
Step 4. Cluster contigs

    trycycler cluster --assemblies assemblies/*.fasta --reads filtered_reads.fastq --out_dir clusters
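For reference, the per-subset commands in Step 3 can be generated with a small loop instead of being typed out twelve times. This is only a minimal dry-run sketch: it prints the commands and runs nothing. The function names assembler_for and commands_for are hypothetical, and the file and directory layout is assumed to match the commands shown above.

```bash
#!/usr/bin/env bash
# Dry-run sketch: print the assembler command(s) for each read subset.
# Assumes 12 subsets named subsets/sample_01.fastq ... subsets/sample_12.fastq.

assembler_for() {
    # Map a subset number (1-12) to the assembler used in Step 3.
    local n=$1
    if   [ "$n" -le 4 ]; then echo flye
    elif [ "$n" -le 8 ]; then echo miniasm
    else                      echo raven
    fi
}

commands_for() {
    # Print the command(s) for one subset; nothing is executed here.
    local n=$1 id reads
    id=$(printf 'sample_%02d' "$n")
    reads="subsets/${id}.fastq"
    case "$(assembler_for "$n")" in
        flye)
            echo "flye --nano-raw $reads --out-dir flye/$id" ;;
        miniasm)
            echo "minimap2 -x ava-ont $reads $reads > miniasm/$id/overlaps.paf"
            echo "miniasm -f $reads miniasm/$id/overlaps.paf > miniasm/$id/assembly.gfa"
            echo "minipolish $reads miniasm/$id/assembly.gfa" ;;
        raven)
            echo "raven $reads > assemblies/$id.fasta" ;;
    esac
}

# Print the full command list for subsets 1 to 12.
for n in $(seq 1 12); do
    commands_for "$n"
done
```

Reviewing the printed list before running anything makes it easier to confirm that each assembler gets exactly four subsets. The output directories (for example miniasm/sample_05) would need to exist before the printed commands are actually run.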
Below are the trees from the great, good, and mediocre demo datasets. Note that the figures include additional Raven assemblies due to an error in my pipeline.
Great Dataset
Good Dataset
Mediocre Dataset
In each case, I am having issues resolving plasmids (though the great dataset is passable). This contrasts with your examples for the same datasets. Do you have any ideas why my results might be different? For what it is worth, I did try running this pipeline without Filtlong and it did not solve the issue. Any help is appreciated!