marbl / verkko

Telomere-to-telomere assembly of accurate long reads (PacBio HiFi, Oxford Nanopore Duplex, HERRO corrected Oxford Nanopore Simplex) and Oxford Nanopore ultra-long reads.

error of 4-processONT #273

Closed JWJ13164328557 closed 2 months ago

JWJ13164328557 commented 2 months ago

How can I fix this? [image]

At first I thought the OFS was wrong [image], so I changed the OFS [image], but that causes errors in the next step.

skoren commented 2 months ago

The OFS is correct, since GFAs are tab-separated, not comma-separated. The failure is likely related to having no nodes over 100kb; see #127. What is your input data (type, coverage, genome size)? What's the end of the buildGraph.err log in 1-buildGraph?
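To illustrate the point about tab-separated GFA files and the "no nodes over 100kb" condition, here is a minimal Python sketch that reads segment (S) lines and counts nodes longer than a threshold. It is not verkko's own code: the function names are hypothetical, and the 100,000 bp default is taken only from the comment above. It assumes the node length comes either from the sequence field or, when the sequence is `*`, from an `LN:i:` tag.

```python
import io


def gfa_node_lengths(handle):
    """Yield (name, length) for each segment (S) line of a GFA file."""
    for line in handle:
        # GFA fields are separated by tabs, not commas.
        fields = line.rstrip("\n").split("\t")
        if fields[0] != "S" or len(fields) < 3:
            continue
        name, seq = fields[1], fields[2]
        length = len(seq)
        if seq == "*":
            # Sequence omitted: fall back to the optional LN:i:<length> tag.
            length = 0
            for tag in fields[3:]:
                if tag.startswith("LN:i:"):
                    length = int(tag[5:])
        yield name, length


def count_long_nodes(handle, min_len=100_000):
    """Count nodes strictly longer than min_len (assumed threshold)."""
    return sum(1 for _, n in gfa_node_lengths(handle) if n > min_len)


# Tiny in-memory example with one short and one long node.
example = "H\tVN:Z:1.0\nS\tn1\tACGT\nS\tn2\t*\tLN:i:150000\nL\tn1\t+\tn2\t+\t0M\n"
print(count_long_nodes(io.StringIO(example)))
```

If the count is zero for the graph in question, that would be consistent with the situation described in #127.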

JWJ13164328557 commented 2 months ago

I used duplex, ONT, and porec data

JWJ13164328557 commented 2 months ago

Thank you very much!!!!!!!!

JWJ13164328557 commented 2 months ago

[image] Why does the 129GB duplex data still report this error after I added compression?

skoren commented 2 months ago

This looks like there is no graph here at all: no edges and no sequence. What is the exact command you're running, and what coverage do you expect the duplex data to provide for the genome?

JWJ13164328557 commented 2 months ago

verkko -d /data/liux/ONT_data/tmp/GL/Genomic/verkko --hifi /data/liux/ONT_data/tmp/duplex/duplex_merge.fastq.gz --nano /data/liux/ONT_data/tmp/2024.7.4.fastq.gz --porec /data/liux/ONT_data/data/10-Pore-C/20240408-Pore-C-001_001-1.fastq.gz --threads 50

My genome is 700M.

skoren commented 2 months ago

So it should be 100x+? It seems this duplex data is not high quality. Have you tried counting k-mers and running GenomeScope on the duplex data to confirm genome size/quality? Are you able to share it with us?
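The coverage figure in question is simply total sequenced bases divided by genome size. A rough Python sketch of that arithmetic follows. The helper names are hypothetical, and it assumes a plain four-line-per-record FASTQ (optionally gzip-compressed). Note that a file size such as 129GB is not the same as the number of bases, so the bases have to be counted from the reads themselves.

```python
import gzip


def open_maybe_gzip(path):
    """Open a FASTQ file as text, transparently handling a .gz suffix."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def total_bases(fastq_lines):
    """Sum sequence lengths, assuming strict 4-line FASTQ records."""
    total = 0
    for i, line in enumerate(fastq_lines):
        if i % 4 == 1:  # second line of each record is the sequence
            total += len(line.strip())
    return total


def estimated_coverage(n_bases, genome_size):
    """Coverage = sequenced bases / genome size."""
    return n_bases / genome_size


# Arithmetic only: 70 Gb of reads over a 700 Mb genome is 100x.
print(estimated_coverage(70_000_000_000, 700_000_000))
```

On real data one would call `total_bases(open_maybe_gzip(path))` on the duplex FASTQ and divide by the known genome size.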

JWJ13164328557 commented 2 months ago

The genome size is known. Can I use only ONT and porec data?

JWJ13164328557 commented 2 months ago

I have not tried counting k-mers and running GenomeScope on the duplex data to confirm genome size/quality.

skoren commented 2 months ago

If you have recent ONT (R10) data, you can potentially run dorado correct (aka HERRO) and use the corrected reads as the HiFi input and the uncorrected reads as the ONT input to verkko (see the README for details). I would still recommend sharing the duplex data with us if you're able, and/or checking the GenomeScope output on the k-mers from the duplex data to see what is going on with it.
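For readers unfamiliar with the k-mer check suggested here, the following toy Python sketch shows what a k-mer multiplicity histogram is: for each count, how many distinct canonical k-mers occur that many times. It is an illustration only, with hypothetical function names and an assumed k of 21. For a real 700M genome a dedicated k-mer counter would be used to produce the histogram that GenomeScope analyzes, since a pure-Python counter would be far too slow and memory-hungry.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, rc)


def kmer_histogram(reads, k=21):
    """Map multiplicity -> number of distinct canonical k-mers seen that often."""
    counts = Counter()
    valid = set("ACGT")
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= valid:  # skip k-mers containing N or other symbols
                counts[canonical(kmer)] += 1
    return Counter(counts.values())


# Small demonstration with k=3 on two identical reads.
print(sorted(kmer_histogram(["ACGTA", "ACGTA"], k=3).items()))
```

In a histogram from accurate reads, most k-mers cluster around a multiplicity near the coverage; a histogram dominated by very low multiplicities would point to error-prone reads.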

JWJ13164328557 commented 2 months ago

[image] I re-extracted the duplex reads from the duplex data to use as the HiFi input, but it seems the data volume is not enough.

skoren commented 2 months ago

If you really do have 100x+ duplex data, it should be enough; we've been using 25-30x per haplotype, so 50-60x total coverage for a diploid genome. Something seems off with this data. Either it is not as high quality as expected for duplex data (which GenomeScope or the k-mer histogram would tell you), or the reads were not extracted correctly and are really simplex rather than duplex, which should be visible from the instrument-reported QV values for the sequences. So you have a few options:
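As a side illustration of the QV check described in this comment (not one of the options it refers to), here is a minimal Python sketch that computes a per-read mean quality from FASTQ quality strings. It assumes Phred+33 encoding and averages error probabilities rather than raw Q scores. The function names and the Q30 cutoff are hypothetical placeholders, not settings from verkko or dorado.

```python
import math


def mean_read_q(quality_string, offset=33):
    """Mean Phred quality of one read, averaged in error-probability space."""
    probs = [10 ** (-(ord(c) - offset) / 10) for c in quality_string]
    return -10 * math.log10(sum(probs) / len(probs))


def fraction_at_least(quality_strings, threshold=30.0):
    """Fraction of non-empty reads whose mean Q meets the (assumed) threshold."""
    quals = [q for q in quality_strings if q]
    if not quals:
        return 0.0
    passing = sum(1 for q in quals if mean_read_q(q) >= threshold)
    return passing / len(quals)


# 'I' encodes Q40 and '5' encodes Q20 under Phred+33.
print(round(mean_read_q("IIII"), 2))
print(fraction_at_least(["IIII", "5555"]))
```

If most reads in a file labelled as duplex fall well below the chosen cutoff, that would be consistent with the reads actually being simplex.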

JWJ13164328557 commented 2 months ago

[image] After using the UL data corrected by HERRO, the run completed normally and I was able to obtain the assembly results without problems. Thank you very much, teacher.