JWJ13164328557 closed this issue 2 months ago
The OFS is correct, since GFAs are tab-separated, not comma-separated. The error is likely related to having no nodes over 100 kb; see #127. What is your input data (type, coverage, genome size)? What's the end of the buildGraph.err log in 1-buildGraph?
I used duplex, ONT, and Pore-C data.
Thank you very much!
Why does the 129 GB duplex data still report this error after I added compression?
It looks like there is no graph here at all: no edges and no sequence. What is the exact command you're running, and what coverage do you expect the duplex data to give for the genome?
verkko -d /data/liux/ONT_data/tmp/GL/Genomic/verkko --hifi /data/liux/ONT_data/tmp/duplex/duplex_merge.fastq.gz --nano /data/liux/ONT_data/tmp/2024.7.4.fastq.gz --porec /data/liux/ONT_data/data/10-Pore-C/20240408-Pore-C-001_001-1.fastq.gz --threads 50
My genome size is 700 Mb.
So it should be 100x+ coverage? It seems this duplex data is not high quality. Have you tried counting k-mers and running GenomeScope on the duplex data to confirm genome size/quality? Are you able to share it with us?
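As a rough sanity check on the expected coverage, something like the following could be used. It is only a sketch: the function names are made up for illustration, it assumes a plain 4-line-per-record FASTQ (no wrapped sequence lines), and it counts bases, which is not the same as the compressed file size on disk.

```python
import gzip


def count_bases(fastq_lines):
    """Sum sequence lengths from an iterable of FASTQ lines.

    Assumes exactly 4 lines per record, so the sequence is always
    the second line of each record.
    """
    total = 0
    for i, line in enumerate(fastq_lines):
        if i % 4 == 1:
            total += len(line.strip())
    return total


def estimate_coverage(total_bases, genome_size):
    """Coverage = total sequenced bases / haploid genome size."""
    return total_bases / genome_size


def coverage_from_fastq_gz(path, genome_size):
    """Hypothetical helper: stream a gzipped FASTQ and estimate coverage."""
    with gzip.open(path, "rt") as handle:
        return estimate_coverage(count_bases(handle), genome_size)
```

For example, 70 Gb of duplex bases over a 700 Mb genome works out to 100x, so `estimate_coverage(70_000_000_000, 700_000_000)` gives `100.0`.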
The genome size is known. Can I use only ONT and Pore-C data?
I have not tried counting k-mers or running GenomeScope on the duplex data to confirm genome size/quality.
If you have recent ONT (R10) data, you can potentially run dorado correct (aka HERRO) and use the corrected reads as the HiFi input and the uncorrected reads as the ONT input to verkko (see the README for details). I would still recommend sharing the duplex data with us if you're able, and/or checking the GenomeScope output on the k-mers from the duplex data to see what is going on with it.
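To make the k-mer check concrete, here is a toy sketch of what a k-mer histogram is: the number of distinct canonical k-mers seen at each multiplicity, which is the kind of table GenomeScope takes as input. This is for illustration only; the function names and the default k=21 are assumptions, and a real dataset of this size needs a dedicated k-mer counter rather than an in-memory Python dictionary.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_VALID = frozenset("ACGT")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def kmer_histogram(sequences, k=21):
    """Map multiplicity -> number of distinct canonical k-mers with that count."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            # Skip k-mers containing N or any other non-ACGT character.
            if _VALID.issuperset(kmer):
                counts[canonical(kmer)] += 1
    return Counter(counts.values())
```

With high-quality reads the histogram should show a clear peak near the sequencing coverage; a histogram dominated by multiplicity-1 k-mers would point to a high error rate in the reads.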
I re-extracted the duplex reads from the duplex data to use as HiFi input, but it seems that the data volume is not enough.
If you really do have 100x+ duplex data, it should be enough; we've been using 25-30x per haplotype, so 50-60x total coverage for a diploid genome. Something seems off with this data: either it's not as high quality as expected for duplex data (which GenomeScope or the k-mer histogram would tell you), or the reads were not extracted correctly and are really simplex rather than duplex, which should be visible from the instrument-reported QV values for the sequences. So you have a few options:
After using the UL data corrected by HERRO, the run completed normally and I was able to obtain the assembly without problems. Thank you very much, teacher.
How can I fix it?
At first I thought the OFS was wrong, so I changed it, but that causes errors in the next step.