Closed baozg closed 2 years ago
This should be fixed in 83e67a2bf73650f5793783f8ec0fff48720eaa7a. You can either apply the patch to the condo-installed scripts (the file /public10/home/sci0009/miniconda3/envs/snakemake/lib/verkko/Snakefiles/4-processONT.sm
) or you can download the latest from the GitHub repo and compile following the instructions on the README.
Thanks for promptly response. It work well in 4-processONT.sm
, but it throw error in the step 6-layoutContigs
.
Traceback (most recent call last):
File "/public10/home/sci0009/miniconda3/envs/snakemake/lib/verkko/scripts/get_layout_from_mbg.py", line 276, in <module>
matches = get_matches(path, node_poses, contig_nodeseqs, raw_node_lens, edge_overlaps, pathleftclip, pathrightclip, readleftclip,
File "/public10/home/sci0009/miniconda3/envs/snakemake/lib/verkko/scripts/get_layout_from_mbg.py", line 112, in get_matches
assert read_end_pos > read_start_pos
AssertionError
Ah that'd be the next bug fix: 24e20b6fde5647843868e6bc3be865efa1143c82. Again you can edit but I might suggest getting the latest from repo because there are a couple of other bug fixes since the beta there.
OK, I will replace the all the code of conda environment. Thanks!
After replace the scripts, I can run the verkko pipeline with the Arabidopsis thaliana public HiFi and ONT data. But the 7-consensus/unitig-popped.fasta
size is very strange, it has 65 contigs (12,296,409 only).
The data I use is the https://ngdc.cncb.ac.cn/gsa/browse/CRA004538 I filter out the ONT reads less than <50kb. All other parameters are default of verkko.
Interesting, I'll try launching the data locally to see what happens. What are the sizes of the gfa files you have (in 4-processONT and 5-untip)?
# 4-processONT
-rw-rw-r-- 1 sci0009 sci0009 8.1M Jan 24 23:51 combined-edges-uniques.gfa
-rw-rw-r-- 1 sci0009 sci0009 151M Jan 24 23:52 connected.gfa
-rw-rw-r-- 1 sci0009 sci0009 151M Jan 24 23:51 gapped-unitig-unrolled-hifi-resolved.gfa
-rw-rw-r-- 1 sci0009 sci0009 151M Jan 24 23:52 normal-connected.gfa
-rw-rw-r-- 1 sci0009 sci0009 94M Jan 24 23:52 ont-resolved-graph.gfa
-rw-rw-r-- 1 sci0009 sci0009 94M Jan 24 23:52 unitig-unrolled-ont-resolved.gfa
-rw-rw-r-- 1 sci0009 sci0009 94M Jan 24 23:52 unrolled-ont-resolved.gfa
# 5-untip
-rw-rw-r-- 1 sci0009 sci0009 11M Jan 24 23:52 combined-edges-1.gfa
-rw-rw-r-- 1 sci0009 sci0009 11M Jan 24 23:52 combined-edges-final.gfa
-rw-rw-r-- 1 sci0009 sci0009 93M Jan 24 23:52 connected-tip.gfa
-rw-rw-r-- 1 sci0009 sci0009 93M Jan 24 23:52 popped-unitig-normal-connected-tip.gfa
-rw-rw-r-- 1 sci0009 sci0009 93M Jan 24 23:52 unitig-normal-connected-tip.gfa
-rw-rw-r-- 1 sci0009 sci0009 93M Jan 24 23:52 unitig-popped-unitig-normal-connected-tip.gfa
Based on that the assembly should be about 130 mbp so the issue is in the consensus step. Perhaps snakemake had an issue with the code changes mid-run. Can you remove the 6-layoutContigs
and 7-consensus
folders and re-run. If it's still too small, can you share the 6-layoutContigs/uniting-popped.layout
file here?
Actually, when I replace the scripts, I did remove the those two folders before snakemake running.
I try it again. The 7-consensus/unitig-popped.fast
is still 12Mb.
Here is the 6-layoutContigs/uniting-popped.layout unitig-popped.layout.txt
OK, that file is the issue, it's too small and has too few contigs. I suspect the bug fix to address your crash in 6-layoutContigs has another issue causing the file truncation. I've downloaded the data so I'll test locally and see if I can reproduce it.
The MBG is much slow than human and consume the most time for this data (15 hours with k=1001 and w=100 ),maybe you can try other parameters.
I ran this assembly and got a final fasta >200mb in size. I didn't change any parameters and used the full dataset. So the question is why is your 6-layoutContigs
result truncated. Can you run ls -lha */6-layout*/
, ls -lha */7-consensus/
, and wc -l */6-layout*/*
. Post the 6-layoutContigs/createLayouts.err
file as well. Here are the results of the above on my assembly:
% ls -lha beta_asm/6-layoutContigs/
total 246M
drwxrwxr-x 2 korens Phillippy 4.0K Jan 25 23:44 .
drwxrwxr-x 2 korens Phillippy 4.0K Jan 26 06:09 ..
-rw-rw-r-- 1 korens Phillippy 162M Jan 25 23:41 combined-alignments.gaf
-rw-rw-r-- 1 korens Phillippy 12M Jan 25 23:41 combined-edges.gfa
-rw-rw-r-- 1 korens Phillippy 887K Jan 25 23:41 combined-nodemap.txt
-rw-rw-r-- 1 korens Phillippy 29K Jan 25 23:41 consensus_paths.txt
-rw-rw-r-- 1 korens Phillippy 131K Jan 25 23:44 createLayout.err
-rwxrwxr-x 1 korens Phillippy 3.4K Jan 25 23:41 createLayout.sh
-rw-rw-r-- 1 korens Phillippy 34 Jan 25 23:45 gaps.txt
-rw-rw-r-- 1 korens Phillippy 3.0M Jan 25 23:41 nodelens.txt
-rw-rw-r-- 1 korens Phillippy 68M Jan 25 23:45 unitig-popped.layout
% ls -lha beta_asm/7-consensus/
total 393M
drwxrwxr-x 2 korens Phillippy 4.0K Jan 26 06:09 .
drwxrwxr-x 2 korens Phillippy 4.0K Jan 26 06:09 ..
-rw-rw-r-- 1 korens Phillippy 465K Jan 26 00:22 buildPackages.err
-rwxrwxr-x 1 korens Phillippy 397 Jan 25 23:47 buildPackages.sh
-rw-rw-r-- 1 korens Phillippy 304 Jan 26 06:09 combineConsensus.err
-rwxrwxr-x 1 korens Phillippy 357 Jan 26 06:09 combineConsensus.sh
-rw-rw-r-- 1 korens Phillippy 1008 Jan 25 23:40 extractONT.err
-rwxrwxr-x 1 korens Phillippy 1.8K Jan 25 23:38 extractONT.sh
-rw-rw-r-- 1 korens Phillippy 1.8K Jan 25 23:41 ont_subset.extract
-rw-rw-r-- 1 korens Phillippy 105M Jan 25 23:41 ont_subset.fasta.gz
-rw-rw-r-- 1 korens Phillippy 160K Jan 25 23:41 ont_subset.id
drwxrwxr-x 2 korens Phillippy 4.0K Jan 26 00:27 packages
-rw-rw-r-- 1 korens Phillippy 0 Jan 26 00:22 packages.finished
-rw-rw-r-- 1 korens Phillippy 80M Jan 26 00:12 packages.readName_to_ID.map
-rw-rw-r-- 1 korens Phillippy 24K Jan 26 00:22 packages.tigName_to_ID.map
-rw-rw-r-- 1 korens Phillippy 209M Jan 26 06:09 unitig-popped.fasta
% wc -l beta_asm/6-layoutContigs/*
1660064 beta_asm/6-layoutContigs/combined-alignments.gaf
348248 beta_asm/6-layoutContigs/combined-edges.gfa
27146 beta_asm/6-layoutContigs/combined-nodemap.txt
1369 beta_asm/6-layoutContigs/consensus_paths.txt
1369 beta_asm/6-layoutContigs/createLayout.err
64 beta_asm/6-layoutContigs/createLayout.sh
1 beta_asm/6-layoutContigs/gaps.txt
211548 beta_asm/6-layoutContigs/nodelens.txt
1430506 beta_asm/6-layoutContigs/unitig-popped.layout
3680315 total
$ ls -lha 6-layoutContigs/*
-rw-rw-r-- 1 sci0009 sci0009 161M Jan 25 01:49 6-layoutContigs/combined-alignments.gaf
-rw-rw-r-- 1 sci0009 sci0009 11M Jan 25 01:49 6-layoutContigs/combined-edges.gfa
-rw-rw-r-- 1 sci0009 sci0009 836K Jan 25 01:49 6-layoutContigs/combined-nodemap.txt
-rw-rw-r-- 1 sci0009 sci0009 8.0K Jan 25 01:49 6-layoutContigs/consensus_paths.txt
-rw-rw-r-- 1 sci0009 sci0009 78K Jan 25 01:50 6-layoutContigs/createLayout.err
-rwxrwxr-x 1 sci0009 sci0009 3.4K Jan 25 01:49 6-layoutContigs/createLayout.sh
-rw-rw-r-- 1 sci0009 sci0009 365 Jan 25 01:50 6-layoutContigs/gaps.txt
-rw-rw-r-- 1 sci0009 sci0009 2.9M Jan 25 01:50 6-layoutContigs/nodelens.txt
-rw-rw-r-- 1 sci0009 sci0009 40K Jan 25 01:50 6-layoutContigs/unitig-popped.layout
$ ls -lha 7-consensus/
total 46M
drwxrwxr-x 3 sci0009 sci0009 4.0K Jan 25 01:57 .
drwxrwxr-x 11 sci0009 sci0009 4.0K Jan 25 01:49 ..
-rw-rw-r-- 1 sci0009 sci0009 289K Jan 25 01:55 buildPackages.err
-rwxrwxr-x 1 sci0009 sci0009 411 Jan 25 01:50 buildPackages.sh
-rw-rw-r-- 1 sci0009 sci0009 342 Jan 25 01:57 combineConsensus.err
-rwxrwxr-x 1 sci0009 sci0009 403 Jan 25 01:57 combineConsensus.sh
-rw-rw-r-- 1 sci0009 sci0009 432 Jan 25 01:50 extractONT.err
-rwxrwxr-x 1 sci0009 sci0009 1.3K Jan 25 01:49 extractONT.sh
-rw-rw-r-- 1 sci0009 sci0009 774 Jan 25 01:50 ont_subset.extract
-rw-rw-r-- 1 sci0009 sci0009 34M Jan 25 01:50 ont_subset.fasta.gz
-rw-rw-r-- 1 sci0009 sci0009 45K Jan 25 01:50 ont_subset.id
drwxrwxr-x 2 sci0009 sci0009 4.0K Jan 25 01:56 packages
-rw-rw-r-- 1 sci0009 sci0009 0 Jan 25 01:55 packages.finished
-rw-rw-r-- 1 sci0009 sci0009 45K Jan 25 01:55 packages.readName_to_ID.map
-rw-rw-r-- 1 sci0009 sci0009 1.5K Jan 25 01:55 packages.tigName_to_ID.map
-rw-rw-r-- 1 sci0009 sci0009 12M Jan 25 01:57 unitig-popped.fasta
$ wc -l 6-layoutContigs/*
1656796 6-layoutContigs/combined-alignments.gaf
308801 6-layoutContigs/combined-edges.gfa
22556 6-layoutContigs/combined-nodemap.txt
400 6-layoutContigs/consensus_paths.txt
400 6-layoutContigs/createLayout.err
64 6-layoutContigs/createLayout.sh
9 6-layoutContigs/gaps.txt
197148 6-layoutContigs/nodelens.txt
1011 6-layoutContigs/unitig-popped.layout
2187185 total
By the way, what's the size and N50 of your Col-0 assembly ? It seems much large than the Col-0 reference.
The N50 is pretty meaningless on this assembly, it's the same as the reference. Total BP is 218 mb mostly because there are two tangles, the mitochondria which is very high coverage and probably has variation within the sample and a repeat at the end of Chromosome 4. Ignoring unitigs <1mb the assembly is 134 Mbp and all chromosomes are in a single uniting except Chromosome 5 which is split into two pieces.
The consensus_paths.txt are too small, these come from the 5-untip step. Can you post the file sizes and wc on the files in that directory along with untip.err? Did you ever run out of space during this run?
Wow, verkko
resolve the 45S rDNA cluster in the Chr2. Organelle always confusing the nuclear genome assembly in the plant genome. I did not run out of space during this run. Maybe I can restart the whole pipeline before the ONT read mapping.
$ ls -lha 5-untip/*
-rw-rw-r-- 1 sci0009 sci0009 11M Jan 24 23:52 5-untip/combined-edges-1.gfa
-rw-rw-r-- 1 sci0009 sci0009 11M Jan 24 23:52 5-untip/combined-edges-final.gfa
-rw-rw-r-- 1 sci0009 sci0009 1.5M Jan 24 23:52 5-untip/combined-nodemap-1.txt
-rw-rw-r-- 1 sci0009 sci0009 1.5M Jan 24 23:52 5-untip/combined-nodemap-final.txt
-rw-rw-r-- 1 sci0009 sci0009 93M Jan 24 23:52 5-untip/connected-tip.gfa
-rwxrwxr-x 1 sci0009 sci0009 741 Jan 24 23:52 5-untip/getCoverage.unitig-popped-unitig-normal-connected-tip.sh
-rwxrwxr-x 1 sci0009 sci0009 721 Jan 24 23:52 5-untip/getCoverage.unitig-unrolled-hifi-resolved.sh
-rwxrwxr-x 1 sci0009 sci0009 709 Jan 24 23:52 5-untip/getCoverage.unitig-unrolled-ont-resolved.sh
-rw-rw-r-- 1 sci0009 sci0009 7.8K Jan 24 23:52 5-untip/nodecov_hifi_fix.csv
-rw-rw-r-- 1 sci0009 sci0009 2.9M Jan 24 23:52 5-untip/nodelens-1.txt
-rw-rw-r-- 1 sci0009 sci0009 2.9M Jan 24 23:52 5-untip/nodelens-final.txt
-rw-rw-r-- 1 sci0009 sci0009 1.8K Jan 24 23:52 5-untip/popinfo.err
-rw-rw-r-- 1 sci0009 sci0009 93M Jan 24 23:52 5-untip/popped-unitig-normal-connected-tip.gfa
-rw-rw-r-- 1 sci0009 sci0009 0 Jan 24 23:52 5-untip/prepCoverage.err
-rwxrwxr-x 1 sci0009 sci0009 3.0K Jan 24 23:52 5-untip/prepCoverage.sh
-rw-rw-r-- 1 sci0009 sci0009 14K Jan 24 23:52 5-untip/unitig-mapping-3.txt
-rw-rw-r-- 1 sci0009 sci0009 11K Jan 24 23:52 5-untip/unitig-mapping-4.txt
-rw-rw-r-- 1 sci0009 sci0009 93M Jan 24 23:52 5-untip/unitig-normal-connected-tip.gfa
-rw-rw-r-- 1 sci0009 sci0009 0 Jan 24 23:52 5-untip/unitig-popped-unitig-normal-connected-tip.getCoverage.err
-rw-rw-r-- 1 sci0009 sci0009 93M Jan 24 23:52 5-untip/unitig-popped-unitig-normal-connected-tip.gfa
-rw-rw-r-- 1 sci0009 sci0009 5.0K Jan 24 23:52 5-untip/unitig-popped-unitig-normal-connected-tip.hifi-coverage.csv
-rw-rw-r-- 1 sci0009 sci0009 7.4K Jan 24 23:52 5-untip/unitig-popped-unitig-normal-connected-tip.ont-coverage.csv
-rw-rw-r-- 1 sci0009 sci0009 125 Jan 24 23:52 5-untip/untip.err
-rwxrwxr-x 1 sci0009 sci0009 3.0K Jan 24 23:52 5-untip/untip.sh
$ wc -l 5-untip/*
308179 5-untip/combined-edges-1.gfa
308801 5-untip/combined-edges-final.gfa
42264 5-untip/combined-nodemap-1.txt
42664 5-untip/combined-nodemap-final.txt
1215 5-untip/connected-tip.gfa
5 5-untip/getCoverage.unitig-popped-unitig-normal-connected-tip.sh
5 5-untip/getCoverage.unitig-unrolled-hifi-resolved.sh
5 5-untip/getCoverage.unitig-unrolled-ont-resolved.sh
501 5-untip/nodecov_hifi_fix.csv
196241 5-untip/nodelens-1.txt
197148 5-untip/nodelens-final.txt
109 5-untip/popinfo.err
925 5-untip/popped-unitig-normal-connected-tip.gfa
0 5-untip/prepCoverage.err
46 5-untip/prepCoverage.sh
562 5-untip/unitig-mapping-3.txt
400 5-untip/unitig-mapping-4.txt
1200 5-untip/unitig-normal-connected-tip.gfa
0 5-untip/unitig-popped-unitig-normal-connected-tip.getCoverage.err
604 5-untip/unitig-popped-unitig-normal-connected-tip.gfa
339 5-untip/unitig-popped-unitig-normal-connected-tip.hifi-coverage.csv
397 5-untip/unitig-popped-unitig-normal-connected-tip.ont-coverage.csv
8 5-untip/untip.err
59 5-untip/untip.sh
1101677 total
## untip.err
UntipRelative
Unitigify 3
Combine mappings
Combine edges
Find lengths
Fix coverage
Pop bubbles based on coverage
Unitigify 4
It's possible the cluster on Chr2 is disconnected in the graph but the unitig mapping to Chr2 is longer than the reference. Chr5 should not have been disconnected either, that looks like a bug so thanks for sharing this data. I'll make a new issue to look into that. I attached the final graph.
Restarting part of the pipeline is a good idea. My guess is snakemake thought your failed 4-processONT
step was complete and everything downstream of it got corrupted. I'd suggest removing the folders from 4-*
onwards as well as the assembly.*
files in the top level and restart.
The graph looks awesome since the MBG gfa is very fragmented (153Mb, 100kb N50). Sorry, which reference do you use ? There are two Col-0 asssemly (Col-XJTU and Col-CEN), the data I download is from the Col-XJTU, although the two assembly should is same.
I restarted the pipeline from the ONT reads mapping.
I used TAIR10 as the reference. The MBG graph is expected to be fragmented since it's before HiFi or ONT read resolution.
I meet the same problem as the #40 when I restarted the pipeline from aligning ONT reads.
That seems very strange if it already ran this step before, nothing should have changed. Is the hifi-nodecov-gapped-unitig-unrolled-hifi-resolved.csv file also very small as in #40? Try running the script by hand:
/public10/home/sci0009/miniconda3/envs/snakemake/lib/verkko/scripts/get_original_coverage.py \
gapped-unitig-unrolled-hifi-resolved.gfa \
combined-nodemap-uniques.txt \
combined-edges-uniques.gfa \
nodelens-uniques.txt \
../1-buildGraph/hifi_nodecov.csv \
> hifi-nodecov-gapped-unitig-unrolled-hifi-resolved.csv
/public10/home/sci0009/miniconda3/envs/snakemake/lib/verkko/scripts/estimate_unique_local.py \
gapped-unitig-unrolled-hifi-resolved.gfa \
hifi-nodecov-gapped-unitig-unrolled-hifi-resolved.csv \
alns-ont-filter-trim.gaf 100000 30 0.8 \
> unique_nodes_ont_coverage.txt
Yes. The hifi-nodecov-gapped-unitig-unrolled-hifi-resolved.csv
only have the header when I run by hand.
I find the major difference of two runs is the combined-nodemap-uniques.txt
.
40216 combined-nodemap-uniques.txt (run1) 20108 ../../Arab_new/4-processONT/combined-nodemap-uniques.txt (run2)
The command difference is below. Which one is correct ?
# run1
cat ../2-processGraph/unitig-mapping-1.txt ../2-processGraph/unitig-mapping-hifigap.txt ../2-processGraph/unroll_mapping_1.txt \
> combined-nodemap-uniques.txt
# run2
# cat *mapping* > ...
cat ../2-processGraph/unitig-mapping-1.txt ../2-processGraph/unroll_mapping_1.txt \
> combined-nodemap-uniques.txt
Here is the size of step1 and step2.
$ ls -lha 1-buildGraph/*
-rw-rw-r-- 1 sci0009 sci0009 724K Jan 24 07:00 1-buildGraph/buildGraph.err
-rwxrwxr-x 1 sci0009 sci0009 1.5K Jan 23 16:09 1-buildGraph/buildGraph.sh
-rw-rw-r-- 1 sci0009 sci0009 292K Jan 24 07:00 1-buildGraph/hifi_nodecov.csv
-rw-rw-r-- 1 sci0009 sci0009 149M Jan 24 07:00 1-buildGraph/hifi-resolved.gfa
-rw-rw-r-- 1 sci0009 sci0009 161M Jan 24 07:00 1-buildGraph/paths.gaf
$ ls -lha 2-processGraph/*
-rw-rw-r-- 1 sci0009 sci0009 153M Jan 24 07:01 2-processGraph/gapped-hifi-resolved.gfa
-rw-rw-r-- 1 sci0009 sci0009 149M Jan 24 07:00 2-processGraph/gapped-once-hifi-resolved.gfa
-rw-rw-r-- 1 sci0009 sci0009 149M Jan 24 07:01 2-processGraph/gapped-twice-hifi-resolved.gfa
-rw-rw-r-- 1 sci0009 sci0009 773 Jan 24 07:00 2-processGraph/gaps-hifi-1.gaf
-rw-rw-r-- 1 sci0009 sci0009 0 Jan 24 07:01 2-processGraph/gaps-hifi-2.gaf
-rw-rw-r-- 1 sci0009 sci0009 308K Jan 24 07:01 2-processGraph/gaps-hifi-3.gaf
-rw-rw-r-- 1 sci0009 sci0009 161M Jan 24 07:00 2-processGraph/paths-nogap-1.gaf
-rw-rw-r-- 1 sci0009 sci0009 161M Jan 24 07:01 2-processGraph/paths-nogap-2.gaf
-rw-rw-r-- 1 sci0009 sci0009 160M Jan 24 07:01 2-processGraph/paths-nogap-3.gaf
-rw-rw-r-- 1 sci0009 sci0009 190 Jan 24 07:01 2-processGraph/process.err
-rwxrwxr-x 1 sci0009 sci0009 1.7K Jan 24 07:00 2-processGraph/processGraph.sh
-rw-rw-r-- 1 sci0009 sci0009 150M Jan 24 07:01 2-processGraph/unitig-gapped-hifi-resolved.gfa
-rw-rw-r-- 1 sci0009 sci0009 686K Jan 24 07:01 2-processGraph/unitig-mapping-1.txt
-rw-rw-r-- 1 sci0009 sci0009 662K Jan 24 07:01 2-processGraph/unitig-mapping-hifigap.txt
-rw-rw-r-- 1 sci0009 sci0009 0 Jan 24 23:52 2-processGraph/unitig-unrolled-hifi-resolved.getCoverage.err
-rw-rw-r-- 1 sci0009 sci0009 149M Jan 24 07:01 2-processGraph/unitig-unrolled-hifi-resolved.gfa
-rw-rw-r-- 1 sci0009 sci0009 317K Jan 24 23:52 2-processGraph/unitig-unrolled-hifi-resolved.hifi-coverage.csv
-rw-rw-r-- 1 sci0009 sci0009 321K Jan 24 23:52 2-processGraph/unitig-unrolled-hifi-resolved.ont-coverage.csv
-rw-rw-r-- 1 sci0009 sci0009 150M Jan 24 07:01 2-processGraph/unrolled-hifi-resolved.gfa
-rw-rw-r-- 1 sci0009 sci0009 0 Jan 24 07:01 2-processGraph/unroll_mapping_1.txt
The second one is correct, this was one of the updates since the beta release but the unroll_mapping_1.txt file still shouldn't be empty. I think the issue is that Snakemake doesn't handle updating code in the middle of the run well so now it's expecting output from an updated 2-processGraph run but it thinks this step is already complete.
You may have to go all the way back to remove everything except the 0-correction and 1-buildGraph steps. I don't think there have been changes to those two parts of the code since the beta.
Yep. I remove the 2-processGraph
and run with snakemake. The pipeline output is different with the first run.
Was your crash in #40 also a restart without removing the 2-processGraph folder?
Yep. It was the same. When it crash, then I download the Arabidopsis thalian data. I will close it if this run is successful.
After removing the 2-proceesONT
folder, I can produce a normal assembly, but it seems more frgmented than yours. The 9 contigs (>1Mb) consists of 133Mb.
# second column is the size
utig4-438 32652233 34007608 32652233 32652234
utig4-448 22416905 129365070 22416905 22416906
utig4-408 17649930 97249093 17649930 17649931
utig4-406 17323780 11 17323780 17323781
utig4-407 13624567 17622829 13624567 13624568
utig4-400 12842428 78636984 12842428 12842429
utig4-424 10349929 118019802 10349929 10349930
utig4-418 4278631 91767898 4278631 4278632
utig4-401 2231285 115430493 2231285 2231286
utig4-465 299014 17323803 299014 299015
Do you show the graph of 5-untip/unitig-popped-unitig-normal-connected-tip.gfa
?
Good that you're getting an assembly now. There is some randomness in MBG still, the asm is similar to mine except yours h as a few unresolved bubbles which led to shorter contigs. Are you still using only ONT reads >50kb? That could explain the difference, we've seen improvements in human genome when we use the shorter ONT reads, mostly likely because of some sub-optimal coverage thresholds that verkko has.
Yes. I use the reads > 50kb. I try to use the all ONT reads, but the result is similiar with larger genome size (160Mb) than using reads >50kb (140 Mb). It seems MBG randomness is more contributed to this shorter contigs.
## > 1Mb contigs
utig4-867 32555598 34674956 32555598 32555599
utig4-895 26161544 142057026 26161544 26161545
utig4-865 17649930 116148922 17649930 17649931
utig4-863 17323326 11 17323326 17323327
utig4-898 15876846 17323349 15876846 15876847
utig4-866 12842225 69678118 12842225 12842226
utig4-881 3962735 82615346 3962735 3962736
utig4-715 3906210 137641692 3906210 3906211
utig4-873 2691859 133798864 2691859 2691860
We have an issue to track instability in MBG (#24) so I'm going to close this since you can run Verkko now and we'll use your dataset to test variation from run to run.
Hi,
I am using Arabidopsis thaliana data for testing with default parameter. I can run the Ecoli data successfully. The public data is 140x HiFi (22Gb, reads N50 15kb) and 115x ONT (26G, N50 82kb) . But
verkko
stop after aligning ONT data to the graph.Here is the log of building graph