Closed Malabady closed 4 years ago
This is not so much a bug as a sanity check that was removed in subsequent versions; it still exists in 1.8.
Your first try should be to increase the error rate allowed for consensus (edit `-e 0.05` to `-e 0.25` in consensus.sh) and re-run the failing jobs by hand using `sh consensus.sh <jobnum>`. Try this on an interactive session/node with the same memory the jobs were requesting (8 cores, 100 GB RAM). If that still doesn't work, you could comment out the check in the code and recompile, or update to 1.9 and re-run the assembly step from the trimmed reads (which will likely produce a better assembly).
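A minimal sketch of the suggested edit, demonstrated on a stand-in file (the real consensus.sh lives under unitigging/5-consensus in the assembly directory; the file name and job number here are examples):

```shell
# Stand-in for the real consensus.sh, just to demonstrate the edit:
printf 'utgcns ... -e 0.05 -threads 8\n' > consensus.sh.demo
# Raise the allowed consensus error rate from 0.05 to 0.25, keeping a backup:
sed -i.bak 's/-e 0.05/-e 0.25/' consensus.sh.demo
grep -- '-e 0.25' consensus.sh.demo   # confirm the edit took effect
# Then, in the real 5-consensus directory:  sh consensus.sh <jobnum>
```

The `.bak` backup lets you revert the script if the higher rate causes other problems.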
Increasing the error rate allowed for consensus solved this issue.
On this page of canu-1.9 (https://github.com/marbl/canu/releases/tag/v1.9) it says explicitly that canu-1.9 is NOT compatible with assemblies started with earlier versions. Based on your suggestion, I assume it is okay to use canu-1.9 on canu-1.9-error-corrected reads. Correct?
Is "Canu snapshot v2.0-development +281 changes (r9774 126b9c814200893bba3e0a517d484454a16fe869)" the same as canu-1.9?
The precompiled canu-1.9 that I downloaded from the above-mentioned page doesn't recognize gridEngineThreadsOption and gridEngineMemoryOption. Is this normal?
Thank you.
Yes, it is incompatible with the binary assembly intermediates. If you run canu -assemble with the trimmed reads, that is a new assembly, which will recompute overlaps etc., so there is no incompatibility.
No, the snapshot has changes post-release and has not been validated with regression tests the way a release has. Don't use the development version.
Yes, these options were replaced by a single option, gridEngineResourceOption, to which you can provide both the threads and memory options you were using before (see the documentation here: https://canu.readthedocs.io/en/latest/parameter-reference.html#grid-engine-configuration).
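For example, the two old options would now be combined into one. The SGE-style values below are illustrative only; substitute whatever resource flags your cluster used before, keeping the THREADS and MEMORY placeholders that canu fills in:

```shell
# canu <= 1.8 used two separate options (illustrative values):
#   gridEngineThreadsOption="-pe threads THREADS"
#   gridEngineMemoryOption="-l mem_free=MEMORY"
# canu 1.9 takes a single combined option instead:
canu -p run -d assembly genomeSize=3.6g \
     gridEngineResourceOption="-pe threads THREADS -l mem_free=MEMORY" \
     -pacbio-raw reads.fastq.gz
```
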
Please take a look at the following segmentation fault. It is from the canu-1.8 job.
-- All 315 consensus jobs finished successfully.
-- Finished stage 'consensusCheck', reset canuIteration.
-- Using slow alignment for consensus (iteration '0').
-- Configured 181 contig and 134 unitig consensus jobs.
----------------------------------------
-- Starting command on Tue Nov 19 15:21:56 2019 with 486928.045 GB free disk space
cd unitigging
/usr/local/apps/eb/canu/1.8-Linux-amd64/bin/tgStoreLoad \
-S ../run.seqStore \
-T ./run.ctgStore 2 \
-L ./5-consensus/ctgcns.files \
> ./5-consensus/ctgcns.files.ctgStoreLoad.err 2>&1
-- Finished on Tue Nov 19 15:28:20 2019 (384 seconds) with 486713.45 GB free disk space
----------------------------------------
----------------------------------------
-- Starting command on Tue Nov 19 15:28:20 2019 with 486713.45 GB free disk space
cd unitigging
/usr/local/apps/eb/canu/1.8-Linux-amd64/bin/tgStoreLoad \
-S ../run.seqStore \
-T ./run.utgStore 2 \
-L ./5-consensus/utgcns.files \
> ./5-consensus/utgcns.files.utgStoreLoad.err 2>&1
-- Finished on Tue Nov 19 15:29:44 2019 (84 seconds) with 486745.526 GB free disk space
----------------------------------------
-- Purging consensus output after loading to ctgStore and/or utgStore.
-- Purged 315 .cns outputs.
----------------------------------------
-- Starting command on Tue Nov 19 15:29:56 2019 with 487105.905 GB free disk space
cd unitigging
/usr/local/apps/eb/canu/1.8-Linux-amd64/bin/tgStoreDump \
-S ../run.seqStore \
-T ./run.ctgStore 2 \
-sizes -s 3600000000 \
> ./run.ctgStore/seqDB.v002.sizes.txt
Failed with 'Segmentation fault'; backtrace (libbacktrace):
utility/system-stackTrace.C::89 in _Z17AS_UTL_catchCrashiP7siginfoPv()
(null)::0 in (null)()
stores/tgTig.H::297 in _ZN5tgTig19mapGappedToUngappedEj()
stores/tgStoreDump.C::185 in _ZN8tgFilter14ignoreCoverageEP5tgTigb()
stores/tgStoreDump.C::141 in _ZN8tgFilter6ignoreEP5tgTigb()
stores/tgStoreDump.C::444 in _Z9dumpSizesP7sqStoreP7tgStoreR8tgFilterbm()
stores/tgStoreDump.C::1278 in main()
(null)::0 in (null)()
(null)::0 in (null)()
sh: line 4: 210649 Segmentation fault (core dumped) /usr/local/apps/eb/canu/1.8-Linux-amd64/bin/tgStoreDump -S ../run.seqStore -T ./run.ctgStore 2 -sizes -s 3600000000 > ./run.ctgStore/seqDB.v002.sizes.txt
-- Finished on Tue Nov 19 15:32:17 2019 (141 seconds) with 487038.998 GB free disk space
----------------------------------------
ERROR:
ERROR: Failed with exit code 139. (rc=35584)
ERROR:
ABORT:
ABORT: Canu 1.8
ABORT: Don't panic, but a mostly harmless error occurred and Canu stopped.
ABORT: Try restarting. If that doesn't work, ask for help.
ABORT:
ABORT: failed to generate unitig sizes.
ABORT:
I tried to run the failed command interactively with more memory (150 GB), but it failed with the same error. Does this have anything to do with the consensus error-rate increase that we made to rescue those three jobs?
No, I think what's happened is that your store became corrupted during your restart attempts; I'd guess some jobs ran at the same time (that is, one or more tigs don't have a valid consensus). There is no good way to recover from that other than backing up to an earlier assembly step. You could try getting just the assembly from the store to see if that works: run the command above, but replace `-sizes -s 3600000000` with `-contigs -fasta`. If this doesn't crash, you have your assembly fasta.
If it does crash (and you don't have a backup of your ctgStore folder), you would need to remove the run.ctgStore, run.utgStore, 4-unitigger, and 5-consensus folders and re-launch Canu specifying cnsErrorRate=0.25 to avoid the initial error.
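Concretely, the extraction attempt would look like this (using the same store paths and binary location that appear in the log above; the output file name is an example):

```shell
cd unitigging
/usr/local/apps/eb/canu/1.8-Linux-amd64/bin/tgStoreDump \
  -S ../run.seqStore \
  -T ./run.ctgStore 2 \
  -contigs -fasta \
  > run.contigs.fasta
```
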
Isn't changing cnsErrorRate from 0.05 to 0.25 a big jump? Since only three jobs failed for this reason, what do you think about using cnsErrorRate=0.10? In other words, does it make any significant difference? I assume a high cnsErrorRate can lead to misassemblies.
Setting the high rate won't matter much; if a lower overlap identity is available, it will be used first. This is only an upper limit and won't affect mis-assemblies: the contigs are already constructed, only their consensus is being computed.
Thank you so much, Sergey. You mentioned earlier that canu-1.9 will produce a better assembly. Since my dataset is quite large, it will take several weeks to go through trimming and assembly again. Do you think the improvement in the assembly is worth it? I will do it anyhow, but I want to get an idea of what kind of improvement to expect.
You don't need to re-run trimming, just assembly. Can't say exactly how much of an improvement as it depends on your reads and genome but we've seen better repeat resolution and higher quality consensus.
I have run into another issue that I haven't seen before. After removing the run.ctgStore, run.utgStore, 4-unitigger, and 5-consensus folders and relaunching canu, I got the following problem. I restarted the job multiple times, but it didn't work. I noticed that there are no ctgcns or utgcns folders inside the 5-consensus folder.
09:25:12 $ tail -n 30 rosea4/canu.out
--
-- Generating assembly 'run' in '/scratch/malabady/PitcherGenome/PitchPacBio/canu_assembly/rosea4'
--
-- Parameters:
--
-- genomeSize 3600000000
--
-- Overlap Generation Limits:
-- corOvlErrorRate 0.2400 ( 24.00%)
-- obtOvlErrorRate 0.0500 ( 5.00%)
-- utgOvlErrorRate 0.0500 ( 5.00%)
--
-- Overlap Processing Limits:
-- corErrorRate 0.3000 ( 30.00%)
-- obtErrorRate 0.0500 ( 5.00%)
-- utgErrorRate 0.0500 ( 5.00%)
-- cnsErrorRate 0.0500 ( 5.00%)
--
--
-- BEGIN ASSEMBLY
--
--
-- Graph alignment jobs failed, tried 2 times, giving up.
--
ABORT:
ABORT: Canu 1.8
ABORT: Don't panic, but a mostly harmless error occurred and Canu stopped.
ABORT: Try restarting. If that doesn't work, ask for help.
ABORT:
That's good because it implies consensus in your re-run completed. You technically have the assembly, but we can also see why the graph alignment failed. Can you post the output of unitigging/4-unitigger/alignGFA*?
I found the problem. See the following two failed commands:
/var/spool/torque/mom_priv/jobs/1767894.sapelo2.SC: line 64: 56473 Killed $bin/alignGFA -T ../run.ctgStore 2 -i ./run.contigs.gfa -o ./run.contigs.aligned.gfa -t 32 > ./run.contigs.aligned.gfa.err 2>&1
/var/spool/torque/mom_priv/jobs/1767894.sapelo2.SC: line 76: 58057 Killed $bin/alignGFA -bed -T ../run.utgStore 2 -C ../run.ctgStore 2 -i ./run.unitigs.bed -o ./run.unitigs.aligned.bed -t 32 > ./run.unitigs.aligned.bed.err 2>&1
I ran them interactively and they completed successfully, as did the rest of the assembly. See the following stats:
sum = 6314580126, n = 27578, ave = 228971.65, largest = 34166068
N50 = 1512943, n = 699
N60 = 390511, n = 1624
N70 = 199849, n = 3991
N80 = 124717, n = 8058
N90 = 86780, n = 14177
N100 = 1012, n = 27578
N_count = 0
Gaps = 0
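For readers unfamiliar with the notation: Nxx is the smallest contig length such that contigs of that length or longer cover xx% of the total assembly. A minimal sketch of the computation (toy lengths, not the real assembly):

```python
def nxx(lengths, xx):
    """Smallest length L such that contigs >= L cover xx% of the total."""
    target = sum(lengths) * xx / 100
    total = 0
    for length in sorted(lengths, reverse=True):
        total += length
        if total >= target:
            return length

contigs = [100, 80, 60, 40, 20]   # toy contig lengths, total = 300
print(nxx(contigs, 50))           # -> 80, since 100 + 80 = 180 >= 150
```

The paired "n = ..." value in the stats above is simply how many contigs it takes to reach that cumulative threshold.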
OK, so you've finished the assembly. The stats look OK, but the assembly is much larger than your specified genome size of 3.6 Gbp; how heterozygous is this genome? The histograms in the report files output along the way can be given to GenomeScope to estimate this. You'll probably need to run purge_dups to remove haplotypes from the assembly.
Based on my GenomeScope analysis, heterozygosity is less than 1% for this genome. Given that I used corrected reads for the GenomeScope analysis, I am kinda skeptical about that low heterozygosity. I have used purge_dups with an earlier canu assembly of only 50X of this data. Here are the results:
stats for Canu.contigs.fasta
sum = 6266922019, n = 23648, ave = 265008.54, largest = 27667202
N50 = 1490962, n = 725
N60 = 437538, n = 1571
N70 = 214621, n = 3720
N80 = 130690, n = 7519
N90 = 87977, n = 13427
N100 = 7781, n = 23648
N_count = 0
Gaps = 0

stats for purged.fa
sum = 3132131836, n = 7674, ave = 408148.53, largest = 27667202
N50 = 2900367, n = 275
N60 = 1972553, n = 404
N70 = 918986, n = 626
N80 = 247504, n = 1364
N90 = 115823, n = 3302
N100 = 7781, n = 7674
N_count = 11753
Gaps = 511

stats for hap.fa
sum = 3134803293, n = 16426, ave = 190843.98, largest = 27275432
N50 = 315409, n = 1162
N60 = 198902, n = 2439
N70 = 136815, n = 4353
N80 = 100925, n = 7051
N90 = 74213, n = 10648
N100 = 7829, n = 16426
N_count = 1357
Gaps = 59
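For context, purged.fa and hap.fa are the standard outputs of the manual purge_dups workflow, which (with assumed file names for the assembly and reads) looks roughly like:

```shell
# Coverage from read alignments:
minimap2 -x map-pb asm.fasta reads.fq.gz | gzip -c > reads.paf.gz
pbcstat reads.paf.gz              # writes PB.base.cov and PB.stat
calcuts PB.stat > cutoffs         # coverage cutoffs for duplicate calling
# Self-alignment of the split assembly:
split_fa asm.fasta > asm.split
minimap2 -x asm5 -DP asm.split asm.split | gzip -c > asm.split.self.paf.gz
# Call and remove duplicated haplotigs:
purge_dups -2 -T cutoffs -c PB.base.cov asm.split.self.paf.gz > dups.bed
get_seqs -e dups.bed asm.fasta    # writes purged.fa and hap.fa
```
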
What do you think?
Corrected reads are normally reliable for estimating heterozygosity, but I wouldn't have expected 1% to separate so much of the genome. You could always compare to an Illumina dataset, if you have one, to get another estimate of heterozygosity (or align the hap.fa to the purged.fa).
The purge_dups results look OK, but I don't think it should be adding gaps; I'm not sure how you ended up with Ns after it. It may be worth asking on the purge_dups repo to clarify.
I'll leave it up to you whether to run 1.9; your assembly already seems pretty contiguous, so it may not be worth the computational time. I'm going to close the issue since the canu errors you encountered have been resolved and you were able to finish the assembly.
Hi,
All consensus jobs completed successfully except for 3 jobs. Rerunning the Canu script didn't solve the problem. I tried to run those 3 jobs interactively from within the 5-consensus folder using the command `sh ./consensus.sh`. The job runs for a while before failing with the following message:
I tried two of the three failed jobs and got a similar error. I re-ran them with 8 cores and 100 GB of memory.
Reading through the issues, this seems to have been a bug in earlier releases, but I am using Canu-1.8. Additionally, in a different assembly with a smaller part of this data, I didn't run into this issue and the assembly finished successfully.
Any suggestion how to resolve this issue?
Thanks