Closed jordur closed 7 years ago
Counting occurrences of kmers failed. Are there any errors in unitigging/0-mercounts/meryl.out (or similarly named file)? What does 'ls -l unitigging/0-mercounts/' show?
Does the read length histogram reported in the canu screen output look reasonable?
Hi @brianwalenz. Here it is:
$ ls -l unitigging/0-mercounts/
total 3104
-rw-r--r-- 1 jdurban ibv 1097 9 mai 18:10 Canu_assembly_Bjararaca.ms22.estMerThresh.err
-rw-r--r-- 1 jdurban ibv 0 9 mai 18:10 Canu_assembly_Bjararaca.ms22.estMerThresh.out.WORKING
-rw-r--r-- 1 jdurban ibv 455 9 mai 18:10 Canu_assembly_Bjararaca.ms22.histogram
-rw-r--r-- 1 jdurban ibv 97 9 mai 18:10 Canu_assembly_Bjararaca.ms22.histogram.info
-rw-r--r-- 1 jdurban ibv 2096936 9 mai 18:10 Canu_assembly_Bjararaca.ms22.mcdat
-rw-r--r-- 1 jdurban ibv 1048608 9 mai 18:10 Canu_assembly_Bjararaca.ms22.mcidx
-rw-r--r-- 1 jdurban ibv 2932 9 mai 18:10 meryl.124536_1.out
-rw-r--r-- 1 jdurban ibv 27 9 mai 18:09 meryl.jobSubmit.out
-rwxr-xr-x 1 jdurban ibv 186 9 mai 18:09 meryl.jobSubmit.sh
-rwxr-xr-x 1 jdurban ibv 1854 9 mai 18:09 meryl.sh
-rw-r--r-- 1 jdurban ibv 0 9 mai 18:11 meryl.success
And yes!! There is a segmentation fault:
$tail unitigging/0-mercounts/meryl.124536_1.out
/var/spool/slurmd/job124536/slurm_script: line 63: 10985 **Violació de segment ** $bin/estimate-mer-threshold -h ./Canu_assembly_Bjararaca.ms22.histogram -c 0.2384795 > ./Canu_assembly_Bjararaca.ms22.estMerThresh.out.WORKING 2> ./Canu_assembly_Bjararaca.ms22.estMerThresh.err
what means segmentation fault in catalan ;) So, it seems to be a matter of resources? Should I increase the usage memory? Thanks
Can you upload the *.histogram file? The mcdat and mcidx files would be better, but, WARNING, these contain all the kmers for your assembly (they're small enough to email to me though).
Hi @brianwalenz Attached you will find these files. Thank you for your time! files.zip
Thanks. I can't reproduce the crash. What version of canu do you have?
However, in the segmentation fault error, the command line is reported. It's got a coverage estimate of 0.24. Looking at the histogram, most of the kmers are single copy. I don't think you have enough coverage in input reads for any assembly. To get 20x coverage of 2Mbp with 274 reads - just barely enough for assembly - you'd need an average read length of 150k.
I am using Canu v1.4 (+126 commits). Ok, that's exactly what I suspected. I will set a lower coverage in order obtain some results. In fact the sequencing process was wrong since I introduced a bubble on the sensor array but some reads were produced and I though that some assembly could be performed. In fact I don't think I need a 20X coverage since it is a new genomic region from a non-model organism and a lower coverage could be also valuable. Thank you for your help!
You might as well update to the latest code. I can't point to any specific improvements that would help, but you are a couple months out of date. I don't think it'll help, there just isn't enough data to generate corrected reads, so only repeats are getting output.
Ok, thank you so much! Finally, how you got the estimate of 0.24 coverage? Was it from the histogram file?
It's the -c parameter to estimate-mer-coverage in:
$tail unitigging/0-mercounts/meryl.124536_1.out
/var/spool/slurmd/job124536/slurm_script: line 63: 10985 **Violació de segment ** $bin/estimate-mer-threshold -h ./Canu_assembly_Bjararaca.ms22.histogram -c 0.2384795 > ./Canu_assembly_Bjararaca.ms22.estMerThresh.out.WORKING 2> ./Canu_assembly_Bjararaca.ms22.estMerThresh.err
This is just bases in reads (after correction and trimming) divided by your genome size estimate (the genomeSize parameter). I don't think it's actually used by this program, but still passed in.
The start of the canu logging (to the screen) has a histogram of the read lengths. This also reports coverage.
Great! Thank you so much!
Dear, I was trying to assembly 274 reads from a nanopore minion run with canu using the following command line:
We are not sure about the sample size since it is from a non-model organism. However it comes from 8 BAC genomic clones, 300Kb length each so we estimated a 2Mb genome fragment size. Finally I got the following log file:
What does actually the message mean? I guess a low overlapping between reads but it would be great to test different options on the command line. Could you help me with this issue' Regards