Closed katievigil closed 1 year ago
Yes, you ended up with no reads:
-- segments memory batches
-- -------- -------- -------
--
-- For 0 reads with 0 bases, limit to 0 batches.
-- Will count kmers using jobs, each using GB and 4 threads.
all due to trimming:
-- 1023 reads 87528 bases (reads with no overlaps, deleted)
-- 20 reads 25136 bases (reads with short trimmed length, deleted)
Based on the k-mer spectrum of the corrected reads, these reads don't seem to have shared k-mers. Have you confirmed they do indeed have any overlaps by mapping them to each other? You could also try assembling the corrected reads while skipping the trimming. It's also possible, if your target sequence is small enough, that a single corrected read would be sufficient and you could use that instead of running the assembly.
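To judge whether a single corrected read might already cover a small target, one option is to list the longest corrected reads. The following is a minimal sketch, not part of canu: the function name `longest_reads` is hypothetical, and it assumes a FASTA file (plain or gzipped) and a `gzip` that passes uncompressed input through with `-cdf`, as GNU gzip does.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper, not a canu tool).
# Print "length<TAB>name" for the N longest reads in a FASTA file.
# $1 = FASTA file (plain or gzipped), $2 = how many reads to show (default 5).
longest_reads() {
    gzip -cdf "$1" | awk -v OFS='\t' '
        # Header line: emit the previous record, then start a new one.
        /^>/ { if (name != "") print len, name; name = substr($1, 2); len = 0; next }
        # Sequence line: accumulate its length (handles multi-line FASTA).
             { len += length($0) }
        END  { if (name != "") print len, name }' |
        sort -k1,1nr | head -n "${2:-5}"
}
```

For example, `longest_reads barcode03.correctedReads.fasta.gz 10` would show the ten longest corrected reads, which can then be compared against the expected genome size.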
Hi, I have never mapped my reads against each other; I have only mapped contigs back to my reads. How do you recommend doing this? With minimap2? And how can I skip the trimming? Thanks for your response!
Do you have a reference you can map to? If yes, map the reads to it and check whether they tile across it with some overlaps. If not, the best option is to run something like minimap2 in overlapping mode and see whether it finds overlaps between the reads. To run without trimming, see the quick start: https://canu.readthedocs.io/en/latest/quick-start.html#correct-trim-and-assemble-manually which shows how to run individual pipeline steps. Just provide the corrected reads as input to the assembly step.
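As a rough illustration of the read-vs-read check, the sketch below uses minimap2's all-vs-all Nanopore preset (`-x ava-ont`) and then counts how many distinct reads overlap at least one other read. The function names, the thread count, and the output file name are assumptions made for this example; only the minimap2 preset and the PAF column layout (query name in column 1, target name in column 6) come from minimap2 itself.

```shell
#!/usr/bin/env bash
# Sketch: all-vs-all overlap check for corrected Nanopore reads.
# Function names are hypothetical helpers, not part of minimap2 or canu.

# Run minimap2 in all-vs-all overlap mode.
# $1 = reads (FASTA/FASTQ, may be gzipped), $2 = output PAF path.
find_overlaps() {
    minimap2 -x ava-ont -t 4 "$1" "$1" > "$2"
}

# Count distinct reads that overlap at least one *other* read.
# Self-hits (query name equal to target name) are ignored.
count_overlapping_reads() {
    awk -F'\t' '$1 != $6 { seen[$1] = 1; seen[$6] = 1 }
                END { n = 0; for (r in seen) n++; print n }' "$1"
}
```

A typical use would be `find_overlaps barcode03.correctedReads.fasta.gz overlaps.paf` followed by `count_overlapping_reads overlaps.paf`. If the count is at or near zero, the reads most likely share no overlaps and an assembler has nothing to join.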
Hi, I do not have a reference because these are metagenomic shotgun viral Nanopore sequences. It looks like it failed again.
$ canu -p barcode03 -d /lustre/project/taw/kvigil/ONR/baratariabay/ONR_baratariabay100623/20231006_1648_MN18851_FAW76720_acec0fdf/fastq_pass/concatenate/canu/barcode03/ genomeSize=1m -untrimmed correctedErrorRate=0.12 maxInputCoverage=100 stopOnLowCoverage=0 'batOptions=-eg 0.10 -sb 0.01 -dg 2 -db 1 -dr 3' useGrid=false -nanopore /lustre/project/taw/kvigil/ONR/baratariabay/ONR_baratariabay100623/20231006_1648_MN18851_FAW76720_acec0fdf/fastq_pass/concatenate/canu/barcode03/barcode03.correctedReads.fasta.gz
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
LANGUAGE = (unset),
LC_ALL = (unset),
LANG = "C.UTF-8"
are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").
-- canu 2.2
--
-- CITATIONS
--
-- For 'standard' assemblies of PacBio or Nanopore reads:
-- Koren S, Walenz BP, Berlin K, Miller JR, Phillippy AM.
-- Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation.
-- Genome Res. 2017 May;27(5):722-736.
-- http://doi.org/10.1101/gr.215087.116
--
-- Read and contig alignments during correction and consensus use:
-- Šošic M, Šikic M.
-- Edlib: a C/C ++ library for fast, exact sequence alignment using edit distance.
-- Bioinformatics. 2017 May 1;33(9):1394-1395.
-- http://doi.org/10.1093/bioinformatics/btw753
--
-- Overlaps are generated using:
-- Berlin K, et al.
-- Assembling large genomes with single-molecule sequencing and locality-sensitive hashing.
-- Nat Biotechnol. 2015 Jun;33(6):623-30.
-- http://doi.org/10.1038/nbt.3238
--
-- Myers EW, et al.
-- A Whole-Genome Assembly of Drosophila.
-- Science. 2000 Mar 24;287(5461):2196-204.
-- http://doi.org/10.1126/science.287.5461.2196
--
-- Corrected read consensus sequences are generated using an algorithm derived from FALCON-sense:
-- Chin CS, et al.
-- Phased diploid genome assembly with single-molecule real-time sequencing.
-- Nat Methods. 2016 Dec;13(12):1050-1054.
-- http://doi.org/10.1038/nmeth.4035
--
-- Contig consensus sequences are generated using an algorithm derived from pbdagcon:
-- Chin CS, et al.
-- Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data.
-- Nat Methods. 2013 Jun;10(6):563-9
-- http://doi.org/10.1038/nmeth.2474
--
-- CONFIGURE CANU
--
-- Detected Java(TM) Runtime Environment '10.0.2' (from '/lustre/project/taw/share/conda-envs/ONRviral/bin/java') without -d64 support.
-- Detected gnuplot version '5.4 patchlevel 3 ' (from 'gnuplot') and image format 'png'.
--
-- Detected 20 CPUs and 64000 gigabytes of memory on the local machine.
--
-- Detected Slurm with 'sinfo' binary in /cm/shared/apps/slurm/14.03.0/bin/sinfo.
-- Slurm disabled by useGrid=false
--
-- Local machine mode enabled; grid support not detected or not allowed.
--
-- (tag)Concurrency
-- (tag)Threads |
-- (tag)Memory | |
-- (tag) | | | total usage algorithm
-- ------- ---------- -------- -------- -------------------- -----------------------------
-- Local: meryl 12.000 GB 4 CPUs x 5 jobs 60.000 GB 20 CPUs (k-mer counting)
-- Local: hap 8.000 GB 4 CPUs x 5 jobs 40.000 GB 20 CPUs (read-to-haplotype assignment)
-- Local: cormhap 6.000 GB 10 CPUs x 2 jobs 12.000 GB 20 CPUs (overlap detection with mhap)
-- Local: obtovl 4.000 GB 5 CPUs x 4 jobs 16.000 GB 20 CPUs (overlap detection)
-- Local: utgovl 4.000 GB 5 CPUs x 4 jobs 16.000 GB 20 CPUs (overlap detection)
-- Local: cor -.--- GB 4 CPUs x - jobs -.--- GB - CPUs (read correction)
-- Local: ovb 4.000 GB 1 CPU x 20 jobs 80.000 GB 20 CPUs (overlap store bucketizer)
-- Local: ovs 8.000 GB 1 CPU x 20 jobs 160.000 GB 20 CPUs (overlap store sorting)
-- Local: red 16.000 GB 4 CPUs x 5 jobs 80.000 GB 20 CPUs (read error detection)
-- Local: oea 8.000 GB 1 CPU x 20 jobs 160.000 GB 20 CPUs (overlap error adjustment)
-- Local: bat 16.000 GB 4 CPUs x 1 job 16.000 GB 4 CPUs (contig construction with bogart)
-- Local: cns -.--- GB 4 CPUs x - jobs -.--- GB - CPUs (consensus)
--
-- Found Nanopore reads in 'barcode03.seqStore':
-- Libraries:
-- Nanopore: 1
-- Reads:
-- Raw: 1338011
-- Corrected: 112664
--
--
-- Generating assembly 'barcode03' in '/lustre/project/taw/kvigil/ONR/baratariabay/ONR_baratariabay100623/20231006_1648_MN18851_FAW76720_acec0fdf/fastq_pass/concatenate/canu/barcode03':
-- genomeSize:
-- 1000000
--
-- Overlap Generation Limits:
-- corOvlErrorRate 0.3200 ( 32.00%)
-- obtOvlErrorRate 0.1200 ( 12.00%)
-- utgOvlErrorRate 0.1200 ( 12.00%)
--
-- Overlap Processing Limits:
-- corErrorRate 0.3000 ( 30.00%)
-- obtErrorRate 0.1200 ( 12.00%)
-- utgErrorRate 0.1200 ( 12.00%)
-- cnsErrorRate 0.1200 ( 12.00%)
--
-- Stages to run:
-- trim corrected reads.
-- assemble corrected and trimmed reads.
--
--
-- Correction skipped; not enabled.
--
-- BEGIN TRIMMING
--
-- Creating overlap store trimming/barcode03.ovlStore using:
-- 1 bucket
-- 20 slices
-- using at most 1 GB memory each
--
-- Running jobs. First attempt out of 2.
----------------------------------------
-- Starting 'ovB' concurrent execution on Fri Oct 20 15:26:26 2023 with 147670.462 GB free disk space (1 processes; 20 concurrently)
cd trimming/barcode03.ovlStore.BUILDING
./scripts/1-bucketize.sh 1 > ./logs/1-bucketize.000001.out 2>&1
-- Finished on Fri Oct 20 15:26:28 2023 (2 seconds) with 147670.348 GB free disk space
----------------------------------------
--
-- Overlap store bucketizer jobs failed, retry.
-- job trimming/barcode03.ovlStore.BUILDING/bucket0001 FAILED.
--
--
-- Running jobs. Second attempt out of 2.
----------------------------------------
-- Starting 'ovB' concurrent execution on Fri Oct 20 15:26:28 2023 with 147670.348 GB free disk space (1 processes; 20 concurrently)
cd trimming/barcode03.ovlStore.BUILDING
./scripts/1-bucketize.sh 1 > ./logs/1-bucketize.000001.out 2>&1
-- Finished on Fri Oct 20 15:26:28 2023 (furiously fast) with 147670.348 GB free disk space
----------------------------------------
--
-- Overlap store bucketizer jobs failed, tried 2 times, giving up.
-- job trimming/barcode03.ovlStore.BUILDING/bucket0001 FAILED.
--
ABORT:
ABORT: canu 2.2
ABORT: Don't panic, but a mostly harmless error occurred and Canu stopped.
ABORT: Try restarting. If that doesn't work, ask for help.
ABORT:
Your command is wrong: that is the suggestion for Nanopore-only assembly, but it would still perform the trimming. You can try it, but give it raw reads. I was referring to the example of running assembly:
canu \
-p ecoli -d ecoli-erate-0.039 \
genomeSize=4.8m \
correctedErrorRate=0.039 \
-trimmed -corrected -pacbio ecoli/ecoli.trimmedReads.fasta.gz
So in your case you would want to use the -trimmed -corrected -nanopore options instead. You can keep using your genome size, etc. In all cases, though, do not reuse the same -d folder for multiple experiments (as in the command above). Use a new, clean -d folder.
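Putting that advice together, a minimal sketch of an assembly-only run might look like the following. The helper names and the output directory are hypothetical; the parameter values are carried over from the command earlier in this thread (the `batOptions` and `maxInputCoverage` settings are left out here for brevity and can be added back). The first function only prints the command so it can be reviewed before running.

```shell
#!/usr/bin/env bash
# Sketch: assemble already-corrected reads, skipping correction and trimming.
# Helper names are hypothetical; option values come from the earlier command.

# Print the canu command line for an assembly-only run.
# $1 = prefix, $2 = NEW output directory, $3 = corrected reads file.
canu_assemble_cmd() {
    printf '%s ' canu -p "$1" -d "$2" \
        genomeSize=1m correctedErrorRate=0.12 \
        stopOnLowCoverage=0 useGrid=false \
        -trimmed -corrected -nanopore "$3"
    printf '\n'
}

# Succeed only if the directory does not exist yet or is empty,
# so an earlier experiment's files are never reused.
check_clean_dir() {
    [ ! -e "$1" ] || [ -z "$(ls -A "$1")" ]
}
```

For instance, `check_clean_dir barcode03-assembly-only && canu_assemble_cmd barcode03 barcode03-assembly-only barcode03.correctedReads.fasta.gz` prints a command that can be inspected and then run; `barcode03-assembly-only` is just a placeholder name for a fresh directory.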
Hi, I am having an issue running this barcode. It does not have a lot of reads, so it could be that no contigs will be assembled, but I just wanted to double-check with you. I am doing metagenomic viral sequencing using Nanopore. Thanks!