marbl / canu

A single molecule sequence assembler for genomes large and small.
http://canu.readthedocs.io/

sbatch: error: Batch job submission failed: Requested node configuration is not available #1398

Closed: biowackysci closed this issue 5 years ago

biowackysci commented 5 years ago

Hello, I am trying to trim and assemble my Canu-corrected reads. The command I used was:

    canu -trim-assemble -p canu_trim_assemble -d /group/pasture/Saila/CANU1.8/Saila_CANU_trim_assemble/ genomeSize=2.8g useGrid=true stopOnReadQuality=false minOverlapLength=1000 -pacbio-corrected /group/pasture/Saila/CANU1.8/SAILA_CANU_Corrrection/canu_correction.correctedReads.fasta.gz gridEngine=slurm merylMemory=62 batMemory=62 gridOptions="--account=dbiopast1 --partition=batch --time=1000:00:00 --mem-per-cpu=100g"

-- Canu 1.8
--
-- CITATIONS
--
-- Koren S, Walenz BP, Berlin K, Miller JR, Phillippy AM.
-- Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation.
-- Genome Res. 2017 May;27(5):722-736.
-- http://doi.org/10.1101/gr.215087.116
--
-- Koren S, Rhie A, Walenz BP, Dilthey AT, Bickhart DM, Kingan SB, Hiendleder S, Williams JL, Smith TPL, Phillippy AM.
-- De novo assembly of haplotype-resolved genomes with trio binning.
-- Nat Biotechnol. 2018
-- https//doi.org/10.1038/nbt.4277
--
-- Read and contig alignments during correction, consensus and GFA building use:
--   Šošic M, Šikic M.
--   Edlib: a C/C ++ library for fast, exact sequence alignment using edit distance.
--   Bioinformatics. 2017 May 1;33(9):1394-1395.
--   http://doi.org/10.1093/bioinformatics/btw753
--
-- Overlaps are generated using:
--   Berlin K, et al.
--   Assembling large genomes with single-molecule sequencing and locality-sensitive hashing.
--   Nat Biotechnol. 2015 Jun;33(6):623-30.
--   http://doi.org/10.1038/nbt.3238
--
--   Myers EW, et al.
--   A Whole-Genome Assembly of Drosophila.
--   Science. 2000 Mar 24;287(5461):2196-204.
--   http://doi.org/10.1126/science.287.5461.2196
--
-- Corrected read consensus sequences are generated using an algorithm derived from FALCON-sense:
--   Chin CS, et al.
--   Phased diploid genome assembly with single-molecule real-time sequencing.
--   Nat Methods. 2016 Dec;13(12):1050-1054.
--   http://doi.org/10.1038/nmeth.4035
--
-- Contig consensus sequences are generated using an algorithm derived from pbdagcon:
--   Chin CS, et al.
--   Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data.
--   Nat Methods. 2013 Jun;10(6):563-9
--   http://doi.org/10.1038/nmeth.2474
--
-- CONFIGURE CANU
--
-- Detected Java(TM) Runtime Environment '1.8.0_172' (from '/usr/local/EasyBuild/software/Java/1.8.0_172/bin/java') with -d64 support.
--
-- WARNING:
-- WARNING:  Failed to run gnuplot using command 'gnuplot'.
-- WARNING:  Plots will be disabled.
-- WARNING:
--
-- Detected 48 CPUs and 754 gigabytes of memory.
-- Detected Slurm with 'MaxArraySize' limited to 10000 jobs.
--
-- Found   7 hosts with  24 cores and  755 GB memory under Slurm control.
-- Found   3 hosts with  24 cores and  377 GB memory under Slurm control.
-- Found   5 hosts with  24 cores and  503 GB memory under Slurm control.
-- Found  95 hosts with  48 cores and  754 GB memory under Slurm control.
-- Found  10 hosts with  48 cores and 1510 GB memory under Slurm control.
-- Found   5 hosts with  48 cores and 1003 GB memory under Slurm control.
--
--                     (tag)Threads
--            (tag)Memory         |
--        (tag)         |         |  algorithm
--        -------  ------  --------  -----------------------------
-- Grid:  meryl     62 GB    8 CPUs  (k-mer counting)
-- Grid:  hap       16 GB   24 CPUs  (read-to-haplotype assignment)
-- Grid:  cormhap   48 GB   12 CPUs  (overlap detection with mhap)
-- Grid:  obtovl    24 GB   12 CPUs  (overlap detection)
-- Grid:  utgovl    24 GB   12 CPUs  (overlap detection)
-- Grid:  ovb        4 GB    1 CPU   (overlap store bucketizer)
-- Grid:  ovs       32 GB    1 CPU   (overlap store sorting)
-- Grid:  red       16 GB    8 CPUs  (read error detection)
-- Grid:  oea        8 GB    1 CPU   (overlap error adjustment)
-- Grid:  bat       62 GB   32 CPUs  (contig construction with bogart)
-- Grid:  gfa       32 GB   32 CPUs  (GFA alignment and processing)
--
-- In 'canu_trim_assemble.seqStore', found PacBio reads:
--   Raw:        0
--   Corrected:  13803068
--   Trimmed:    9739173
--
-- Generating assembly 'canu_trim_assemble' in '/group/pasture/Saila/CANU1.8/Saila_CANU_trim_assemble'
--
-- Parameters:
--
--  genomeSize        2800000000
--
--  Overlap Generation Limits:
--    corOvlErrorRate 0.2400 ( 24.00%)
--    obtOvlErrorRate 0.0450 (  4.50%)
--    utgOvlErrorRate 0.0450 (  4.50%)
--
--  Overlap Processing Limits:
--    corErrorRate    0.3000 ( 30.00%)
--    obtErrorRate    0.0450 (  4.50%)
--    utgErrorRate    0.0450 (  4.50%)
--    cnsErrorRate    0.0750 (  7.50%)
----------------------------------------
-- Starting command on Mon Jun 24 09:56:47 2019 with 35532.339 GB free disk space

    cd /group/pasture/Saila/CANU1.8/Saila_CANU_trim_assemble
    sbatch \
      --mem-per-cpu=4g \
      --cpus-per-task=1 \
      --account=dbiopast1 \
      --partition=batch \
      --time=1000:00:00 \
      --mem-per-cpu=100g  \
      -D `pwd` \
      -J 'canu_canu_trim_assemble' \
      -o canu-scripts/canu.14.out  canu-scripts/canu.14.sh
Submitted batch job 551308

-- Finished on Mon Jun 24 09:56:47 2019 (fast as lightning) with 35532.339 GB free disk space

The error from the report is

 BEGIN ASSEMBLY
--
--
-- Running jobs.  First attempt out of 2.
--

CRASH:
CRASH: Canu 1.8
CRASH: Please panic, this is abnormal.
ABORT:
CRASH:   Failed to submit batch jobs.
CRASH:
CRASH: Failed at /usr/local/EasyBuild/software/canu/1.8-intel-2018a/bin/../lib/site_perl/canu/Execution.pm line 1233.
CRASH:  canu::Execution::submitOrRunParallelJob('canu_trim_assemble', 'bat', 'unitigging/4-unitigger', 'unitigger', 1) called at /usr/local/EasyBuild/software/canu/1.8-intel-2018a/bin/../lib/site_perl/canu/Unitig.pm line 361
CRASH:  canu::Unitig::unitigCheck('canu_trim_assemble') called at /usr/local/EasyBuild/software/canu/1.8-intel-2018a/bin/canu line 863
CRASH: 
CRASH: Last 50 lines of the relevant log file (unitigging/4-unitigger/unitigger.jobSubmit-01.out):
CRASH:
CRASH: sbatch: error: Batch job submission failed: Requested node configuration is not available
CRASH:

Can you please help me formulate the submission script? I think it is some space allocation issue, and I am running this job on a grid.

Thanks, Saila

biowackysci commented 5 years ago

Hello again, I amended the script and it picked up where it left off, but now I have a bogart failure. The command I used this time was:

    canu -trim-assemble -p canu_trim_assemble -d /group/pasture/Saila/CANU1.8/Saila_CANU_trim_assemble/ genomeSize=2.8g useGrid=true stopOnReadQuality=false minOverlapLength=1000 -pacbio-corrected /group/pasture/Saila/CANU1.8/SAILA_CANU_Corrrection/canu_correction.correctedReads.fasta.gz gridEngine=slurm merylMemory=62 batMemory=62 gridOptions="--account=dbiopast1 --partition=batch --time=1000:00:00 "

-- CONFIGURE CANU
--
-- Detected Java(TM) Runtime Environment '1.8.0_172' (from '/usr/local/EasyBuild/software/Java/1.8.0_172/bin/java') with -d64 support.
--
-- WARNING:
-- WARNING:  Failed to run gnuplot using command 'gnuplot'.
-- WARNING:  Plots will be disabled.
-- WARNING:
--
-- Detected 48 CPUs and 754 gigabytes of memory.
-- Detected Slurm with 'MaxArraySize' limited to 10000 jobs.
-- 
-- Found   7 hosts with  24 cores and  755 GB memory under Slurm control.
-- Found   3 hosts with  24 cores and  377 GB memory under Slurm control.
-- Found   5 hosts with  24 cores and  503 GB memory under Slurm control.
-- Found  95 hosts with  48 cores and  754 GB memory under Slurm control.
-- Found  10 hosts with  48 cores and 1510 GB memory under Slurm control.
-- Found   5 hosts with  48 cores and 1003 GB memory under Slurm control.
--
--                     (tag)Threads
--            (tag)Memory         |
--        (tag)         |         |  algorithm
--        -------  ------  --------  -----------------------------
-- Grid:  meryl     62 GB    8 CPUs  (k-mer counting)
-- Grid:  hap       16 GB   24 CPUs  (read-to-haplotype assignment)
-- Grid:  cormhap   48 GB   12 CPUs  (overlap detection with mhap)
-- Grid:  obtovl    24 GB   12 CPUs  (overlap detection)
-- Grid:  utgovl    24 GB   12 CPUs  (overlap detection)
-- Grid:  ovb        4 GB    1 CPU   (overlap store bucketizer)
-- Grid:  ovs       32 GB    1 CPU   (overlap store sorting)
-- Grid:  red       16 GB    8 CPUs  (read error detection)
-- Grid:  oea        8 GB    1 CPU   (overlap error adjustment)
-- Grid:  bat       62 GB   32 CPUs  (contig construction with bogart)
-- Grid:  gfa       32 GB   32 CPUs  (GFA alignment and processing)
--
-- In 'canu_trim_assemble.seqStore', found PacBio reads:
--   Raw:        0
--   Corrected:  13803068
--   Trimmed:    9739173
--
-- Generating assembly 'canu_trim_assemble' in '/group/pasture/Saila/CANU1.8/Saila_CANU_trim_assemble'
--
-- Parameters:
--
--  genomeSize        2800000000
--
--  Overlap Generation Limits:
--    corOvlErrorRate 0.2400 ( 24.00%)
--    obtOvlErrorRate 0.0450 (  4.50%)
--    utgOvlErrorRate 0.0450 (  4.50%)
--
--  Overlap Processing Limits:
--    corErrorRate    0.3000 ( 30.00%)
--    obtErrorRate    0.0450 (  4.50%)
--    utgErrorRate    0.0450 (  4.50%)
--    cnsErrorRate    0.0750 (  7.50%) 

The error was:

-- BEGIN ASSEMBLY
--
--
-- Bogart failed, tried 2 times, giving up.
--

ABORT:
ABORT: Canu 1.8
ABORT: Don't panic, but a mostly harmless error occurred and Canu stopped.
ABORT: Try restarting.  If that doesn't work, ask for help.
ABORT:
ABORT: Disk space available:  35530.132 GB
ABORT:
ABORT: Last 50 lines of the relevant log file (unitigging/4-unitigger/unitigger.err):
ABORT:
ABORT:   OverlapCache()--     2165260102 (032.40%)     2079080776 (031.11%)
ABORT:   OverlapCache()--     2185721572 (032.70%)     2098240857 (031.39%)
ABORT:   OverlapCache()--     2205905798 (033.01%)     2117137881 (031.68%)
ABORT:   OverlapCache()--     2225881983 (033.30%)     2135831669 (031.96%)
ABORT:   OverlapCache()--     2244853532 (033.59%)     2153572335 (032.22%)
ABORT:   OverlapCache()--     2265904987 (033.90%)     2173304614 (032.52%)
ABORT:   OverlapCache()--     2324469802 (034.78%)     2230685776 (033.38%)
ABORT:   OverlapCache()--     2396026387 (035.85%)     2301157672 (034.43%)
ABORT:   OverlapCache()--     2475021202 (037.03%)     2379077910 (035.60%)
ABORT:   OverlapCache()--     2572173381 (038.49%)     2475161910 (037.03%)
ABORT:   OverlapCache()--     2684698758 (040.17%)     2586454284 (038.70%)
ABORT:   OverlapCache()--     2737328845 (040.96%)     2637827172 (039.47%)
ABORT:   OverlapCache()--     2779608333 (041.59%)     2678756403 (040.08%)
ABORT:   OverlapCache()--     2821561958 (042.22%)     2719396548 (040.69%)
ABORT:   OverlapCache()--     2865081240 (042.87%)     2761587670 (041.32%)
ABORT:   OverlapCache()--     2908366195 (043.52%)     2803548285 (041.95%)
ABORT:   OverlapCache()--     2953501448 (044.19%)     2847414555 (042.60%)
ABORT:   OverlapCache()--     2996871410 (044.84%)     2889519848 (043.23%)
ABORT:   OverlapCache()--     3032050748 (045.37%)     2923318309 (043.74%)
ABORT:   OverlapCache()--     3057499017 (045.75%)     2947318439 (044.10%)
ABORT:   OverlapCache()--     3082718850 (046.12%)     2971092730 (044.45%)
ABORT:   OverlapCache()--     3105061836 (046.46%)     2992089457 (044.77%)
ABORT:   OverlapCache()--     3127141820 (046.79%)     3012854830 (045.08%)
ABORT:   OverlapCache()--     3149250847 (047.12%)     3033604855 (045.39%)
ABORT:   OverlapCache()--     3170798144 (047.44%)     3053839195 (045.69%)
ABORT:   OverlapCache()--     3191931768 (047.76%)     3073690448 (045.99%)
ABORT:   OverlapCache()--     3213022778 (048.07%)     3093493128 (046.29%)
ABORT:   OverlapCache()--     3234506507 (048.40%)     3113664811 (046.59%)
ABORT:   OverlapCache()--     3254569577 (048.70%)     3132433487 (046.87%)
ABORT:   OverlapCache()--     3273906776 (048.99%)     3150504805 (047.14%)
ABORT:   OverlapCache()--     3291594605 (049.25%)     3166981300 (047.39%)
ABORT:   OverlapCache()--     3307747328 (049.49%)     3181987648 (047.61%)
ABORT:   OverlapCache()--     3328013181 (049.79%)     3200960433 (047.89%)
ABORT:   OverlapCache()--     3372713708 (050.46%)     3244327362 (048.54%)
ABORT:   OverlapCache()--     3420491344 (051.18%)     3290808203 (049.24%)
ABORT:   OverlapCache()--     3469828347 (051.92%)     3338850997 (049.96%)
ABORT:   OverlapCache()--     3519564661 (052.66%)     3387337946 (050.68%)
ABORT:   OverlapCache()--     3570452964 (053.42%)     3436962086 (051.42%)
ABORT:   OverlapCache()--     3621774081 (054.19%)     3487045455 (052.17%)
ABORT:   OverlapCache()--     3669014351 (054.90%)     3532967958 (052.86%)
ABORT:   OverlapCache()--     3715838209 (055.60%)     3578503947 (053.54%)
ABORT:   OverlapCache()--     3761002076 (056.27%)     3622393446 (054.20%)
ABORT:   OverlapCache()--     3807159271 (056.96%)     3667280880 (054.87%)
ABORT:   OverlapCache()--     3854061317 (057.67%)     3712941365 (055.55%)
ABORT:   OverlapCache()--     3902129680 (058.38%)     3759774907 (056.25%)
ABORT:   OverlapCache()--     3952548467 (059.14%)     3808979692 (056.99%)
ABORT:   OverlapCache()--     4002647862 (059.89%)     3857833230 (057.72%)
ABORT:   OverlapCache()--     4047710415 (060.56%)     3901632612 (058.38%)
ABORT:   OverlapCache()--     4128242465 (061.77%)     3979613064 (059.54%)
ABORT:   OverlapCache()--     4218577318 (063.12%)     4067068031 (060.85%)
ABORT:

So now I am not sure what to do. Can you please help me?

Thanks, Saila

biowackysci commented 5 years ago

And sorry, the contents of the log file unitigging/4-unitigger/unitigger.err are here:

==> PARAMETERS.

Resources:
  Memory                512 GB
  Compute Threads       32 (command line)

Lengths:
  Minimum read          0 bases
  Minimum overlap       1000 bases

Overlap Error Rates:
  Graph                 0.045 (4.500%)
  Max                   0.045 (4.500%)

Deviations:
  Graph                 6.000
  Bubble                6.000
  Repeat                3.000

Edge Confusion:
  Absolute              2100
  Percent               200.0000

Unitig Construction:
  Minimum intersection  500 bases
  Maxiumum placements   2 positions

Debugging Enabled:
  (none)

==> LOADING AND FILTERING OVERLAPS.

ReadInfo()-- Using 13803068 reads, no minimum read length used.

OverlapCache()-- limited to 524288MB memory (user supplied).

OverlapCache()--     105MB for read data.
OverlapCache()--     526MB for best edges.
OverlapCache()--    1369MB for tigs.
OverlapCache()--     368MB for tigs - read layouts.
OverlapCache()--     526MB for tigs - error profiles.
OverlapCache()--  131072MB for tigs - error profile overlaps.
OverlapCache()--       0MB for other processes.
OverlapCache()-- ---------
OverlapCache()--  134231MB for data structures (sum of above).
OverlapCache()-- ---------
OverlapCache()--     263MB for overlap store structure.
OverlapCache()--  389793MB for overlap data.
OverlapCache()-- ---------
OverlapCache()--  524288MB allowed.
OverlapCache()--
OverlapCache()-- Retain at least 24 overlaps/read, based on 12.05x coverage.
OverlapCache()-- Initial guess at 1850 overlaps/read.
OverlapCache()--
OverlapCache()-- Adjusting for sparse overlaps.
OverlapCache()--
OverlapCache()--               reads loading olaps          olaps               memory
OverlapCache()--   olaps/read       all      some          loaded                 free
OverlapCache()--   ----------   -------   -------     ----------- -------     --------
OverlapCache()--         1850   12711747   1091321      3774202435  56.47%     332203 MB
OverlapCache()--        21799   13796625      6443      2320537679  98.98%     288848 MB
OverlapCache()--      2959871   13803068         0      2388505170 100.00%     287811 MB
OverlapCache()--
OverlapCache()-- Loading overlaps.
OverlapCache()--
OverlapCache()--          read from store           saved in cache
OverlapCache()--   ------------ ---------   ------------ ---------
OverlapCache()--       57214176 (000.86%)       55903912 (000.84%)
OverlapCache()--      114180478 (001.71%)      111552324 (001.67%)
OverlapCache()--      172347459 (002.58%)      168403366 (002.52%)
OverlapCache()--      230549419 (003.45%)      225291191 (003.37%)
OverlapCache()--      289577564 (004.33%)      282963374 (004.23%)
OverlapCache()--      348458321 (005.21%)      340517621 (005.09%)
OverlapCache()--      406835332 (006.09%)      397537667 (005.95%)
OverlapCache()--      466669944 (006.98%)      456013491 (006.82%)
OverlapCache()--      525707627 (007.87%)      513631226 (007.69%)
OverlapCache()--      584447454 (008.74%)      570970844 (008.54%)
OverlapCache()--      643279007 (009.62%)      628457316 (009.40%)
OverlapCache()--      702078346 (010.50%)      685897159 (010.26%)
OverlapCache()--      761017729 (011.39%)      743474891 (011.12%)
OverlapCache()--      819455498 (012.26%)      800619110 (011.98%)
OverlapCache()--      878511489 (013.14%)      858350772 (012.84%)
OverlapCache()--      937684615 (014.03%)      916251685 (013.71%)
OverlapCache()--      996863322 (014.92%)      974174420 (014.58%)
OverlapCache()--     1055844724 (015.80%)     1031868800 (015.44%)
OverlapCache()--     1113200621 (016.66%)     1087859185 (016.28%)
OverlapCache()--     1157326305 (017.32%)     1130246808 (016.91%)
OverlapCache()--     1200659146 (017.96%)     1171741911 (017.53%)
OverlapCache()--     1250942256 (018.72%)     1219991698 (018.25%)
OverlapCache()--     1300685886 (019.46%)     1267724360 (018.97%)
OverlapCache()--     1349629095 (020.19%)     1314668930 (019.67%)
OverlapCache()--     1384557262 (020.72%)     1347991441 (020.17%)
OverlapCache()--     1412951901 (021.14%)     1374841852 (020.57%)
OverlapCache()--     1433887142 (021.45%)     1394451667 (020.86%)
OverlapCache()--     1454862486 (021.77%)     1414095635 (021.16%)
OverlapCache()--     1475764832 (022.08%)     1433691123 (021.45%)
OverlapCache()--     1496208081 (022.39%)     1452813468 (021.74%)
OverlapCache()--     1516781727 (022.69%)     1472095433 (022.03%)
OverlapCache()--     1536708118 (022.99%)     1490745096 (022.30%)
OverlapCache()--     1556106990 (023.28%)     1508879960 (022.58%)
OverlapCache()--     1576395435 (023.59%)     1527873678 (022.86%)
OverlapCache()--     1595720693 (023.88%)     1545955234 (023.13%)
OverlapCache()--     1617196446 (024.20%)     1566093597 (023.43%)
OverlapCache()--     1638473956 (024.52%)     1586027100 (023.73%)
OverlapCache()--     1660280564 (024.84%)     1606481060 (024.04%)
OverlapCache()--     1681391821 (025.16%)     1626246456 (024.33%)
OverlapCache()--     1702479641 (025.47%)     1646005900 (024.63%)
OverlapCache()--     1723064248 (025.78%)     1665291928 (024.92%)
OverlapCache()--     1741818694 (026.06%)     1682800312 (025.18%)
OverlapCache()--     1763170287 (026.38%)     1702803911 (025.48%)
OverlapCache()--     1784311328 (026.70%)     1722637835 (025.77%)
OverlapCache()--     1805029364 (027.01%)     1742051450 (026.07%)
OverlapCache()--     1824939469 (027.31%)     1760679494 (026.34%)
OverlapCache()--     1846064783 (027.62%)     1780480170 (026.64%)
OverlapCache()--     1864943381 (027.90%)     1798123456 (026.90%)
OverlapCache()--     1884278597 (028.19%)     1816222304 (027.17%)
OverlapCache()--     1906416792 (028.52%)     1837005232 (027.49%)
OverlapCache()--     1928113347 (028.85%)     1857351499 (027.79%)
OverlapCache()--     1948705114 (029.16%)     1876624752 (028.08%)
OverlapCache()--     1967393605 (029.44%)     1894069917 (028.34%)
OverlapCache()--     1986662411 (029.73%)     1912081099 (028.61%)
OverlapCache()--     2006338642 (030.02%)     1930480472 (028.88%)
OverlapCache()--     2026010262 (030.31%)     1948862711 (029.16%)
OverlapCache()--     2045564967 (030.61%)     1967135869 (029.43%)
OverlapCache()--     2064714870 (030.89%)     1985022441 (029.70%)
OverlapCache()--     2083948928 (031.18%)     2002999731 (029.97%)
OverlapCache()--     2102541872 (031.46%)     2020345708 (030.23%)
OverlapCache()--     2122949039 (031.76%)     2039441029 (030.51%)
OverlapCache()--     2144475754 (032.09%)     2059607013 (030.82%)
OverlapCache()--     2165260102 (032.40%)     2079080776 (031.11%)
OverlapCache()--     2185721572 (032.70%)     2098240857 (031.39%)
OverlapCache()--     2205905798 (033.01%)     2117137881 (031.68%)
OverlapCache()--     2225881983 (033.30%)     2135831669 (031.96%)
OverlapCache()--     2244853532 (033.59%)     2153572335 (032.22%)
OverlapCache()--     2265904987 (033.90%)     2173304614 (032.52%)
OverlapCache()--     2324469802 (034.78%)     2230685776 (033.38%)
OverlapCache()--     2396026387 (035.85%)     2301157672 (034.43%)
OverlapCache()--     2475021202 (037.03%)     2379077910 (035.60%)
OverlapCache()--     2572173381 (038.49%)     2475161910 (037.03%)
OverlapCache()--     2684698758 (040.17%)     2586454284 (038.70%)
OverlapCache()--     2737328845 (040.96%)     2637827172 (039.47%)
OverlapCache()--     2779608333 (041.59%)     2678756403 (040.08%)
OverlapCache()--     2821561958 (042.22%)     2719396548 (040.69%)
OverlapCache()--     2865081240 (042.87%)     2761587670 (041.32%)
OverlapCache()--     2908366195 (043.52%)     2803548285 (041.95%)
OverlapCache()--     2953501448 (044.19%)     2847414555 (042.60%)
OverlapCache()--     2996871410 (044.84%)     2889519848 (043.23%)
OverlapCache()--     3032050748 (045.37%)     2923318309 (043.74%)
OverlapCache()--     3057499017 (045.75%)     2947318439 (044.10%)
OverlapCache()--     3082718850 (046.12%)     2971092730 (044.45%)
OverlapCache()--     3105061836 (046.46%)     2992089457 (044.77%)
OverlapCache()--     3127141820 (046.79%)     3012854830 (045.08%)
OverlapCache()--     3149250847 (047.12%)     3033604855 (045.39%)
OverlapCache()--     3170798144 (047.44%)     3053839195 (045.69%)
OverlapCache()--     3191931768 (047.76%)     3073690448 (045.99%)
OverlapCache()--     3213022778 (048.07%)     3093493128 (046.29%)
OverlapCache()--     3234506507 (048.40%)     3113664811 (046.59%)
OverlapCache()--     3254569577 (048.70%)     3132433487 (046.87%)
OverlapCache()--     3273906776 (048.99%)     3150504805 (047.14%)
OverlapCache()--     3291594605 (049.25%)     3166981300 (047.39%)
OverlapCache()--     3307747328 (049.49%)     3181987648 (047.61%)
OverlapCache()--     3328013181 (049.79%)     3200960433 (047.89%)
OverlapCache()--     3372713708 (050.46%)     3244327362 (048.54%)
OverlapCache()--     3420491344 (051.18%)     3290808203 (049.24%)
OverlapCache()--     3469828347 (051.92%)     3338850997 (049.96%)
OverlapCache()--     3519564661 (052.66%)     3387337946 (050.68%)
OverlapCache()--     3570452964 (053.42%)     3436962086 (051.42%)
OverlapCache()--     3621774081 (054.19%)     3487045455 (052.17%)
OverlapCache()--     3669014351 (054.90%)     3532967958 (052.86%)
OverlapCache()--     3715838209 (055.60%)     3578503947 (053.54%)
OverlapCache()--     3761002076 (056.27%)     3622393446 (054.20%)
OverlapCache()--     3807159271 (056.96%)     3667280880 (054.87%)
OverlapCache()--     3854061317 (057.67%)     3712941365 (055.55%)
OverlapCache()--     3902129680 (058.38%)     3759774907 (056.25%)
OverlapCache()--     3952548467 (059.14%)     3808979692 (056.99%)
OverlapCache()--     4002647862 (059.89%)     3857833230 (057.72%)
OverlapCache()--     4047710415 (060.56%)     3901632612 (058.38%)
OverlapCache()--     4128242465 (061.77%)     3979613064 (059.54%)
OverlapCache()--     4218577318 (063.12%)     4067068031 (060.85%)

brianwalenz commented 5 years ago

I'm guessing it was killed for exceeding the memory request of the bogart job. Increase the memory request in 4-unitigger/jobSubmit.sh to either --mem-per-cpu=4g or --mem=128g, and rerun that script to resubmit the bogart job. Bogart itself will still use (about) 62 GB of memory, but you've told Slurm to reserve 32 * 4 GB = 128 GB of memory.

I'd be interested in seeing how much memory this is actually using (or tried to use), if you can figure out how to query Slurm to get it.
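
One way to get that number from Slurm is `sacct`. This is only a minimal sketch: it assumes job accounting is enabled on the cluster, and the job ID default is a placeholder rather than an ID from this run.

```shell
#!/usr/bin/env bash
# Sketch: report requested vs. peak memory for a finished Slurm job.
# JOBID is a placeholder (hypothetical); pass the ID that sbatch printed.
JOBID="${1:-123456}"

# ReqMem is the memory that was requested; MaxRSS is the peak resident
# memory recorded for each job step.
FIELDS="JobID,JobName,ReqMem,MaxRSS,Elapsed,State,ExitCode"

if command -v sacct >/dev/null 2>&1; then
  sacct -j "$JOBID" --format="$FIELDS"
else
  echo "sacct not found; run this on a node with the Slurm client tools" >&2
fi
```

A MaxRSS close to ReqMem, or a State such as OUT_OF_MEMORY on Slurm versions that report it, would be consistent with the job having been killed for exceeding its memory request.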

biowackysci commented 5 years ago

Thanks for the reply. I have now restarted the job with the following command:

    canu -trim-assemble -p canu_trim_assemble -d /group/pasture/Saila/CANU1.8/Saila_CANU_trim_assemble/ genomeSize=2.8g useGrid=true stopOnReadQuality=false minOverlapLength=1000 -pacbio-corrected /group/pasture/Saila/CANU1.8/SAILA_CANU_Corrrection/canu_correction.correctedReads.fasta.gz gridEngine=slurm gridOptions="--account=dbiopast1 --partition=batch --time=1000:00:00 --mem-per-cpu=4g"

Will let you know how it goes

Thanks S

biowackysci commented 5 years ago

Hello again. This time it ran for a while with the command above, then crashed with the following report:

-- CONFIGURE CANU
--
-- Detected Java(TM) Runtime Environment '1.8.0_172' (from '/usr/local/EasyBuild/software/Java/1.8.0_172/bin/java') with -d64 support.
--
-- WARNING:
-- WARNING:  Failed to run gnuplot using command 'gnuplot'.
-- WARNING:  Plots will be disabled.
-- WARNING:
--
-- Detected 48 CPUs and 754 gigabytes of memory.
-- Detected Slurm with 'MaxArraySize' limited to 10000 jobs.
-- 
-- Found   7 hosts with  24 cores and  755 GB memory under Slurm control.
-- Found   3 hosts with  24 cores and  377 GB memory under Slurm control.
-- Found   5 hosts with  24 cores and  503 GB memory under Slurm control.
-- Found  95 hosts with  48 cores and  754 GB memory under Slurm control.
-- Found  10 hosts with  48 cores and 1510 GB memory under Slurm control.
-- Found   5 hosts with  48 cores and 1003 GB memory under Slurm control.
--
--                     (tag)Threads
--            (tag)Memory         |
--        (tag)         |         |  algorithm
--        -------  ------  --------  -----------------------------
-- Grid:  meryl     64 GB    8 CPUs  (k-mer counting)
-- Grid:  hap       16 GB   24 CPUs  (read-to-haplotype assignment)
-- Grid:  cormhap   48 GB   12 CPUs  (overlap detection with mhap)
-- Grid:  obtovl    24 GB   12 CPUs  (overlap detection)
-- Grid:  utgovl    24 GB   12 CPUs  (overlap detection)
-- Grid:  ovb        4 GB    1 CPU   (overlap store bucketizer)
-- Grid:  ovs       32 GB    1 CPU   (overlap store sorting)
-- Grid:  red       16 GB    8 CPUs  (read error detection)
-- Grid:  oea        8 GB    1 CPU   (overlap error adjustment)
-- Grid:  bat      512 GB   32 CPUs  (contig construction with bogart)
-- Grid:  gfa       32 GB   32 CPUs  (GFA alignment and processing)
--
-- In 'canu_trim_assemble.seqStore', found PacBio reads:
--   Raw:        0
--   Corrected:  13803068
--   Trimmed:    9739173
--
-- Generating assembly 'canu_trim_assemble' in '/group/pasture/Saila/CANU1.8/Saila_CANU_trim_assemble'
--
-- Parameters:
--
--  genomeSize        2800000000
--
--  Overlap Generation Limits:
--    corOvlErrorRate 0.2400 ( 24.00%)
--    obtOvlErrorRate 0.0450 (  4.50%)
--    utgOvlErrorRate 0.0450 (  4.50%)
--
--  Overlap Processing Limits:
--    corErrorRate    0.3000 ( 30.00%)
--    obtErrorRate    0.0450 (  4.50%)
--    utgErrorRate    0.0450 (  4.50%)
--    cnsErrorRate    0.0750 (  7.50%)
--
--
-- BEGIN ASSEMBLY
--
--
-- Bogart failed, tried 2 times, giving up.
--

ABORT:
ABORT: Canu 1.8
ABORT: Don't panic, but a mostly harmless error occurred and Canu stopped.
ABORT: Try restarting.  If that doesn't work, ask for help.
ABORT:
ABORT: Disk space available:  35317.062 GB
ABORT:
ABORT: Last 50 lines of the relevant log file (unitigging/4-unitigger/unitigger.err):
ABORT:
ABORT:   optimizePositions()--     changed:   7648787 reads
ABORT:   optimizePositions()--   Recomputing positions, iteration 2, with 32 threads.
ABORT:   optimizePositions()--     Reset zero.
ABORT:   optimizePositions()--     Checking convergence.
ABORT:   optimizePositions()--     converged: 7641819 reads
ABORT:   optimizePositions()--     changed:   6161250 reads
ABORT:   optimizePositions()--   Recomputing positions, iteration 3, with 32 threads.
ABORT:   optimizePositions()--     Reset zero.
ABORT:   optimizePositions()--     Checking convergence.
ABORT:   optimizePositions()--     converged: 8409919 reads
ABORT:   optimizePositions()--     changed:   5393150 reads
ABORT:   optimizePositions()--   Recomputing positions, iteration 4, with 32 threads.
ABORT:   optimizePositions()--     Reset zero.
ABORT:   optimizePositions()--     Checking convergence.
ABORT:   optimizePositions()--     converged: 8690166 reads
ABORT:   optimizePositions()--     changed:   5112903 reads
ABORT:   optimizePositions()--   Recomputing positions, iteration 5, with 32 threads.
ABORT:   optimizePositions()--     Reset zero.
ABORT:   optimizePositions()--     Checking convergence.
ABORT:   optimizePositions()--     converged: 8875890 reads
ABORT:   optimizePositions()--     changed:   4927179 reads
ABORT:   optimizePositions()--   Expanding short reads with 32 threads.
ABORT:   optimizePositions()--   Updating positions.
ABORT:   optimizePositions()--   Finished.
ABORT:   
ABORT:   ==> MERGE ORPHANS.
ABORT:   
ABORT:   computeErrorProfiles()-- Computing error profiles for 1066682 tigs, with 32 threads.
ABORT:   computeErrorProfiles()-- Finished.
ABORT:   
ABORT:   findPotentialOrphans()-- working on 1066682 tigs.
ABORT:   mergeOrphans()-- Found 46470 potential orphans.
ABORT:   mergeOrphans()-- placed        7 unique orphan tigs
ABORT:   mergeOrphans()-- shattered     0 repeat orphan tigs
ABORT:   mergeOrphans()--
ABORT:   classifyAsUnassembled()--  32816 tigs    88621535 bases -- singleton
ABORT:   classifyAsUnassembled()--      0 tigs           0 bases -- too few reads        (< 2 reads)
ABORT:   classifyAsUnassembled()--      0 tigs           0 bases -- too short            (< 0 bp)
ABORT:   classifyAsUnassembled()--      0 tigs           0 bases -- single spanning read (> 1.000000 tig length)
ABORT:   classifyAsUnassembled()--  49446 tigs   172959937 bases -- low coverage         (> 0.500000 tig length at < 3 coverage)
ABORT:   classifyAsUnassembled()-- 105391 tigs   903820499 bases -- acceptable contigs
ABORT:   
ABORT:   
ABORT:   ==> GENERATING ASSEMBLY GRAPH.
ABORT:   
ABORT:   computeErrorProfiles()-- Computing error profiles for 1066682 tigs, with 32 threads.
ABORT:   computeErrorProfiles()-- Finished.
ABORT:   
ABORT:   AssemblyGraph()-- allocating vectors for placements, 631.854MB
ABORT:   AssemblyGraph()-- finding edges for 5216416 reads (4833637 contained), ignoring 8586652 unplaced reads, with 32 threads.
ABORT:

Can you please have a look?

Thanks S

skoren commented 5 years ago

Looks like the same error. Please don't change more than one thing at a time in your command.

Originally you had been running with --account=dbiopast1 --partition=batch --time=1000:00:00 --mem-per-cpu=100g for all your jobs, which means a 32-core bat job would have tried to request 100 * 32 = 3200 GB (3.2 TB) of RAM. That is why you got the error about no such machine being available.

@brianwalenz explained the second error: you removed the --mem-per-cpu=100g option, so it seems bogart tried to use more than 62 GB.

In the last run you removed batMemory but added --mem-per-cpu=4g. This limited bogart to 4 * 32 = 128 GB, but since you didn't specify batMemory it wanted to use 512 GB, and it would have requested 512 GB had you not overridden it with your --mem-per-cpu=4g option. In effect, rather than doing what @brianwalenz suggested and giving bogart more memory, you gave it a quarter of the memory it was requesting.

In general, it's a bad idea to request memory in gridOptions: that applies the same memory limit to all of your jobs, no matter what Canu would actually configure itself to use. You'll end up over-requesting memory for many jobs and under-requesting for others. There are step-specific gridOptions; run canu -options | grep -i gridOptions to list them.
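
The memory arithmetic can be checked with a few lines of shell. This is a rough sketch using numbers from the logs in this thread: 32 bogart threads, a largest host of 1510 GB, and 4 GB per CPU from the --mem-per-cpu=4g option. The `reserved_gb` helper is a made-up name for illustration, and the sketch assumes Slurm reserves the per-CPU memory once for each CPU the job requests.

```shell
#!/usr/bin/env bash
# Sketch: memory reserved = (--mem-per-cpu) x (CPUs requested by the job).
bat_threads=32          # bogart CPUs from the Canu resource table
largest_node_gb=1510    # largest host Canu found under Slurm control
bat_wanted_gb=512       # what Canu configured for bogart in the third run

reserved_gb() {         # hypothetical helper: reserved_gb <GB per CPU> <CPUs>
  echo $(( $1 * $2 ))
}

first=$(reserved_gb 100 "$bat_threads")   # first run: --mem-per-cpu=100g
third=$(reserved_gb 4 "$bat_threads")     # third run: --mem-per-cpu=4g

echo "first run: ${first} GB requested, largest node has ${largest_node_gb} GB"
echo "third run: ${third} GB reserved, bogart configured for ${bat_wanted_gb} GB"

if [ "$first" -gt "$largest_node_gb" ]; then
  echo "first run: no node can satisfy the request"
fi
if [ "$third" -lt "$bat_wanted_gb" ]; then
  echo "third run: reservation is smaller than what bogart will try to use"
fi
```

Both checks fire with these numbers, which matches the two failures described above: the first request exceeds every node, and the third reservation is a quarter of what bogart was configured to use.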

So, remove the unitigging/4-unitigger folder completely and re-run with the command:

canu -trim-assemble -p canu_trim_assemble -d /group/pasture/Saila/CANU1.8/Saila_CANU_trim_assemble/ genomeSize=2.8g useGrid=true stopOnReadQuality=false minOverlapLength=1000 -pacbio-corrected /group/pasture/Saila/CANU1.8/SAILA_CANU_Corrrection/canu_correction.correctedReads.fasta.gz gridEngine=slurm batMemory=62 gridOptions="--account=dbiopast1 --partition=batch --time=1000:00:00" gridOptionsbat="--mem-per-cpu=4g"
biowackysci commented 5 years ago

Thanks heaps. Have done what you suggested. Will keep you posted on how the run goes

Thanks S

biowackysci commented 5 years ago

After I ran the command

canu -trim-assemble -p canu_trim_assemble -d /group/pasture/Saila/CANU1.8/Saila_CANU_trim_assemble/ genomeSize=2.8g useGrid=true stopOnReadQuality=false minOverlapLength=1000 -pacbio-corrected /group/pasture/Saila/CANU1.8/SAILA_CANU_Corrrection/canu_correction.correctedReads.fasta.gz gridEngine=slurm batMemory=62 gridOptions="--account=dbiopast1 --partition=batch --time=1000:00:00" gridOptionsbat="--mem-per-cpu=4g"

I got this output, and the unitigging failed here:

Found perl:
   /usr/bin/perl
   This is perl 5, version 16, subversion 3 (v5.16.3) built for x86_64-linux-thread-multi

Found java:
   /usr/local/EasyBuild/software/Java/1.8.0_172/bin/java
   java version "1.8.0_172"

Found canu:
   /usr/local/EasyBuild/software/canu/1.8-intel-2018a/bin/canu
   Canu 1.8

Running job 1 based on SLURM_ARRAY_TASK_ID=1 and offset=0.
/var/spool/slurm/d/job576651/slurm_script: line 99: 153270 Segmentation fault      (core dumped) $bin/bogart -S ../../canu_trim_assemble.seqStore -O ../canu_trim_assemble.ovlStore -o ./canu_trim_assemble -gs 2800000000 -eg 0.045 -eM 0.045 -mo 1000 -dg 6 -db 6 -dr 3 -ca 2100 -cp 200 -threads 32 -M 62 -unassembled 2 0 1.0 0.5 3 > ./unitigger.err 2>&1
bogart appears to have failed. No canu_trim_assemble.ctgStore or canu_trim_assemble.utgStore found.
brianwalenz commented 5 years ago

What's in unitigging/4-unitigger/unitigger.err? Mostly just interested in the last 50 or so lines.

biowackysci commented 5 years ago

The last few lines are these

==> PLACE CONTAINED READS.

computeErrorProfiles()-- Computing error profiles for 1068493 tigs, with 32 threads.
computeErrorProfiles()-- Finished.

placeContains()-- placing 8267072 contained and 5158149 unplaced reads, with 32 threads.
placeContains()-- Placed 4696175 contained reads and 5247 unplaced reads.
placeContains()-- Failed to place 3570897 contained reads (too high error suspected) and 5152902 unplaced reads (lack of overlaps suspected).
optimizePositions()-- Optimizing read positions for 13803069 reads in 1068493 tigs, with 32 threads.
optimizePositions()--   Allocating scratch space for 13803069 reads (862691 KB).
optimizePositions()--   Initializing positions with 32 threads.
optimizePositions()--   Recomputing positions, iteration 1, with 32 threads.
optimizePositions()--     Reset zero.
optimizePositions()--     Checking convergence.
optimizePositions()--     converged: 6331271 reads
optimizePositions()--     changed:   7471798 reads
optimizePositions()--   Recomputing positions, iteration 2, with 32 threads.
optimizePositions()--     Reset zero.
optimizePositions()--     Checking convergence.
optimizePositions()--     converged: 7824466 reads
optimizePositions()--     changed:   5978603 reads
optimizePositions()--   Recomputing positions, iteration 3, with 32 threads.
optimizePositions()--     Reset zero.
optimizePositions()--     Checking convergence.
optimizePositions()--     converged: 8622705 reads
optimizePositions()--     changed:   5180364 reads
optimizePositions()--   Recomputing positions, iteration 4, with 32 threads.
optimizePositions()--     Reset zero.
optimizePositions()--     Checking convergence.
optimizePositions()--     converged: 8847527 reads
optimizePositions()--     changed:   4955542 reads
optimizePositions()--   Recomputing positions, iteration 5, with 32 threads.
optimizePositions()--     Reset zero.
optimizePositions()--     Checking convergence.
optimizePositions()--     converged: 8995959 reads
optimizePositions()--     changed:   4807110 reads
optimizePositions()--   Expanding short reads with 32 threads.
optimizePositions()--   Updating positions.
optimizePositions()--   Finished.

==> MERGE ORPHANS.

computeErrorProfiles()-- Computing error profiles for 1068493 tigs, with 32 threads.
computeErrorProfiles()-- Finished.

findPotentialOrphans()-- working on 1068493 tigs.
mergeOrphans()-- Found 46794 potential orphans.
mergeOrphans()-- placed        7 unique orphan tigs
mergeOrphans()-- shattered     0 repeat orphan tigs
mergeOrphans()--
classifyAsUnassembled()--  32694 tigs    88283798 bases -- singleton
classifyAsUnassembled()--      0 tigs           0 bases -- too few reads        (< 2 reads)
classifyAsUnassembled()--      0 tigs           0 bases -- too short            (< 0 bp)
classifyAsUnassembled()--      0 tigs           0 bases -- single spanning read (> 1.000000 tig length)
classifyAsUnassembled()--  49683 tigs   173984829 bases -- low coverage         (> 0.500000 tig length at < 3 coverage)
classifyAsUnassembled()-- 105641 tigs   903536368 bases -- acceptable contigs

==> GENERATING ASSEMBLY GRAPH.

computeErrorProfiles()-- Computing error profiles for 1068493 tigs, with 32 threads.
computeErrorProfiles()-- Finished.

AssemblyGraph()-- allocating vectors for placements, 631.854MB
AssemblyGraph()-- finding edges for 5079269 reads (4696175 contained), ignoring 8723799 unplaced reads, with 32 threads.
AssemblyGraph()-- building reverse edges.
AssemblyGraph()-- build complete.
AssemblyGraph()-- generating './canu_trim_assemble.initial.assembly.gfa'.
AssemblyGraph()-- Found 0 edges to unassembled contigs.
AssemblyGraph()--        0 bubble placements
AssemblyGraph()--        0 repeat placements

AssemblyGraph()-- Intratig edges:            0 contained         0 5'         0 3' (in both contig and unitig)
AssemblyGraph()-- Contig only edges:   2528791 contained   2745072 5'   2749902 3'
AssemblyGraph()-- Unitig only edges:         0 contained         0 5'         0 3'
AssemblyGraph()-- Intercontig edges:  106680339 contained  47964090 5'  49492615 3' (in neither contig nor unitig)

==> BREAK REPEATS.

computeErrorProfiles()-- Computing error profiles for 1068493 tigs, with 32 threads.
computeErrorProfiles()-- Finished.

Failed with '
biowackysci commented 5 years ago

Hello, since the log just stopped at "Failed with '", I am going to remove the unitigging/4-unitigger folder again and re-run the script to see if it makes any progress.

Thanks S

biowackysci commented 5 years ago

Hello again, this is what is in the canu-scripts/canu.29.out file:

BEGIN ASSEMBLY
--
--
-- Bogart failed, tried 2 times, giving up.
--

ABORT:
ABORT: Canu 1.8
ABORT: Don't panic, but a mostly harmless error occurred and Canu stopped.
ABORT: Try restarting.  If that doesn't work, ask for help.
ABORT:
ABORT: Disk space available:  30701.574 GB
ABORT:
ABORT: Last 50 lines of the relevant log file (unitigging/4-unitigger/unitigger.err):
ABORT:
ABORT:   optimizePositions()--     Checking convergence.
ABORT:   optimizePositions()--     converged: 8995959 reads
ABORT:   optimizePositions()--     changed:   4807110 reads
ABORT:   optimizePositions()--   Expanding short reads with 32 threads.
ABORT:   optimizePositions()--   Updating positions.
ABORT:   optimizePositions()--   Finished.
ABORT:   
ABORT:   ==> MERGE ORPHANS.
ABORT:   
ABORT:   computeErrorProfiles()-- Computing error profiles for 1068493 tigs, with 32 threads.
ABORT:   computeErrorProfiles()-- Finished.
ABORT:   
ABORT:   findPotentialOrphans()-- working on 1068493 tigs.
ABORT:   mergeOrphans()-- Found 46794 potential orphans.
ABORT:   mergeOrphans()-- placed        7 unique orphan tigs
ABORT:   mergeOrphans()-- shattered     0 repeat orphan tigs
ABORT:   mergeOrphans()--
ABORT:   classifyAsUnassembled()--  32694 tigs    88283798 bases -- singleton
ABORT:   classifyAsUnassembled()--      0 tigs           0 bases -- too few reads        (< 2 reads)
ABORT:   classifyAsUnassembled()--      0 tigs           0 bases -- too short            (< 0 bp)
ABORT:   classifyAsUnassembled()--      0 tigs           0 bases -- single spanning read (> 1.000000 tig length)
ABORT:   classifyAsUnassembled()--  49683 tigs   173984829 bases -- low coverage         (> 0.500000 tig length at < 3 coverage)
ABORT:   classifyAsUnassembled()-- 105641 tigs   903536368 bases -- acceptable contigs
ABORT:   
ABORT:   
ABORT:   ==> GENERATING ASSEMBLY GRAPH.
ABORT:   
ABORT:   computeErrorProfiles()-- Computing error profiles for 1068493 tigs, with 32 threads.
ABORT:   computeErrorProfiles()-- Finished.
ABORT:   
ABORT:   AssemblyGraph()-- allocating vectors for placements, 631.854MB
ABORT:   AssemblyGraph()-- finding edges for 5079269 reads (4696175 contained), ignoring 8723799 unplaced reads, with 32 threads.
ABORT:   AssemblyGraph()-- building reverse edges.
ABORT:   AssemblyGraph()-- build complete.
ABORT:   AssemblyGraph()-- generating './canu_trim_assemble.initial.assembly.gfa'.
ABORT:   AssemblyGraph()-- Found 0 edges to unassembled contigs.
ABORT:   AssemblyGraph()--        0 bubble placements
ABORT:   AssemblyGraph()--        0 repeat placements
ABORT:   
ABORT:   AssemblyGraph()-- Intratig edges:            0 contained         0 5'         0 3' (in both contig and unitig)
ABORT:   AssemblyGraph()-- Contig only edges:   2528791 contained   2745072 5'   2749902 3'
ABORT:   AssemblyGraph()-- Unitig only edges:         0 contained         0 5'         0 3'
ABORT:   AssemblyGraph()-- Intercontig edges:  106680339 contained  47964090 5'  49492615 3' (in neither contig nor unitig)
ABORT:   
ABORT:   ==> BREAK REPEATS.
ABORT:   
ABORT:   computeErrorProfiles()-- Computing error profiles for 1068493 tigs, with 32 threads.
ABORT:   computeErrorProfiles()-- Finished.
ABORT:   
ABORT:   Failed with 'ABORT:

And the last lines of the unitigging/4-unitigger/unitigger.err file:

==> BUILDING GREEDY TIGS.

breakSingletonTigs()-- Removed 880467 singleton tigs; reads are now unplaced.
optimizePositions()-- Optimizing read positions for 13803069 reads in 1068493 tigs, with 32 threads.
optimizePositions()--   Allocating scratch space for 13803069 reads (862691 KB).
optimizePositions()--   Initializing positions with 32 threads.
optimizePositions()--   Recomputing positions, iteration 1, with 32 threads.
optimizePositions()--     Reset zero.
optimizePositions()--     Checking convergence.
optimizePositions()--     converged: 9599156 reads
optimizePositions()--     changed:   4203913 reads
optimizePositions()--   Recomputing positions, iteration 2, with 32 threads.
optimizePositions()--     Reset zero.
optimizePositions()--     Checking convergence.
optimizePositions()--     converged: 9658139 reads
optimizePositions()--     changed:   4144930 reads
optimizePositions()--   Recomputing positions, iteration 3, with 32 threads.
optimizePositions()--     Reset zero.
optimizePositions()--     Checking convergence.
optimizePositions()--     converged: 9678411 reads
optimizePositions()--     changed:   4124658 reads
optimizePositions()--   Recomputing positions, iteration 4, with 32 threads.
optimizePositions()--     Reset zero.
optimizePositions()--     Checking convergence.
optimizePositions()--     converged: 9683694 reads
optimizePositions()--     changed:   4119375 reads
optimizePositions()--   Recomputing positions, iteration 5, with 32 threads.
optimizePositions()--     Reset zero.
optimizePositions()--     Checking convergence.
optimizePositions()--     converged: 9685199 reads
optimizePositions()--     changed:   4117870 reads
optimizePositions()--   Expanding short reads with 32 threads.
optimizePositions()--   Updating positions.
optimizePositions()--   Finished.

==> PLACE CONTAINED READS.

computeErrorProfiles()-- Computing error profiles for 1068493 tigs, with 32 threads.
computeErrorProfiles()-- Finished.

placeContains()-- placing 8267072 contained and 5158149 unplaced reads, with 32 threads.
placeContains()-- Placed 4696175 contained reads and 5247 unplaced reads.
placeContains()-- Failed to place 3570897 contained reads (too high error suspected) and 5152902 unplaced reads (lack of overlaps suspected).
optimizePositions()-- Optimizing read positions for 13803069 reads in 1068493 tigs, with 32 threads.
optimizePositions()--   Allocating scratch space for 13803069 reads (862691 KB).
optimizePositions()--   Initializing positions with 32 threads.
optimizePositions()--   Recomputing positions, iteration 1, with 32 threads.
optimizePositions()--     Reset zero.
optimizePositions()--     Checking convergence.
optimizePositions()--     converged: 6331271 reads
optimizePositions()--     changed:   7471798 reads
optimizePositions()--   Recomputing positions, iteration 2, with 32 threads.
optimizePositions()--     Reset zero.
optimizePositions()--     Checking convergence.
optimizePositions()--     converged: 7824466 reads
optimizePositions()--     changed:   5978603 reads
optimizePositions()--   Recomputing positions, iteration 3, with 32 threads.
optimizePositions()--     Reset zero.
optimizePositions()--     Checking convergence.
optimizePositions()--     converged: 8622705 reads
optimizePositions()--     changed:   5180364 reads
optimizePositions()--   Recomputing positions, iteration 4, with 32 threads.
optimizePositions()--     Reset zero.
optimizePositions()--     Checking convergence.
optimizePositions()--     converged: 8847527 reads
optimizePositions()--     changed:   4955542 reads
optimizePositions()--   Recomputing positions, iteration 5, with 32 threads.
optimizePositions()--     Reset zero.
optimizePositions()--     Checking convergence.
optimizePositions()--     converged: 8995959 reads
optimizePositions()--     changed:   4807110 reads
optimizePositions()--   Expanding short reads with 32 threads.
optimizePositions()--   Updating positions.
optimizePositions()--   Finished.

==> MERGE ORPHANS.

computeErrorProfiles()-- Computing error profiles for 1068493 tigs, with 32 threads.
computeErrorProfiles()-- Finished.

findPotentialOrphans()-- working on 1068493 tigs.
mergeOrphans()-- Found 46794 potential orphans.
mergeOrphans()-- placed        7 unique orphan tigs
mergeOrphans()-- shattered     0 repeat orphan tigs
mergeOrphans()--
classifyAsUnassembled()--  32694 tigs    88283798 bases -- singleton
classifyAsUnassembled()--      0 tigs           0 bases -- too few reads        (< 2 reads)
classifyAsUnassembled()--      0 tigs           0 bases -- too short            (< 0 bp)
classifyAsUnassembled()--      0 tigs           0 bases -- single spanning read (> 1.000000 tig length)
classifyAsUnassembled()--  49683 tigs   173984829 bases -- low coverage         (> 0.500000 tig length at < 3 coverage)
classifyAsUnassembled()-- 105641 tigs   903536368 bases -- acceptable contigs

==> GENERATING ASSEMBLY GRAPH.

computeErrorProfiles()-- Computing error profiles for 1068493 tigs, with 32 threads.
computeErrorProfiles()-- Finished.

AssemblyGraph()-- allocating vectors for placements, 631.854MB
AssemblyGraph()-- finding edges for 5079269 reads (4696175 contained), ignoring 8723799 unplaced reads, with 32 threads.
AssemblyGraph()-- building reverse edges.
AssemblyGraph()-- build complete.
AssemblyGraph()-- generating './canu_trim_assemble.initial.assembly.gfa'.
AssemblyGraph()-- Found 0 edges to unassembled contigs.
AssemblyGraph()--        0 bubble placements
AssemblyGraph()--        0 repeat placements

AssemblyGraph()-- Intratig edges:            0 contained         0 5'         0 3' (in both contig and unitig)
AssemblyGraph()-- Contig only edges:   2528791 contained   2745072 5'   2749902 3'
AssemblyGraph()-- Unitig only edges:         0 contained         0 5'         0 3'
AssemblyGraph()-- Intercontig edges:  106680339 contained  47964090 5'  49492615 3' (in neither contig nor unitig)

==> BREAK REPEATS.

computeErrorProfiles()-- Computing error profiles for 1068493 tigs, with 32 threads.
computeErrorProfiles()-- Finished.

Failed with '

Again it only says Failed with ' and no specific error

Can you please help me with this?

Thanks S

skoren commented 5 years ago

Given that none of your runs report an error, I'd guess your grid is not letting the final output be flushed properly to disk, so you're losing the actual error. Can you get info on the failed unitigger job? The job ID should be in unitigging/4-unitigger/unitigger.jobSubmit*.out; query your grid for the resources the job requested and the resources it used before it quit.
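One way to do that lookup, as a minimal sketch: it assumes Slurm accounting is enabled on the cluster so that sacct returns records, and it takes the job ID from the job<ID>/slurm_script path that appears in the segmentation fault line of the job log. The helper names job_id_from_log and show_job_usage are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read log text on stdin and print the last Slurm job ID
# found in a path of the form .../job<ID>/slurm_script.
job_id_from_log() {
    sed -n 's|.*/job\([0-9][0-9]*\)/slurm_script.*|\1|p' | tail -n 1
}

# Hypothetical helper: ask Slurm accounting what the job requested (ReqMem,
# AllocCPUS) and what it actually used (MaxRSS, Elapsed) before it ended.
show_job_usage() {
    sacct -j "$1" --format=JobID,JobName,State,ExitCode,AllocCPUS,ReqMem,MaxRSS,Elapsed
}

# Sample line copied from the job output posted in this thread.
sample='/var/spool/slurm/d/job589812/slurm_script: line 99: 202059 Segmentation fault'
id=$(printf '%s\n' "$sample" | job_id_from_log)
echo "job id: $id"

# Only query the scheduler when sacct is available on this machine.
if command -v sacct >/dev/null 2>&1; then
    show_job_usage "$id"
fi
```

If MaxRSS is close to ReqMem, the job most likely ran into its memory limit, though that is only a hint and not proof.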

Reserve a full node on your cluster for an interactive session (e.g. one of your 95 hosts with 754 of ram). Run sh unitigger.sh 1 by hand in the unitigging/4-unitigger/ folder (DO NOT re-launch Canu) and see if you can get an error report this way.

biowackysci commented 5 years ago

Thanks Sergey, this is what I got when I ran sh unitigger.sh 1:

Found perl:
   /usr/bin/perl
   This is perl 5, version 16, subversion 3 (v5.16.3) built for x86_64-linux-thread-multi

Found java:
   /usr/local/EasyBuild/software/Java/1.8.0_172/bin/java
   java version "1.8.0_172"

Found canu:
   /usr/local/EasyBuild/software/canu/1.8-intel-2018a/bin/canu
   /usr/local/EasyBuild/software/canu/1.8-intel-2018a/bin/sqStoreCreate: error while loading shared libraries: libiomp5.so: cannot open shared object file: No such file or directory

Running job 1 based on command line options.
bogart appears to have failed. No canu_trim_assemble.ctgStore or canu_trim_assemble.utgStore found.

Also, this is what I found in unitigger.jobsubmit.out:

Found perl:
   /usr/bin/perl
   This is perl 5, version 16, subversion 3 (v5.16.3) built for x86_64-linux-thread-multi

Found java:
   /usr/local/EasyBuild/software/Java/1.8.0_172/bin/java
   java version "1.8.0_172"

Found canu:
   /usr/local/EasyBuild/software/canu/1.8-intel-2018a/bin/canu
   Canu 1.8

Running job 1 based on SLURM_ARRAY_TASK_ID=1 and offset=0.
/var/spool/slurm/d/job589812/slurm_script: line 99: 202059 Segmentation fault (core dumped) $bin/bogart -S ../../canu_trim_assemble.seqStore -O ../canu_trim_assemble.ovlStore -o ./canu_trim_assemble -gs 2800000000 -eg 0.045 -eM 0.045 -mo 1000 -dg 6 -db 6 -dr 3 -ca 2100 -cp 200 -threads 32 -M 62 -unassembled 2 0 1.0 0.5 3 > ./unitigger.err 2>&1
bogart appears to have failed. No canu_trim_assemble.ctgStore or canu_trim_assemble.utgStore found.

I think it is saying that it cannot find the ctgStore or the utgStore.

Thanks S

skoren commented 5 years ago

No, this command is what creates the ctg and utg stores, so the error is saying they weren't created because the bogart command failed. Hopefully unitigger.err has more information in it this time than before. Does it look any different (does it show the actual error message)?

biowackysci commented 5 years ago

Hello again, I ran sh unitigger.sh 1 and this is what I got:

$ sh unitigger.sh 1

Found perl:
   /usr/bin/perl
   This is perl 5, version 16, subversion 3 (v5.16.3) built for x86_64-linux-thread-multi

Found java:
   /usr/local/EasyBuild/software/Java/1.8.0_172/bin/java
   java version "1.8.0_172"

Found canu:
   /usr/local/EasyBuild/software/canu/1.8-intel-2018a/bin/canu
   /usr/local/EasyBuild/software/canu/1.8-intel-2018a/bin/sqStoreCreate: error while loading shared libraries: libiomp5.so: cannot open shared object file: No such file or directory

Running job 1 based on command line options.
bogart appears to have failed. No canu_trim_assemble.ctgStore or canu_trim_assemble.utgStore found.

And this is the error found in unitigger.err

/usr/local/EasyBuild/software/canu/1.8-intel-2018a/bin/bogart: error while loading shared libraries: libiomp5.so: cannot open shared object file: No such file or directory

Hope this helps

Thanks S

skoren commented 5 years ago

You've switched to a machine with a different configuration, either OS or CPU, than the one where Canu was built. Make sure the way you're reserving the node matches the way Canu requests nodes in terms of flags, but reserve a full node rather than part of it. Also make sure all the modules loaded during the Canu run are loaded in your interactive session.
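A quick way to confirm a missing runtime library on the interactive node is to look for "not found" entries in ldd output. The sketch below is only an illustration: parse_missing and missing_libs are hypothetical helper names, and the bogart path is the one from the log above, so adjust it for your installation.

```shell
#!/bin/sh
# Hypothetical helper: read ldd-style output on stdin and print the names of
# shared libraries reported as "not found".
parse_missing() {
    awk '/=> not found/ { print $1 }'
}

# Hypothetical helper: list unresolved shared libraries for one binary.
missing_libs() {
    ldd "$1" 2>/dev/null | parse_missing
}

# Path taken from the log in this thread; it may differ on your system.
bogart=/usr/local/EasyBuild/software/canu/1.8-intel-2018a/bin/bogart
if [ -x "$bogart" ]; then
    missing_libs "$bogart"
fi
```

libiomp5.so is the Intel OpenMP runtime, so if it shows up here the Intel toolchain environment is probably not loaded in the session. If your site uses Environment Modules or Lmod, comparing module list output between the batch job and the interactive session should show which module is missing.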

skoren commented 5 years ago

Closing as idle. The cluster seems to be a mix of nodes with different processors/OS versions, which is causing some issues. There have been several fixes to bogart since this release, so the latest code might fix this as well.