roland-rad-lab / MoCaSeq

Analysis pipelines for cancer genome sequencing in mice.

Advice for running with Singularity #17

Closed dpcook closed 1 year ago

dpcook commented 1 year ago

Hi there,

I'm hoping to run the pipeline on a cluster without root access, so I wanted to convert the Docker image for Singularity. I believe the conversion works fine, but I can't get the pipeline to run successfully. I think I just don't understand what /var/pipeline is.

singularity pull MoCaSeq.sif docker://rolandradlab/mocaseq:latest
singularity run MoCaSeq.sif --test yes

This successfully starts the pipeline and builds the reference, but then throws the first error:

---- Finished generating reference data ----
Wed Apr 19 02:14:16 EDT 2023
DONE
mkdir: cannot create directory ‘/var/pipeline/temp/’: Read-only file system

FASTQ preprocessing all goes fine, but then I get an error related to the one above:

+--------------+
| CONFIG ERROR |
+--------------+
Cannot access config file (/var/pipeline/ref/GRCm38.p6/GRCm38.bammatcher_docker.conf).
It either does not exist or is not readable.

I do see the .conf file in ${working_directory}/ref/GRCm38.p6/, but not in the sample output directory's pipeline subdirectory.

The rest of the pipeline runs, but outputs largely empty files.

I figure I need the equivalent of the Docker flag -v ${working_directory}:/var/pipeline/, but don't really know what that is (is var the sample name? I see the 'pipeline' subdirectory made in the output folder). From my limited knowledge of singularity, I thought the -B option gives directory permissions, but it doesn't recognize /var/, so I'm not sure how to navigate it.

Sorry if the question is naive! Any help is appreciated

NikdAK commented 1 year ago

Hi, we have never used Singularity for this, but I can try to help you with some of the Docker-specific stuff. As a note, there is also a branch for Nextflow; however, it is highly experimental and not fully tested, and we cannot provide support for it: https://github.com/roland-rad-lab/MoCaSeq/tree/human-pipeline-nextflow

To me it sounds like the folders are not correctly mounted/bound and the container is trying to generate new directories somewhere it should not.

"/var/pipeline/" is basically just the working/home/base directory within Docker, while "/var/" in this case is just a dummy/username, but hardcoded within the pipeline. It should work if you mount/bind the volumes using --bind. You could bind all your local folders to the required locations within the container:

-B ${working_directory}:/var/pipeline/ # in this folder the individual folders for each sample will be generated

-B ${temp_directory}/temp/:/var/pipeline/temp/ # in this folder the temporary files will be placed and removed, this can be a scratch or some larger folder.

-B ${ref_directory}:/var/pipeline/ref/ # here the references should be. After successfully downloading it you can store it somewhere else and just bind it here

-B ${script_directory}:/opt/MoCaSeq/ # optional! like this you can easily modify the code on your machine (e.g. clone the github repo), because the local version (in ${script_directory}) will overwrite the container version (located in /opt/MoCaSeq/ within the container)

-B ${fastq_directory}:/var/pipeline/raw/ # in case you want to store the FASTQs somewhere else. You would pass the file arguments like this: -tf /var/pipeline/raw/TUMOR.R1.fastq.gz -tr /var/pipeline/raw/TUMOR.R2.fastq.gz
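
Put together, the required binds listed above could be combined into one call. This is only a minimal sketch: it assumes the image was pulled as MoCaSeq.sif, the default host paths are placeholders, and `mocaseq_test_command` and `DRY_RUN` are hypothetical helper names, not part of MoCaSeq. By default it only prints the command so you can inspect it first.

```shell
#!/bin/sh
# Sketch: assemble the bind mounts for a MoCaSeq test run under Singularity.
# All host paths below are placeholders; adjust them to your cluster.
working_directory="${working_directory:-$PWD/mocaseq_work}"
temp_directory="${temp_directory:-$working_directory}"
ref_directory="${ref_directory:-$working_directory/ref}"
image="${image:-MoCaSeq.sif}"

# Print the command that would be run (host paths must not contain spaces).
mocaseq_test_command() {
    printf 'singularity run -B %s -B %s -B %s %s --test yes\n' \
        "${working_directory}:/var/pipeline/" \
        "${temp_directory}/temp/:/var/pipeline/temp/" \
        "${ref_directory}:/var/pipeline/ref/" \
        "$image"
}

if [ "${DRY_RUN:-1}" = "1" ]; then
    # Default: print only, nothing is created or executed.
    mocaseq_test_command
else
    # The bind sources must exist on the host before the container starts.
    mkdir -p "$working_directory" "${temp_directory}/temp" "$ref_directory"
    singularity run \
        -B "${working_directory}:/var/pipeline/" \
        -B "${temp_directory}/temp/:/var/pipeline/temp/" \
        -B "${ref_directory}:/var/pipeline/ref/" \
        "$image" --test yes
fi
```

Run it with DRY_RUN=0 once the printed paths look right. The optional script and FASTQ binds are left out here and can be added the same way.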

In Docker you have "-it --entrypoint=/bin/bash", which allows you to enter the container and check the paths manually. I believe this is the Singularity version (https://docs.sylabs.io/guides/3.1/user-guide/cli/singularity_shell.html):

singularity shell MoCaSeq.sif
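
If an interactive shell is inconvenient on a cluster, the same path check could probably be done non-interactively with `singularity exec`. A rough sketch, assuming the image is called MoCaSeq.sif, that the container provides `sh`, and that ${working_directory} is the host folder you bind; `check_paths_script` is a hypothetical helper, not something shipped with MoCaSeq.

```shell
#!/bin/sh
# Sketch: report whether the directories the pipeline expects exist and are writable.
# The snippet is passed to "sh -c" so the checks run inside the container.
check_paths_script='
for d in "$@"; do
    if [ -d "$d" ] && [ -w "$d" ]; then
        echo "OK        $d"
    elif [ -d "$d" ]; then
        echo "READ-ONLY $d"
    else
        echo "MISSING   $d"
    fi
done'

working_directory="${working_directory:-$PWD}"
image="${image:-MoCaSeq.sif}"

if command -v singularity >/dev/null 2>&1 && [ -f "$image" ]; then
    # Same bind as for the real run, so the check sees what the pipeline would see.
    singularity exec -B "${working_directory}:/var/pipeline/" "$image" \
        sh -c "$check_paths_script" sh \
        /var/pipeline /var/pipeline/temp /var/pipeline/ref
else
    echo "singularity or $image not found; skipping the in-container check" >&2
fi
```

A READ-ONLY or MISSING line for any of the three paths would suggest that the corresponding bind is absent or points at the wrong host folder.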

Good luck!

dpcook commented 1 year ago

Thanks, I really appreciate the help! I have been modifying the bindings and seem to have got around the errors I initially saw. The only remaining issue seems to be related to where the BAM files are going. I was wondering if you had any thoughts about what could be going wrong here:

singularity run -B ${working_directory}:/var/pipeline/ \
        -B ${working_directory}/temp:/var/pipeline/temp \
        -B ${working_directory}/ref/:/var/pipeline/ref \
        mocaseq.sif \
        --test yes

${working_directory}/ref seems to be built fine and the pipeline completes, but it throws one error about not finding the BAMs, which seem to end up stored in the local ${working_directory}/temp.

Specific error:

---- Matched BAM-files? ----
Tue May  2 01:31:48 EDT 2023     timestamp: 1683005508
+------------+
| FILE ERROR |
+------------+
Cannot access BAM file (/home/wranalab/dcook/software/tmp/MoCaSeq_Test/results/bam/MoCaSeq_Test.Tumor.bam).
It either does not exist or is not readable.

The directory /results/bam exists in the output, but it's empty.

Is there a missing binding or something that would perhaps explain this?
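
For anyone debugging the same symptom, a small host-side helper along these lines can show where the BAM files actually ended up. It is only a sketch: `locate_bams` is a hypothetical name, and the temp/ and results/bam/ layout is taken from the paths in this post.

```shell
#!/bin/sh
# Sketch: list every BAM file below the host working directory and label
# whether it sits in temp/, in a sample's results/bam/, or somewhere else.
# Usage: locate_bams /path/to/working_directory
locate_bams() {
    root="${1:-.}"
    # Drop a trailing slash (but keep "/" itself) so the patterns below match.
    case "$root" in
        ?*/) root="${root%/}" ;;
    esac
    find "$root" -type f -name '*.bam' | sort | while IFS= read -r bam; do
        case "$bam" in
            "$root"/temp/*)  where="temp" ;;
            */results/bam/*) where="results" ;;
            *)               where="other" ;;
        esac
        printf '%s\t%s\n' "$where" "$bam"
    done
}

locate_bams "${working_directory:-.}"
```

If every BAM is labelled temp, the files were written but never reached results/bam, which matches the error shown above.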

Full output:

---- Starting Mouse Cancer Genome Analysis ----
Mon May  1 10:00:56 EDT 2023     timestamp: 1682949656
---- Creating directories ----
Mon May  1 10:00:56 EDT 2023     timestamp: 1682949656
---- Checking for available reference files ----
Mon May  1 10:00:56 EDT 2023     timestamp: 1682949656
grep: /var/pipeline/ref/GRCm38.p6/GetReferenceData.txt: No such file or directory
---- Reference files not found - Files will be downloaded ----
Mon May  1 10:00:56 EDT 2023     timestamp: 1682949656
---- Get reference data ----
---- Generate reference data for Version GRCm38.p6 ----
Mon May  1 10:00:56 EDT 2023
---- Copying over files from repository ----
Mon May  1 10:00:56 EDT 2023
---- Downloading reference genome ----
Mon May  1 10:00:58 EDT 2023
---- Generate BWA Index ----
Mon May  1 10:01:36 EDT 2023
---- Generate sequence dictionary ----
Mon May  1 12:52:59 EDT 2023
---- Generate exons covered by SureSelect ----
Mon May  1 12:53:17 EDT 2023
---- Downloading reference genome (for VEP) ----
Mon May  1 12:53:20 EDT 2023
---- Generate customized Sanger DB ----
Mon May  1 12:55:29 EDT 2023
---- Generate reference data for msisensor  ----
Mon May  1 12:55:31 EDT 2023
scan -d ref/GRCm38.p6/GRCm38.p6.fna -o ref/GRCm38.p6/GRCm38.p6.microsatellites Start at:  Mon May  1 12:55:31 2023

scanning chomosome 1 done. 10 secs passed
scanning chomosome GL456210.1 done. 10 secs passed
scanning chomosome GL456211.1 done. 10 secs passed
scanning chomosome GL456212.1 done. 10 secs passed
scanning chomosome GL456213.1 done. 10 secs passed
scanning chomosome GL456221.1 done. 10 secs passed
scanning chomosome 2 done. 19 secs passed
scanning chomosome 3 done. 27 secs passed
scanning chomosome 4 done. 34 secs passed
scanning chomosome GL456216.1 done. 34 secs passed
scanning chomosome JH584292.1 done. 34 secs passed
scanning chomosome GL456350.1 done. 34 secs passed
scanning chomosome JH584293.1 done. 34 secs passed
scanning chomosome JH584294.1 done. 34 secs passed
scanning chomosome JH584295.1 done. 34 secs passed
scanning chomosome 5 done. 42 secs passed
scanning chomosome JH584296.1 done. 42 secs passed
scanning chomosome JH584297.1 done. 42 secs passed
scanning chomosome JH584298.1 done. 42 secs passed
scanning chomosome GL456354.1 done. 42 secs passed
scanning chomosome JH584299.1 done. 42 secs passed
scanning chomosome 6 done. 49 secs passed
scanning chomosome 7 done. 56 secs passed
scanning chomosome GL456219.1 done. 56 secs passed
scanning chomosome 8 done. 62 secs passed
scanning chomosome 9 done. 68 secs passed
scanning chomosome 10 done. 74 secs passed
scanning chomosome 11 done. 82 secs passed
scanning chomosome 12 done. 87 secs passed
scanning chomosome 13 done. 93 secs passed
scanning chomosome 14 done. 99 secs passed
scanning chomosome 15 done. 103 secs passed
scanning chomosome 16 done. 108 secs passed
scanning chomosome 17 done. 112 secs passed
scanning chomosome 18 done. 117 secs passed
scanning chomosome 19 done. 119 secs passed
scanning chomosome X done. 127 secs passed
scanning chomosome GL456233.1 done. 127 secs passed
scanning chomosome Y done. 131 secs passed
scanning chomosome JH584300.1 done. 131 secs passed
scanning chomosome JH584301.1 done. 131 secs passed
scanning chomosome JH584302.1 done. 131 secs passed
scanning chomosome JH584303.1 done. 131 secs passed
scanning chomosome GL456239.1 done. 131 secs passed
scanning chomosome GL456367.1 done. 131 secs passed
scanning chomosome GL456378.1 done. 131 secs passed
scanning chomosome GL456381.1 done. 131 secs passed
scanning chomosome GL456382.1 done. 131 secs passed
scanning chomosome GL456383.1 done. 131 secs passed
scanning chomosome GL456385.1 done. 131 secs passed
scanning chomosome GL456390.1 done. 131 secs passed
scanning chomosome GL456392.1 done. 131 secs passed
scanning chomosome GL456393.1 done. 131 secs passed
scanning chomosome GL456394.1 done. 131 secs passed
scanning chomosome GL456359.1 done. 131 secs passed
scanning chomosome GL456360.1 done. 131 secs passed
scanning chomosome GL456396.1 done. 131 secs passed
scanning chomosome GL456372.1 done. 131 secs passed
scanning chomosome GL456387.1 done. 131 secs passed
scanning chomosome GL456389.1 done. 131 secs passed
scanning chomosome GL456370.1 done. 131 secs passed
scanning chomosome GL456379.1 done. 131 secs passed
scanning chomosome GL456366.1 done. 131 secs passed
scanning chomosome GL456368.1 done. 131 secs passed
scanning chomosome JH584304.1 done. 131 secs passed
scanning chomosome KV575234.1 done. 131 secs passed
scanning chomosome KK082441.1 done. 131 secs passed
scanning chomosome KV575232.1 done. 131 secs passed
scanning chomosome KV575233.1 done. 131 secs passed
scanning chomosome KV575235.1 done. 131 secs passed
scanning chomosome KV575236.1 done. 131 secs passed
scanning chomosome KQ030484.1 done. 131 secs passed
scanning chomosome KZ289066.1 done. 131 secs passed
scanning chomosome KZ289065.1 done. 132 secs passed
scanning chomosome KZ289064.1 done. 132 secs passed
scanning chomosome KQ030485.1 done. 132 secs passed
scanning chomosome KQ030486.1 done. 132 secs passed
scanning chomosome KZ289070.1 done. 132 secs passed
scanning chomosome KZ289067.1 done. 132 secs passed
scanning chomosome KQ030487.1 done. 132 secs passed
scanning chomosome KQ030488.1 done. 132 secs passed
scanning chomosome KQ030489.1 done. 132 secs passed
scanning chomosome KZ289069.1 done. 132 secs passed
scanning chomosome JH792826.1 done. 132 secs passed
scanning chomosome KZ289068.1 done. 132 secs passed
scanning chomosome JH792827.1 done. 132 secs passed
scanning chomosome KV575237.1 done. 132 secs passed
scanning chomosome KK082443.1 done. 132 secs passed
scanning chomosome KK082442.1 done. 132 secs passed
scanning chomosome JH792828.1 done. 132 secs passed
scanning chomosome KV575238.1 done. 132 secs passed
scanning chomosome KV575239.1 done. 132 secs passed
scanning chomosome KV575240.1 done. 132 secs passed
scanning chomosome KQ030490.1 done. 132 secs passed
scanning chomosome KB469738.3 done. 132 secs passed
scanning chomosome KZ289071.1 done. 132 secs passed
scanning chomosome KQ030491.1 done. 132 secs passed
scanning chomosome KZ289072.1 done. 132 secs passed
scanning chomosome KZ289076.1 done. 132 secs passed
scanning chomosome KZ289073.1 done. 132 secs passed
scanning chomosome KZ289077.1 done. 132 secs passed
scanning chomosome KZ289080.1 done. 132 secs passed
scanning chomosome KZ289078.1 done. 132 secs passed
scanning chomosome KZ289074.1 done. 132 secs passed
scanning chomosome KZ289081.1 done. 132 secs passed
scanning chomosome KZ289079.1 done. 132 secs passed
scanning chomosome KZ289075.1 done. 132 secs passed
scanning chomosome KB469739.1 done. 132 secs passed
scanning chomosome KB469740.1 done. 132 secs passed
scanning chomosome KZ289082.1 done. 132 secs passed
scanning chomosome KZ289083.1 done. 132 secs passed
scanning chomosome KZ289084.1 done. 132 secs passed
scanning chomosome KQ030492.1 done. 132 secs passed
scanning chomosome KV575241.1 done. 132 secs passed
scanning chomosome KQ030493.2 done. 132 secs passed
scanning chomosome KZ289086.1 done. 132 secs passed
scanning chomosome KZ289085.1 done. 132 secs passed
scanning chomosome KB469741.2 done. 132 secs passed
scanning chomosome KZ289087.1 done. 132 secs passed
scanning chomosome KZ289088.1 done. 132 secs passed
scanning chomosome KZ289089.1 done. 132 secs passed
scanning chomosome KB469742.1 done. 132 secs passed
scanning chomosome JH792829.1 done. 132 secs passed
scanning chomosome KZ289090.1 done. 132 secs passed
scanning chomosome KZ289091.1 done. 132 secs passed
scanning chomosome JH792830.1 done. 132 secs passed
scanning chomosome KQ030494.1 done. 132 secs passed
scanning chomosome KV575242.1 done. 132 secs passed
scanning chomosome KZ289093.1 done. 132 secs passed
scanning chomosome KQ030496.1 done. 132 secs passed
scanning chomosome KQ030497.1 done. 132 secs passed
scanning chomosome JH792831.2 done. 132 secs passed
scanning chomosome KZ289094.1 done. 132 secs passed
scanning chomosome KQ030495.1 done. 132 secs passed
scanning chomosome KZ289095.1 done. 132 secs passed
scanning chomosome KZ289092.1 done. 133 secs passed
scanning chomosome JH792832.1 done. 133 secs passed
scanning chomosome JH792834.1 done. 133 secs passed
scanning chomosome JH792833.1 done. 133 secs passed
scanning chomosome GL456079.1 done. 133 secs passed
scanning chomosome GL456024.2 done. 133 secs passed
scanning chomosome GL456007.1 done. 133 secs passed
scanning chomosome GL456006.1 done. 133 secs passed
scanning chomosome GL456008.1 done. 133 secs passed
scanning chomosome GL456011.1 done. 133 secs passed
scanning chomosome GL456025.1 done. 133 secs passed
scanning chomosome GL456026.2 done. 133 secs passed
scanning chomosome GL456014.1 done. 133 secs passed
scanning chomosome GL456074.1 done. 133 secs passed
scanning chomosome GL456017.2 done. 133 secs passed
scanning chomosome JH584305.1 done. 133 secs passed
scanning chomosome GL456019.1 done. 133 secs passed
scanning chomosome JH584306.1 done. 133 secs passed
scanning chomosome JH584307.1 done. 133 secs passed
scanning chomosome JH590470.1 done. 133 secs passed
scanning chomosome JH584308.1 done. 133 secs passed
scanning chomosome JH584309.1 done. 133 secs passed
scanning chomosome GL456032.1 done. 133 secs passed
scanning chomosome GL456033.2 done. 133 secs passed
scanning chomosome GL456031.1 done. 133 secs passed
scanning chomosome GL456013.1 done. 133 secs passed
scanning chomosome GL455993.1 done. 133 secs passed
scanning chomosome GL455991.1 done. 133 secs passed
scanning chomosome GL455992.2 done. 133 secs passed
scanning chomosome GL455994.1 done. 133 secs passed
scanning chomosome GL455995.1 done. 133 secs passed
scanning chomosome GL455996.1 done. 133 secs passed
scanning chomosome GL455997.1 done. 133 secs passed
scanning chomosome GL455998.1 done. 133 secs passed
scanning chomosome GL455999.2 done. 133 secs passed
scanning chomosome GL456000.1 done. 133 secs passed
scanning chomosome JH584310.1 done. 133 secs passed
scanning chomosome JH584311.1 done. 133 secs passed
scanning chomosome JH584312.1 done. 133 secs passed
scanning chomosome GL456001.2 done. 133 secs passed
scanning chomosome JH584313.1 done. 133 secs passed
scanning chomosome JH584314.1 done. 133 secs passed
scanning chomosome GL456021.2 done. 133 secs passed
scanning chomosome GL456002.2 done. 133 secs passed
scanning chomosome GL456003.2 done. 133 secs passed
scanning chomosome GL456004.1 done. 133 secs passed
scanning chomosome JH584315.1 done. 133 secs passed
scanning chomosome GL456005.1 done. 133 secs passed
scanning chomosome GL456010.1 done. 133 secs passed
scanning chomosome GL456012.2 done. 133 secs passed
scanning chomosome GL456065.1 done. 133 secs passed
scanning chomosome GL456016.1 done. 133 secs passed
scanning chomosome GL456020.1 done. 133 secs passed
scanning chomosome GL456028.2 done. 133 secs passed
scanning chomosome GL456022.2 done. 133 secs passed
scanning chomosome GL455990.1 done. 133 secs passed
scanning chomosome GL456080.1 done. 133 secs passed
scanning chomosome GL456081.1 done. 133 secs passed
scanning chomosome GL456082.1 done. 133 secs passed
scanning chomosome GL455989.1 done. 133 secs passed
scanning chomosome GL456068.1 done. 133 secs passed
scanning chomosome JH584316.1 done. 133 secs passed
scanning chomosome JH584317.1 done. 133 secs passed
scanning chomosome JH584318.1 done. 133 secs passed
scanning chomosome GL456071.1 done. 133 secs passed
scanning chomosome GL456072.1 done. 133 secs passed
scanning chomosome GL456073.1 done. 133 secs passed
scanning chomosome JH584320.1 done. 134 secs passed
scanning chomosome JH584321.1 done. 134 secs passed
scanning chomosome JH584322.1 done. 134 secs passed
scanning chomosome GL456048.1 done. 134 secs passed
scanning chomosome GL456045.2 done. 134 secs passed
scanning chomosome GL456049.2 done. 134 secs passed
scanning chomosome JH584323.1 done. 134 secs passed
scanning chomosome GL456044.2 done. 134 secs passed
scanning chomosome GL456042.2 done. 134 secs passed
scanning chomosome JH584324.1 done. 134 secs passed
scanning chomosome JH584325.1 done. 134 secs passed
scanning chomosome GL456053.2 done. 134 secs passed
scanning chomosome JH584326.1 done. 134 secs passed
scanning chomosome JH584327.1 done. 134 secs passed
scanning chomosome GL456060.2 done. 134 secs passed
scanning chomosome JH584328.1 done. 135 secs passed
scanning chomosome GL456050.1 done. 135 secs passed
scanning chomosome JH584264.1 done. 135 secs passed
scanning chomosome GL456054.2 done. 135 secs passed
scanning chomosome JH584265.1 done. 135 secs passed
scanning chomosome JH584267.1 done. 135 secs passed
scanning chomosome JH584266.1 done. 135 secs passed
scanning chomosome GL456349.1 done. 135 secs passed
scanning chomosome GL456070.1 done. 135 secs passed
scanning chomosome GL456064.1 done. 135 secs passed
scanning chomosome GL456009.1 done. 135 secs passed
scanning chomosome GL456015.1 done. 135 secs passed
scanning chomosome GL456069.1 done. 135 secs passed
scanning chomosome JH584319.1 done. 135 secs passed
scanning chomosome JH584269.1 done. 135 secs passed
scanning chomosome JH584268.1 done. 135 secs passed
scanning chomosome GL456077.1 done. 135 secs passed
scanning chomosome GL456075.1 done. 135 secs passed
scanning chomosome GL456076.1 done. 135 secs passed
scanning chomosome GL456078.1 done. 135 secs passed
scanning chomosome JH584270.1 done. 135 secs passed
scanning chomosome AY172335.1 done. 135 secs passed

Total time consumed:  178 secs

---- Optional for WES: Generating reference data for CopywriteR ----
Mon May  1 12:58:29 EDT 2023
The output folder ‘/home/wranalab/dcook/software/tmp/ref/
  GRCm38.p6’ has been detected 
Generated GC-content and mappability data at 10000 bp resolution... 
Generated blacklist file... 
The output folder ‘/home/wranalab/dcook/software/tmp/ref/
  GRCm38.p6’ has been detected 
Generated GC-content and mappability data at 20000 bp resolution... 
Generated blacklist file... 
The output folder ‘/home/wranalab/dcook/software/tmp/ref/
  GRCm38.p6’ has been detected 
Generated GC-content and mappability data at 50000 bp resolution... 
Generated blacklist file... 
The output folder ‘/home/wranalab/dcook/software/tmp/ref/
  GRCm38.p6’ has been detected 
Generated GC-content and mappability data at 100000 bp resolution... 
Generated blacklist file... 
---- Optional for WGS: Generating reference data for HMMCopy ----
Mon May  1 12:59:19 EDT 2023
Settings:
  Output files: "ref/GRCm38.p6/GRCm38.p6.fna.*.ebwt"
  Line rate: 6 (line is 64 bytes)
  Lines per side: 1 (side is 64 bytes)
  Offset rate: 5 (one in 32)
  FTable chars: 10
  Strings: unpacked
  Max bucket size: default
  Max bucket size, sqrt multiplier: default
  Max bucket size, len divisor: 4
  Difference-cover sample period: 1024
  Endianness: little
  Actual local endianness: little
  Sanity checking: disabled
  Assertions: disabled
  Random seed: 0
  Sizeofs: void*:8, int:4, long:8, size_t:8
Input files DNA, FASTA:
  ref/GRCm38.p6/GRCm38.p6.fna
Reading reference sizes
  Time reading reference sizes: 00:00:22
Calculating joined length
Writing header
Reserving space for joined string
Joining reference sequences
  Time to join reference sequences: 00:00:16
bmax according to bmaxDivN setting: 684884673
Using parameters --bmax 513663505 --dcv 1024
  Doing ahead-of-time memory usage test
  Passed!  Constructing with these parameters: --bmax 513663505 --dcv 1024
Constructing suffix-array element generator
Building DifferenceCoverSample
  Building sPrime
  Building sPrimeOrder
  V-Sorting samples
  V-Sorting samples time: 00:01:38
  Allocating rank array
  Ranking v-sort output
  Ranking v-sort output time: 00:00:19
  Invoking Larsson-Sadakane on ranks
  Invoking Larsson-Sadakane on ranks time: 00:00:36
  Sanity-checking and returning
Building samples
Reserving space for 12 sample suffixes
Generating random suffixes
QSorting 12 sample offsets, eliminating duplicates
QSorting sample offsets, eliminating duplicates time: 00:00:00
Multikey QSorting 12 samples
  (Using difference cover)
  Multikey QSorting samples time: 00:00:00
Calculating bucket sizes
Splitting and merging
  Splitting and merging time: 00:00:00
Avg bucket size: 2.73954e+09 (target: 513663504)
Converting suffix-array elements to index image
Allocating ftab, absorbFtab
Entering Ebwt loop
Getting block 1 of 1
  No samples; assembling all-inclusive block
  Sorting block of length 2739538695 for bucket 1
  (Using difference cover)
  Sorting block time: 00:29:58
Returning block of 2739538696 for bucket 1
Exited Ebwt loop
fchr[A]: 0
fchr[C]: 797648680
fchr[G]: 1369186610
fchr[T]: 1940852479
fchr[$]: 2739538695
Exiting Ebwt::buildToDisk()
Returning from initFromVector
Wrote 786957204 bytes to primary EBWT file: ref/GRCm38.p6/GRCm38.p6.fna.1.ebwt
Wrote 342442344 bytes to secondary EBWT file: ref/GRCm38.p6/GRCm38.p6.fna.2.ebwt
Re-opening _in1 and _in2 as input streams
Returning from Ebwt constructor
Headers:
    len: 2739538695
    bwtLen: 2739538696
    sz: 684884674
    bwtSz: 684884674
    lineRate: 6
    linesPerSide: 1
    offRate: 5
    offMask: 0xffffffe0
    isaRate: -1
    isaMask: 0xffffffff
    ftabChars: 10
    eftabLen: 20
    eftabSz: 80
    ftabLen: 1048577
    ftabSz: 4194308
    offsLen: 85610585
    offsSz: 342442340
    isaLen: 0
    isaSz: 0
    lineSz: 64
    sideSz: 64
    sideBwtSz: 56
    sideBwtLen: 224
    numSidePairs: 6115042
    numSides: 12230084
    numLines: 12230084
    ebwtTotLen: 782725376
    ebwtTotSz: 782725376
    reverse: 0
Total time for call to driver() for forward index: 00:37:14
Reading reference sizes
  Time reading reference sizes: 00:00:19
Calculating joined length
Writing header
Reserving space for joined string
Joining reference sequences
  Time to join reference sequences: 00:00:17
bmax according to bmaxDivN setting: 684884673
Using parameters --bmax 513663505 --dcv 1024
  Doing ahead-of-time memory usage test
  Passed!  Constructing with these parameters: --bmax 513663505 --dcv 1024
Constructing suffix-array element generator
Building DifferenceCoverSample
  Building sPrime
  Building sPrimeOrder
  V-Sorting samples
  V-Sorting samples time: 00:01:38
  Allocating rank array
  Ranking v-sort output
  Ranking v-sort output time: 00:00:19
  Invoking Larsson-Sadakane on ranks
  Invoking Larsson-Sadakane on ranks time: 00:00:35
  Sanity-checking and returning
Building samples
Reserving space for 12 sample suffixes
Generating random suffixes
QSorting 12 sample offsets, eliminating duplicates
QSorting sample offsets, eliminating duplicates time: 00:00:00
Multikey QSorting 12 samples
  (Using difference cover)
  Multikey QSorting samples time: 00:00:00
Calculating bucket sizes
Splitting and merging
  Splitting and merging time: 00:00:00
Avg bucket size: 2.73954e+09 (target: 513663504)
Converting suffix-array elements to index image
Allocating ftab, absorbFtab
Entering Ebwt loop
Getting block 1 of 1
  No samples; assembling all-inclusive block
  Sorting block of length 2739538695 for bucket 1
  (Using difference cover)
  Sorting block time: 00:29:58
Returning block of 2739538696 for bucket 1
Exited Ebwt loop
fchr[A]: 0
fchr[C]: 797648680
fchr[G]: 1369186610
fchr[T]: 1940852479
fchr[$]: 2739538695
Exiting Ebwt::buildToDisk()
Returning from initFromVector
Wrote 786957204 bytes to primary EBWT file: ref/GRCm38.p6/GRCm38.p6.fna.rev.1.ebwt
Wrote 342442344 bytes to secondary EBWT file: ref/GRCm38.p6/GRCm38.p6.fna.rev.2.ebwt
Re-opening _in1 and _in2 as input streams
Returning from Ebwt constructor
Headers:
    len: 2739538695
    bwtLen: 2739538696
    sz: 684884674
    bwtSz: 684884674
    lineRate: 6
    linesPerSide: 1
    offRate: 5
    offMask: 0xffffffe0
    isaRate: -1
    isaMask: 0xffffffff
    ftabChars: 10
    eftabLen: 20
    eftabSz: 80
    ftabLen: 1048577
    ftabSz: 4194308
    offsLen: 85610585
    offsSz: 342442340
    isaLen: 0
    isaSz: 0
    lineSz: 64
    sideSz: 64
    sideBwtSz: 56
    sideBwtLen: 224
    numSidePairs: 6115042
    numSides: 12230084
    numLines: 12230084
    ebwtTotLen: 782725376
    ebwtTotSz: 782725376
    reverse: 0
Total time for backward call to driver() for mirror index: 00:37:12
---- Finished generating reference data ----
Tue May  2 01:26:53 EDT 2023
DONE
---- Starting Mouse Cancer Genome Analysis ----
Starting pipeline using these settings:
Tue May  2 01:26:53 EDT 2023     timestamp: 1683005213
Running sample named MoCaSeq_Test
Running in MS-mode
Using /opt/MoCaSeq/test/Mouse.Normal.R1.fastq.gz and /opt/MoCaSeq/test/Mouse.Normal.R2.fastq.gz for normal fastqs
Using /opt/MoCaSeq/test/Mouse.Tumor.R1.fastq.gz and /opt/MoCaSeq/test/Mouse.Tumor.R2.fastq.gz for tumor fastqs
Assuming that reads are from Mouse
Assuming that experiment is WES
Reading configuration file from /opt/MoCaSeq/config.sh
Setting location of repository to /opt/MoCaSeq/repository
Setting location of genome to /var/pipeline/ref/GRCm38.p6
Setting location for temporary files to /var/pipeline/temp
Assuming none-artefacts for SNV-calling
all is setting for filtering of SNV calls
Quality scores are assumed as
Using GATK v4.1.7.0
Will run Mutect2
Starting workflow using 4 CPU-threads and 8 GB of RAM
---- Creating directories ----
Tue May  2 01:26:54 EDT 2023     timestamp: 1683005214
---- Copying repository ----
Tue May  2 01:26:54 EDT 2023     timestamp: 1683005214
---- Copying raw data ----
Tue May  2 01:26:54 EDT 2023     timestamp: 1683005214
---- Calculating md5-sums ----
Tue May  2 01:26:57 EDT 2023     timestamp: 1683005217
---- Running FastQC before trimming ----
Tue May  2 01:26:57 EDT 2023     timestamp: 1683005217
Analysis complete for MoCaSeq_Test.Normal.R1.fastq.gz
Analysis complete for MoCaSeq_Test.Tumor.R1.fastq.gz
Analysis complete for MoCaSeq_Test.Normal.R2.fastq.gz
Analysis complete for MoCaSeq_Test.Tumor.R2.fastq.gz
---- Trimming reads ----
Tue May  2 01:27:10 EDT 2023     timestamp: 1683005230
---- Running FastQC after trimming ----
Tue May  2 01:27:35 EDT 2023     timestamp: 1683005255
Analysis complete for MoCaSeq_Test.Tumor.R1.passed.fastq.gz
Analysis complete for MoCaSeq_Test.Normal.R1.passed.fastq.gz
Analysis complete for MoCaSeq_Test.Tumor.R2.passed.fastq.gz
Analysis complete for MoCaSeq_Test.Normal.R2.passed.fastq.gz
---- Removing fastq files ----
Tue May  2 01:27:46 EDT 2023     timestamp: 1683005266
---- Mapping trimmed reads ----
Tue May  2 01:27:46 EDT 2023     timestamp: 1683005266
---- Postprocessing I (Sorting, fixing read groups and marking duplicates) ----
Tue May  2 01:30:33 EDT 2023     timestamp: 1683005433
---- Postprocessing II (Base recalibration) ----
Tue May  2 01:31:28 EDT 2023     timestamp: 1683005488
---- Quality control I (Sequencing artifacts, multiple metrics) ----
Tue May  2 01:31:33 EDT 2023     timestamp: 1683005493
---- Quality control II (WES- or WGS-specific metrics) ----
Tue May  2 01:31:34 EDT 2023     timestamp: 1683005494
---- Summarizing quality control data ----
Tue May  2 01:31:35 EDT 2023     timestamp: 1683005495
---- Matched BAM-files? ----
Tue May  2 01:31:48 EDT 2023     timestamp: 1683005508
+------------+
| FILE ERROR |
+------------+
Cannot access BAM file (/home/wranalab/dcook/software/tmp/MoCaSeq_Test/results/bam/MoCaSeq_Test.Tumor.bam).
It either does not exist or is not readable.

---- Get genotypes ----
Tue May  2 01:31:49 EDT 2023     timestamp: 1683005509
---- Running Manta (matched tumor-normal) ----
Tue May  2 01:31:55 EDT 2023     timestamp: 1683005515
---- Running Strelka (matched tumor-normal) ----
Tue May  2 01:32:50 EDT 2023     timestamp: 1683005570
---- Strelka Postprocessing I (Indel size selection, filtering) ----
Tue May  2 01:32:51 EDT 2023     timestamp: 1683005571
---- Strelka Postprocessing II (Filtering out known SNV/Indel using dbSNP or the Sanger Mouse database) ----
Tue May  2 01:32:54 EDT 2023     timestamp: 1683005574
---- Strelka Postprocessing III (Extracting allele frequencies) ----
Tue May  2 01:32:54 EDT 2023     timestamp: 1683005574
---- Strelka Postprocessing IV (Annotate calls) ----
Tue May  2 01:32:54 EDT 2023     timestamp: 1683005574
---- Running Mutect2 (matched tumor-normal) ----
Tue May  2 01:33:14 EDT 2023     timestamp: 1683005594
---- Mutect2 Postprocessing (matched tumor-normal) ----
Tue May  2 01:33:17 EDT 2023     timestamp: 1683005597
---- Mutect2 Postprocessing I (OrientationFilter, Indel size selection, filtering) ----
Tue May  2 01:33:17 EDT 2023     timestamp: 1683005597
---- Mutect2 Postprocessing II (Filtering out known SNV/Indel using dbSNP or the Sanger Mouse database) ----
Tue May  2 01:33:24 EDT 2023     timestamp: 1683005604
---- Mutect2 Postprocessing III (Annotate calls) ----
Tue May  2 01:33:24 EDT 2023     timestamp: 1683005604
---- Running Mutect2 (single-sample) ----
Tue May  2 01:33:36 EDT 2023     timestamp: 1683005616
---- Mutect2 SS Postprocessing I (OrientationFilter, Indel size selection, filtering) ----
Tue May  2 01:33:39 EDT 2023     timestamp: 1683005619
---- Mutect2 SS Postprocessing II (Filtering out known SNV/Indel using dbSNP or the Sanger Mouse database) ----
Tue May  2 01:33:44 EDT 2023     timestamp: 1683005624
---- Mutect2 SS Postprocessing III (Annotate calls) ----
Tue May  2 01:33:44 EDT 2023     timestamp: 1683005624
---- Mutect2 SS Postprocessing I (OrientationFilter, Indel size selection, filtering) ----
Tue May  2 01:33:47 EDT 2023     timestamp: 1683005627
---- Mutect2 SS Postprocessing II (Filtering out known SNV/Indel using dbSNP or the Sanger Mouse database) ----
Tue May  2 01:33:52 EDT 2023     timestamp: 1683005632
---- Mutect2 SS Postprocessing III (Annotate calls) ----
Tue May  2 01:33:53 EDT 2023     timestamp: 1683005633
---- Generate LOH data ----
Tue May  2 01:33:54 EDT 2023     timestamp: 1683005634
---- Generate and plot copy number data ----
Tue May  2 01:34:00 EDT 2023     timestamp: 1683005640
---- Run CopywriteR ----
Tue May  2 01:34:00 EDT 2023     timestamp: 1683005640
---- Export raw data and re-normalize using Mode ----
Tue May  2 01:34:08 EDT 2023     timestamp: 1683005648
Collect Segment data
Calculate Mode correction for Sample MoCaSeq_Test
---- Plot CNV-profiles ----
Tue May  2 01:34:09 EDT 2023     timestamp: 1683005649
---- Run HMMCopy (bin-size 20000) ----
Tue May  2 01:34:15 EDT 2023     timestamp: 1683005655
Binning read counts in Tumor file @ 20000 resolution...
Binning read counts in Normal file @ 20000 resolution...
---- Plot HMMCopy ----
Tue May  2 01:34:15 EDT 2023     timestamp: 1683005655
---- Run msisensor----
Tue May  2 01:34:21 EDT 2023     timestamp: 1683005661
msi -n MoCaSeq_Test/results/bam/MoCaSeq_Test.Normal.bam -t MoCaSeq_Test/results/bam/MoCaSeq_Test.Tumor.bam -o MoCaSeq_Test/results/msisensor/MoCaSeq_Test.msisensor -d /var/pipeline/ref/GRCm38.p6/GRCm38.p6.microsatellites -b 4 Start at:  Tue May  2 01:34:21 2023

loading homopolymer and microsatellite sites ...
---- Finished analysis of sample MoCaSeq_Test ----
Tue May  2 01:35:01 EDT 2023     timestamp: 1683005701

And the tree of the output directory:

.
├── fastq
│   ├── MoCaSeq_Test.Normal.R1.fastq.gz.md5
│   ├── MoCaSeq_Test.Normal.R2.fastq.gz.md5
│   ├── MoCaSeq_Test.Tumor.R1.fastq.gz.md5
│   └── MoCaSeq_Test.Tumor.R2.fastq.gz.md5
├── pipeline
│   ├── config.sh
│   ├── MoCaSeq.sh
│   └── repository
│       ├── all_DeterminePhred.sh
│       ├── all_GeneratePlots.R
│       ├── all_MoCaSeq_lcWGS.sh
│       ├── all_RunTitanCNA.R
│       ├── all_RunTitanCNA.sh
│       ├── all_TitanCNA.R
│       ├── all_TitanCNASolution.R
│       ├── Chromothripsis_AnnotateRatios.R
│       ├── Chromothripsis_DetectBreakpointClustering.R
│       ├── Chromothripsis_DetectRandomJoins.R
│       ├── Chromothripsis_FilterDelly.R
│       ├── Chromothripsis_FormatTable.sh
│       ├── Chromothripsis_GetCoverage.sh
│       ├── Chromothripsis_PlotLOHPattern.R
│       ├── Chromothripsis_PlotRearrangementGraph.R
│       ├── Chromothripsis_RearrangementCounter.R
│       ├── Chromothripsis_SimulateCopyNumberStates.R
│       ├── Chromothripsis_WalkDerivativeChromosome.R
│       ├── CNV_CleanUp.sh
│       ├── CNV_CopywriterGetModeCorrectionFactor.py
│       ├── CNV_CopywriterGetModeCorrectionFactor.R
│       ├── CNV_CopywriterGetRawData.R
│       ├── CNV_EstimateCoverage.R
│       ├── CNV_GetGenotype.R
│       ├── CNV_GetGenotype.sh
│       ├── CNV_MapSegmentsToGenes.R
│       ├── CNV_PlotaCGH.R
│       ├── CNV_PlotCopywriter.R
│       ├── CNV_PlotHMMCopy.R
│       ├── CNV_RunCopywriter.R
│       ├── CNV_RunHMMCopy.sh
│       ├── Cohort_CompareSNPs.r
│       ├── Cohort_CopyResults.sh
│       ├── Cohort_GenerateOverlayLibrary.R
│       ├── Cohort_GenerateOverlay.R
│       ├── Cohort_GetQC.R
│       ├── LOH_CNVKitPrepareLOH.sh
│       ├── LOH_GenerateVariantTable.R
│       ├── LOH_Library.R
│       ├── LOH_MakePlots.R
│       ├── LOH_MapSegmentsToGenes.R
│       ├── Meta_logstats.sh
│       ├── Preparation_DownloadFromENA.sh
│       ├── Preparation_GenerateBWAIndex.sh
│       ├── Preparation_GenerateCopywriterReferences.R
│       ├── Preparation_GenerateSangerMouseDB.sh
│       ├── Preparation_GetExemplaryData.sh
│       ├── Preparation_GetReferenceDataMouse.sh
│       ├── SNV_CleanUp.sh
│       ├── SNV_GenerateCohortDB.sh
│       ├── SNV_GetGenotype.sh
│       ├── SNV_Mutect2Postprocessing.sh
│       ├── SNV_Mutect2PostprocessingSS.sh
│       ├── SNV_RunVEP.sh
│       ├── SNV_SelectOutput.R
│       ├── SNV_SelectOutputSS.R
│       ├── SNV_Signatures.R
│       ├── SNV_StrelkaPostprocessing.sh
│       ├── SV_MantaPostprocessing.sh
│       └── SV_SelectGenesFromManta.R
└── results
    ├── bam
    ├── Copywriter
    │   └── MoCaSeq_Test_Chromosomes
    ├── Genotype
    │   ├── MoCaSeq_Test.Genotypes.temp.SNV.txt
    │   └── MoCaSeq_Test.Genotypes.txt
    ├── HMMCopy
    │   ├── MoCaSeq_Test.Normal.20000.wig
    │   └── MoCaSeq_Test.Tumor.20000.wig
    ├── LOH
    │   ├── MoCaSeq_Test_Chromosomes
    │   └── MoCaSeq_Test.VariantsForLOHGermline.txt
    ├── Manta
    │   ├── MoCaSeq_Test.Manta.txt
    │   ├── MoCaSeq_Test.Manta.vcf
    │   ├── MoCaSeq_Test.Manta.vcf.gz.stats
    │   ├── MoCaSeq_Test.Manta.vep.maf
    │   └── MoCaSeq_Test.Manta.vep.vcf
    ├── msisensor
    │   ├── MoCaSeq_Test.msisensor
    │   ├── MoCaSeq_Test.msisensor_dis
    │   ├── MoCaSeq_Test.msisensor_germline
    │   └── MoCaSeq_Test.msisensor_somatic
    ├── Mutect2
    │   ├── MoCaSeq_Test.Mutect2.NoCommonSNPs.OnlyImpact.CGC.txt
    │   ├── MoCaSeq_Test.Mutect2.NoCommonSNPs.OnlyImpact.txt
    │   ├── MoCaSeq_Test.Mutect2.txt
    │   ├── MoCaSeq_Test.Mutect2.vep.maf
    │   ├── MoCaSeq_Test.Normal.Mutect2.NoCommonSNPs.OnlyImpact.CGC.txt
    │   ├── MoCaSeq_Test.Normal.Mutect2.NoCommonSNPs.OnlyImpact.txt
    │   ├── MoCaSeq_Test.Normal.Mutect2.Positions.txt
    │   ├── MoCaSeq_Test.Normal.Mutect2.txt
    │   ├── MoCaSeq_Test.Normal.Mutect2.vep.maf
    │   ├── MoCaSeq_Test.Normal.Mutect2.vep.tmp
    │   ├── MoCaSeq_Test.Tumor.Mutect2.NoCommonSNPs.OnlyImpact.CGC.txt
    │   ├── MoCaSeq_Test.Tumor.Mutect2.NoCommonSNPs.OnlyImpact.txt
    │   ├── MoCaSeq_Test.Tumor.Mutect2.Positions.txt
    │   ├── MoCaSeq_Test.Tumor.Mutect2.txt
    │   └── MoCaSeq_Test.Tumor.Mutect2.vep.tmp
    ├── QC
    │   ├── MoCaSeq_Test_data
    │   │   ├── mqc_fastqc_adapter_content_plot_1.txt
    │   │   ├── mqc_fastqc_per_base_n_content_plot_1.txt
    │   │   ├── mqc_fastqc_per_base_sequence_quality_plot_1.txt
    │   │   ├── mqc_fastqc_per_sequence_gc_content_plot_Counts.txt
    │   │   ├── mqc_fastqc_per_sequence_gc_content_plot_Percentages.txt
    │   │   ├── mqc_fastqc_per_sequence_quality_scores_plot_1.txt
    │   │   ├── mqc_fastqc_sequence_counts_plot_1.txt
    │   │   ├── mqc_fastqc_sequence_duplication_levels_plot_1.txt
    │   │   ├── mqc_fastqc_sequence_length_distribution_plot_1.txt
    │   │   ├── mqc_trimmomatic_plot_1.txt
    │   │   ├── multiqc_data.json
    │   │   ├── multiqc_fastqc.txt
    │   │   ├── multiqc_general_stats.txt
    │   │   ├── multiqc.log
    │   │   ├── multiqc_sources.txt
    │   │   └── multiqc_trimmomatic.txt
    │   ├── MoCaSeq_Test.html
    │   ├── MoCaSeq_Test.Normal.bam.idxstats
    │   ├── MoCaSeq_Test.Normal.R1_fastqc.html
    │   ├── MoCaSeq_Test.Normal.R1_fastqc.zip
    │   ├── MoCaSeq_Test.Normal.R1.passed_fastqc.html
    │   ├── MoCaSeq_Test.Normal.R1.passed_fastqc.zip
    │   ├── MoCaSeq_Test.Normal.R2_fastqc.html
    │   ├── MoCaSeq_Test.Normal.R2_fastqc.zip
    │   ├── MoCaSeq_Test.Normal.R2.passed_fastqc.html
    │   ├── MoCaSeq_Test.Normal.R2.passed_fastqc.zip
    │   ├── MoCaSeq_Test.report.txt
    │   ├── MoCaSeq_Test.Tumor.bam.idxstats
    │   ├── MoCaSeq_Test.Tumor.R1_fastqc.html
    │   ├── MoCaSeq_Test.Tumor.R1_fastqc.zip
    │   ├── MoCaSeq_Test.Tumor.R1.passed_fastqc.html
    │   ├── MoCaSeq_Test.Tumor.R1.passed_fastqc.zip
    │   ├── MoCaSeq_Test.Tumor.R2_fastqc.html
    │   ├── MoCaSeq_Test.Tumor.R2_fastqc.zip
    │   ├── MoCaSeq_Test.Tumor.R2.passed_fastqc.html
    │   └── MoCaSeq_Test.Tumor.R2.passed_fastqc.zip
    └── Strelka
        ├── MoCaSeq_Test.Strelka.NoCommonSNPs.OnlyImpact.CGC.txt
        ├── MoCaSeq_Test.Strelka.NoCommonSNPs.OnlyImpact.txt
        ├── MoCaSeq_Test.Strelka.txt
        ├── MoCaSeq_Test.Strelka.vcf
        └── MoCaSeq_Test.Strelka.vep.maf

17 directories, 136 files

NikdAK commented 1 year ago

I am not sure what is going on with the working_directory. I guess it is working_directory=/home/wranalab/dcook/software/tmp/ ? You are mounting this as the working directory (/var/pipeline/) as well as for the reference (/var/pipeline/ref); however, I cannot see the reference anywhere in the directory output you provided. There should be a folder called GRCm38.p6 at the location you mounted to /var/pipeline/ref.

If the BAM files are not found the pipeline should throw a lot more errors. The tree output is for the $working_directory? Is there something in the following file? --> MoCaSeq_Test.Mutect2.txt

dpcook commented 1 year ago

Apologies, yes, I set working_directory="/home/wranalab/dcook/software/tmp" in the shell script. I tried to avoid the confusion of mentioning that I was messing around in a vaguely named directory.

There should be a folder called GRCm38.p6 at the location you mounted to /var/pipeline/ref.

Yes, there seems to be a fully populated directory in ${working_directory}/ref (full contents printed below)

Is there something in the following file? --> MoCaSeq_Test.Mutect2.txt

Yes, it appears in ${working_directory}/MoCaSeq_Test/results/Mutect2/MoCaSeq_Test.Mutect2.txt, but it only contains the column names:

CHROM   POS     REF     ALT     GEN[Tumor].AF   GEN[Tumor].AD[0]        GEN[Tumor].AD[1]        GEN[Normal].AD[0]       GEN[Normal].AD[1]       ANN[*].GENE     ANN[*].EFFECT   ANN[*].IMPACT   ANN[*].FEATUREID        ANN[*].HGVS_C   ANN[*].HGVS_P
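Since several result files may be affected in the same way, a quick scan for header-only tables can help. This is only a sketch: `header_only_files` is a hypothetical helper name, and it assumes the results are `.txt` tables with a single header line.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of MoCaSeq): list .txt files in a directory
# that contain at most one line, i.e. are empty or hold only a header.
header_only_files() {
    local dir=$1
    local file lines
    for file in "$dir"/*.txt; do
        [ -f "$file" ] || continue          # skip if the glob matched nothing
        lines=$(( $(wc -l < "$file") ))     # arithmetic strips any padding
        if [ "$lines" -le 1 ]; then
            echo "$file"
        fi
    done
}
```

For example, `header_only_files "${working_directory}/MoCaSeq_Test/results/Mutect2"` would print the files in that folder that hold nothing beyond the column names.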

The tree output is for the $working_directory?

Sorry, the tree output was for ${working_directory}/MoCaSeq_Test specifically. I didn't show $working_directory because of the number of files in /ref.

Contents of ${working_directory}:

[dcook@galenus tmp]$ pwd
/home/wranalab/dcook/software/tmp
[dcook@galenus tmp]$ ls -shlt
total 4.1G
4.0K -rw-r--r-- 1 dcook wranalab  357 May  2 09:46 mocaseq_test.sh
 28K -rw-r--r-- 1 dcook wranalab  27K May  2 01:35 slurm-2090262.out
4.0K drwxr-sr-x 2 dcook wranalab 4.0K May  2 01:31 temp
4.0K drwxr-sr-x 3 dcook wranalab 4.0K May  1 10:00 ref
4.0K drwxr-sr-x 5 dcook wranalab 4.0K May  1 10:00 MoCaSeq_Test
4.1G -rwxr-xr-x 1 dcook wranalab 4.1G May  1 10:00 mocaseq.sif
4.0K -rw-r--r-- 1 dcook wranalab  576 May  1 09:57 mocaseq_ref.sh

[dcook@galenus tmp]$ cd temp
[dcook@galenus temp]$ ls -shlt
total 251M
3.6M -rw-r--r-- 1 dcook wranalab 3.6M May  2 01:31 MoCaSeq_Test.Normal.cleaned.sorted.readgroups.marked.bam.bai
121M -rw-r--r-- 1 dcook wranalab 121M May  2 01:31 MoCaSeq_Test.Normal.cleaned.sorted.readgroups.marked.bam
3.6M -rw-r--r-- 1 dcook wranalab 3.6M May  2 01:31 MoCaSeq_Test.Tumor.cleaned.sorted.readgroups.marked.bam.bai
123M -rw-r--r-- 1 dcook wranalab 123M May  2 01:31 MoCaSeq_Test.Tumor.cleaned.sorted.readgroups.marked.bam

${working_directory}/ref

[dcook@galenus GRCm38.p6]$ pwd
/home/wranalab/dcook/software/tmp/ref/GRCm38.p6
[dcook@galenus GRCm38.p6]$ ls
bwa_index
GetReferenceData.txt
GRCm38.AgilentProbeGaps.txt
GRCm38.bammatcher_bash.conf
GRCm38.bammatcher_docker.conf
GRCm38.canonical_chromosomes.bed.gz
GRCm38.canonical_chromosomes.bed.gz.tbi
GRCm38.Census_allMon_Jan_15_11_46_18_2018_mouse.tsv
GRCm38.Genecode_M20_Exons.rds
GRCm38.Genecode_M20_Genes.rds
GRCm38.p6.dict
GRCm38.p6.fna
GRCm38.p6.fna.1.ebwt
GRCm38.p6.fna.2.ebwt
GRCm38.p6.fna.3.ebwt
GRCm38.p6.fna.4.ebwt
GRCm38.p6.fna.fai
GRCm38.p6.fna.index
GRCm38.p6.fna.map.bw
GRCm38.p6.fna.rev.1.ebwt
GRCm38.p6.fna.rev.2.ebwt
GRCm38.p6.fna.sizes
GRCm38.p6.gc.10000.wig
GRCm38.p6.gc.1000.wig
GRCm38.p6.gc.20000.wig
GRCm38.p6.gc.50000.wig
GRCm38.p6.map.10000.wig
GRCm38.p6.map.1000.wig
GRCm38.p6.map.20000.wig
GRCm38.p6.map.50000.wig
GRCm38.p6.microsatellites
GRCm38.RefFlat
GRCm38.SureSelect_Mouse_All_Exon_V1.bed
GRCm38.SureSelect_Mouse_All_Exon_V1.bed.list
MGP.v5.snp_and_indels.exclude_wild.chromosomal_sort.vcf.gz
MGP.v5.snp_and_indels.exclude_wild.chromosomal_sort.vcf.gz.tbi
mm10_100kb
mm10_10kb
mm10_20kb
mm10_50kb
Samples.tsv
VEP

NikdAK commented 1 year ago

Okay good, could you provide me the log file in the QC folder please? QC/MoCaSeq_Test.report.txt

dpcook commented 1 year ago

Certainly. See attached.

MoCaSeq_Test.report.txt

Seems like the errors start at:

[May 2, 2023 5:31:33 AM UTC] org.broadinstitute.hellbender.tools.walkers.bqsr.BaseRecalibrator done. Elapsed time: 0.02 minutes.
Runtime.totalMemory()=2062024704
***********************************************************************

A USER ERROR has occurred: Couldn't read file file:///var/pipeline/ref/GRCm38.p6/MGP.v5.snp_and_indels.exclude_wild.vcf.gz. Error was: It doesn't exist.

***********************************************************************
Set the system property GATK_STACKTRACE_ON_USER_EXCEPTION (--java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true') to print the stack trace.
05:31:33.561 INFO  BaseRecalibrator - Shutting down engine
[May 2, 2023 5:31:33 AM UTC] org.broadinstitute.hellbender.tools.walkers.bqsr.BaseRecalibrator done. Elapsed time: 0.02 minutes.
Runtime.totalMemory()=2128609280
***********************************************************************

A USER ERROR has occurred: Couldn't read file file:///var/pipeline/ref/GRCm38.p6/MGP.v5.snp_and_indels.exclude_wild.vcf.gz. Error was: It doesn't exist.

***********************************************************************
Set the system property GATK_STACKTRACE_ON_USER_EXCEPTION (--java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true') to print the stack trace.
---- Quality control I (Sequencing artifacts, multiple metrics) ----
Tue May  2 01:31:33 EDT 2023     timestamp: 1683005493
[E::hts_open_format] Failed to open file "MoCaSeq_Test/results/bam/MoCaSeq_Test.Tumor.bam" : No such file or directory
samtools idxstats: failed to open "MoCaSeq_Test/results/bam/MoCaSeq_Test.Tumor.bam": No such file or directory
[E::hts_open_format] Failed to open file "MoCaSeq_Test/results/bam/MoCaSeq_Test.Normal.bam" : No such file or directory
samtools idxstats: failed to open "MoCaSeq_Test/results/bam/MoCaSeq_Test.Normal.bam": No such file or directory

And I notice that MGP.v5.snp_and_indels.exclude_wild.vcf.gz is not in ${working_directory}/ref/GRCm38.p6, but MGP.v5.snp_and_indels.exclude_wild.chromosomal_sort.vcf.gz is.
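A check like the following could flag this kind of gap before a long run. It is a minimal sketch: `check_ref_files` is a hypothetical helper, and the file names to pass are whatever the error messages in the report ask for; I don't know the pipeline's full list of required reference files.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of MoCaSeq): report which of the given file
# names are missing or unreadable in a reference directory.
# Usage: check_ref_files <ref_dir> <file> [<file> ...]
# Returns 0 if every file is readable, 1 otherwise.
check_ref_files() {
    local ref_dir=$1
    shift
    local missing=0
    local name
    for name in "$@"; do
        if [ ! -r "${ref_dir}/${name}" ]; then
            echo "MISSING: ${ref_dir}/${name}"
            missing=1
        fi
    done
    return "$missing"
}
```

Usage, with the file named in the GATK error above: `check_ref_files "${working_directory}/ref/GRCm38.p6" MGP.v5.snp_and_indels.exclude_wild.vcf.gz`.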

NikdAK commented 1 year ago

Yes, so I found the issue. Apparently Sanger changed its FTP server, and our download links are now dead.

So we will need to push a new version, but for you it would be faster to change one script manually and mount your local repository into the container (overriding the copy inside the container). I changed the link and renamed the downloaded folder to match the old layout. The new version is attached: use it to replace the copy in your repo folder, and rename the .txt to .sh.

#wget -nv -c -r -P $temp_dir ftp://ftp-mouse.sanger.ac.uk//REL-1505-SNPs_Indels/strain_specific_vcfs/
wget -nv -c -r -P $temp_dir ftp://ftp.ebi.ac.uk/pub/databases/mousegenomes/REL-1505-SNPs_Indels/strain_specific_vcfs/
mv $temp_dir/ftp.ebi.ac.uk/pub/databases/mousegenomes/ $temp_dir/ftp-mouse.sanger.ac.uk

Mounting (double check your paths):

script_directory=${working_directory}/pipeline/repository
-B ${script_directory}:/opt/MoCaSeq/repository

Preparation_GenerateSangerMouseDB.txt

After that delete/rename the current reference folder (else it will not trigger the reference script) and rerun the pipeline.
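Putting the pieces together, the call could look roughly like the sketch below. The function name `run_mocaseq` and the `DRY_RUN` switch are made up for illustration; only `-B` and `--test yes` come from this thread. Binding the working directory also exposes its `temp` and `ref` subfolders, so the separate binds for those are left out here and can be added in the same way if your setup needs them.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of MoCaSeq): run the image with the working
# directory bound to /var/pipeline/ and a locally edited repository bound
# over the copy inside the container.
# Usage: run_mocaseq <working_directory> <image.sif> <script_directory>
# Set DRY_RUN=1 to print the command instead of executing it.
run_mocaseq() {
    local working_directory=$1
    local image=$2
    local script_directory=$3
    local cmd=(
        singularity run
        -B "${working_directory}:/var/pipeline/"
        -B "${script_directory}:/opt/MoCaSeq/repository"
        "${image}"
        --test yes
    )
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "${cmd[*]}"
    else
        "${cmd[@]}"
    fi
}
```

Running `DRY_RUN=1 run_mocaseq "${working_directory}" mocaseq.sif "${script_directory}"` first makes it easy to double check the paths before submitting the job.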

dpcook commented 1 year ago

Still getting same failure--something strange going on with the Sanger files. Log: MoCaSeq_Test.report.txt

***********************************************************************

A USER ERROR has occurred: Couldn't read file file:///var/pipeline/ref/GRCm38.p6/MGP.v5.snp_and_indels.exclude_wild.vcf.gz. Error was: It doesn't exist.

***********************************************************************

Any idea why it would write those Sanger files in a different temp directory than where the BAMs are going??

Binding (note: same result either binding a local repo with the modified Preparation_GenerateSangerMouseDB.sh or re-building the singularity image with the modified file in /opt/MoCaSeq):

singularity run -B ${working_directory}:/var/pipeline/ \
-B ${working_directory}/temp:/var/pipeline/temp \
-B ${working_directory}/ref/:/var/pipeline/ref mocaseq.sif \
    --test yes

BAMs end up in ${working_directory}/temp

[dcook@galenus ref]$ cd ${working_directory}/temp
[dcook@galenus temp]$ ls -shlt
total 251M
3.6M -rw-r--r-- 1 dcook wranalab 3.6M May  5 05:43 MoCaSeq_Test.Tumor.cleaned.sorted.readgroups.marked.bam.bai
123M -rw-r--r-- 1 dcook wranalab 123M May  5 05:43 MoCaSeq_Test.Tumor.cleaned.sorted.readgroups.marked.bam
3.6M -rw-r--r-- 1 dcook wranalab 3.6M May  5 05:43 MoCaSeq_Test.Normal.cleaned.sorted.readgroups.marked.bam.bai
121M -rw-r--r-- 1 dcook wranalab 121M May  5 05:43 MoCaSeq_Test.Normal.cleaned.sorted.readgroups.marked.bam

But the new download script (this didn't happen before?) makes a new directory in ${working_directory} named temp with some additional characters:

total 4.1G
 28K -rw-r--r-- 1 dcook wranalab  27K May  5 05:46  slurm-2094626.out
4.0K drwxr-sr-x 2 dcook wranalab 4.0K May  5 05:43  temp
4.0K drwxr-sr-x 5 dcook wranalab 4.0K May  4 17:09 'temp'$'\r\r'
4.0K drwxr-sr-x 3 dcook wranalab 4.0K May  4 14:13  ref
4.0K drwxr-sr-x 5 dcook wranalab 4.0K May  4 14:13  MoCaSeq_Test
4.0K -rw-r--r-- 1 dcook wranalab  363 May  4 14:13  mocaseq_test.sh
4.1G -rwxr-xr-x 1 dcook wranalab 4.1G May  4 14:12  mocaseq.sif

[dcook@galenus mouse_cna]$ cd 'temp'$'\r\r' && ls
 ftp.ebi.ac.uk   ftp-mouse.sanger.ac.uk  'ftp-mouse.sanger.ac.uk'$'\r'

]$ tree
.
├── ftp.ebi.ac.uk
│   └── pub
│       └── databases
├── ftp-mouse.sanger.ac.uk
│   └── REL-1505-SNPs_Indels
│       └── wild_only
│           └── \015
└── ftp-mouse.sanger.ac.uk\015
    └── REL-1505-SNPs_Indels
        └── strain_specific_vcfs

10 directories, 0 files

And I suspect the directories shouldn't be empty?

Fwiw, I have to mkdir ${working_directory}/temp before running the image for Singularity to accept the bind, so an empty directory is there prior to the run. Not sure if that's problematic or expected.

NikdAK commented 1 year ago

So, I can see "\015" and "\r" in your paths and variables. This is a carriage return, i.e. a non-printable character. Could it be that by modifying the file on e.g. Windows/Mac you introduced some kind of line break or formatting character? This is also why the script is creating a new temp directory; fixing this could be the solution.

If that does not work, I will upload the reference file and provide you with the link until we fix this FTP issue. Just tell me here if it worked so this issue can be closed; otherwise I will send you the link (I already have your mail).
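To check for and remove the carriage returns, something along these lines should work. It is a sketch using standard tools (`grep`, `tr`, `find`, `cat -v`); the function names are hypothetical and nothing here is part of MoCaSeq.

```shell
#!/usr/bin/env bash
# Succeed if the file contains at least one carriage return (\r, octal 015).
has_cr() {
    grep -q $'\r' "$1"
}

# Remove every carriage return from the file. The content is written back
# into the same file, so its permissions (e.g. the executable bit) are kept.
strip_cr() {
    local file=$1
    local tmp
    tmp=$(mktemp) || return 1
    tr -d '\r' < "$file" > "$tmp" && cat "$tmp" > "$file"
    local status=$?
    rm -f "$tmp"
    return "$status"
}

# List files and directories below a path whose names contain a carriage
# return; cat -v renders each one as ^M so it is visible in the terminal.
list_cr_names() {
    find "$1" -mindepth 1 -name $'*\r*' -print | cat -v
}
```

For example, `has_cr Preparation_GenerateSangerMouseDB.sh && strip_cr Preparation_GenerateSangerMouseDB.sh` cleans the script, and `list_cr_names "${working_directory}"` shows leftover folders such as the extra temp directory so they can be removed by hand.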

dpcook commented 1 year ago

Ah okay, I was wondering about the characters. I took the .txt file from your message above and just changed the file extension. I'm trying again now, this time making the change by hand directly in the .sh file. Submitting the job now and will monitor as it builds the reference. Will let you know if it works.

dpcook commented 1 year ago

Just to close the loop on the original topic of the issue, I can confirm that, assuming the modified links to Sanger are made/bound, the image seems to run fine with Singularity. Feel free to close if that fix is tracked/made elsewhere.

cd ${working_directory}
singularity run -B ${working_directory}:/var/pipeline/ -B ${working_directory}/temp:/var/pipeline/temp -B ${working_directory}/ref/:/var/pipeline/ref mocaseq.sif \
    --test yes

The --test run is throwing a few errors in the report due to python modules, but they seem unrelated to the image itself, so I'm going to open them as a separate issue to keep this easier to search.