Closed DiDeoxy closed 4 years ago
Are you able to try the latest develop branch version? Note that some options have changed, so your command will become:
octopus \
-R Homo_sapiens_assembly38.fasta \
-I 0_NA12878.bam 1_NA12878.bam \
-c 0_NA12878.bcf 1_NA12878.bcf \
-o cohort.bcf \
--disable-denovo-variant-discovery \
--sequence-error-model PCR-FREE.NOVASEQ \
--threads 10 \
--debug
I've removed the --forest option since the v0.6.3 forest is not compatible with the develop branch version.
I will give it a try
I can't download the forests for the develop branch when I build. So I can't run the develop version of Octopus using docker and nextflow.
How are you building? The forest availability shouldn't affect building. I realised after my first response that the forest you have won't be compatible with the develop branch version (see edited post), but you can still run your command without the forest to see if the bug you got previously has been resolved already.
Okay, will do so.
Build command:
FROM maxh/base_conda:0.1
RUN sudo apt-get update \
&& sudo apt-get install -y build-essential
RUN git clone -b develop https://github.com/luntergroup/octopus.git \
&& octopus/scripts/install.py \
--install-dependencies \
--download-forests
ENV PATH=/home/$USERNAME/octopus/bin:$PATH
Ah ok, so just remove the --download-forests option from the install.py command and it should work.
It builds, I just wanted to run with the forest. It'll take about an hour to reprocess the data with the changed command. I will let you know what happens.
Hey, so, first of all, Octopus takes 5x as long to run and reads 10x as much data on the develop branch without RF compared to 0.6.3-beta with RF. The joint genotyping stage for two samples is still running (no crash so far) after 13hrs.
Commands:
octopus \
-R Homo_sapiens_assembly38.fasta \
-I 1_NA12878.bam \
-t octopus_intervals \
-o 1_NA12878.bcf \
--sequence-error-model PCR-FREE.NOVASEQ \
--threads 8
octopus \
-R Homo_sapiens_assembly38.fasta \
-I 0_NA12878.bam \
-t octopus_intervals \
-o 1_NA12878.bcf \
--sequence-error-model PCR-FREE.NOVASEQ \
--threads 8
octopus \
-R Homo_sapiens_assembly38.fasta \
-I 1_NA12878.bam 0_NA12878.bam \
-c 1_NA12878.bcf 0_NA12878.bcf \
-o cohort.bcf \
--disable-denovo-variant-discovery \
--sequence-error-model PCR-FREE.NOVASEQ \
--threads 8 \
--debug
Correction: nextflow was lying to me. The joint genotyping stage crashed after about 2hrs with the following error message:
WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
[2019-12-12 03:12:32] <INFO> ------------------------------------------------------------------------
[2019-12-12 03:12:32] <INFO> octopus v0.7.0 (develop 0aa9e574)
[2019-12-12 03:12:32] <INFO> Copyright (c) 2015-2019 University of Oxford
[2019-12-12 03:12:32] <INFO> ------------------------------------------------------------------------
[2019-12-12 03:14:28] <WARN> The population calling model is still in development. Do not use for production work!
[2019-12-12 03:14:28] <INFO> Done initialising calling components in 1m 55s
[2019-12-12 03:14:28] <INFO> Detected 2 samples: "0_NA12878" "1_NA12878"
[2019-12-12 03:14:28] <INFO> Invoked calling model: population
[2019-12-12 03:14:28] <INFO> Processing 3,217,346,917bp with 8 threads (32 cores detected)
[2019-12-12 03:14:28] <INFO> Writing filtered calls to "/data/nextflow_temp/fa/93dd715cc7d9e1f1bb9cc8479ced11/cohort.bcf"
[2019-12-12 03:14:30] <WARN> Running in parallel mode can make debug log difficult to interpret
[2019-12-12 05:32:13] <INFO> -------------------------------------------------------------------------------------
[2019-12-12 05:32:13] <INFO> current | | time | estimated
[2019-12-12 05:32:13] <INFO> position | completed | taken | ttc
[2019-12-12 05:32:13] <INFO> -------------------------------------------------------------------------------------
[2019-12-12 05:32:13] <EROR> Encountered a problem whilst calling chr1:0-65856340
[2019-12-12 05:32:13] <EROR> Encountered a problem whilst calling chr1:65856340-125179584
[2019-12-12 05:32:13] <EROR> Encountered a problem whilst calling chr1:125179584-200450439
[2019-12-12 05:32:13] <EROR> Encountered a problem whilst calling chr1:200450439-248956422
[2019-12-12 05:32:13] <EROR> Encountered a problem whilst calling chr2:0-67460839
[2019-12-12 05:32:13] <EROR> Encountered a problem whilst calling chr2:67460839-136232273
[2019-12-12 05:32:13] <EROR> Encountered a problem whilst calling chr2:136232273-205131746
[2019-12-12 05:32:13] <EROR> Encountered a problem whilst calling chr2:205131746-242193529
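To see at a glance which regions failed, the `<EROR>` lines can be pulled out of a saved copy of the log. This is only a minimal sketch: `failed_regions` and `octopus.log` are hypothetical names, and it assumes the message format stays exactly as shown in the log above.

```shell
# Hypothetical helper: print each region that octopus reported a problem for.
# Assumes lines of the form
#   [timestamp] <EROR> Encountered a problem whilst calling REGION
failed_regions() {
    sed -n 's/.*<EROR> Encountered a problem whilst calling \(.*\)$/\1/p' "$1"
}
```

For example, `failed_regions octopus.log > failed_regions.txt` would write one region per line, which makes it easier to see whether every region failed or only some of them.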
Are you able to provide the BAM files you're using?
Possibly, what's the best way to get them to you?
If you give me your email I can give you access to a Dropbox folder. Send me an email to dcooke@well.ox.ac.uk if you don't want to post your email on here.
The files were generated by sampling from a samtools fastq conversion
samtools fastq -1 NA12878_1.fastq -2 NA12878_2.fastq -
of the cram file here: ftp://ftp-trace.ncbi.nlm.nih.gov/1000genomes/ftp/1000G_2504_high_coverage/data/ERR3239334/
I didn't sort the reads out of genomic order (i.e. by name) before I sampled them. I am currently trying out sorting the cram by name and then extracting using samtools sort -n NA12878.final.cram | samtools fastq -1 NA12878_1.fastq -2 NA12878_2.fastq - and then sampling. I'll let you know how that goes.
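One way to check whether extraction and sampling kept mates together is to compare the read names in the two FASTQ files. This is a rough sketch, not part of samtools or octopus: `read_names` and `fastq_pairs_in_sync` are hypothetical helpers, and they assume four-line FASTQ records with an optional `/1` or `/2` suffix on the read names.

```shell
# Hypothetical helper: print the read name of every record in a FASTQ file.
# Assumes 4 lines per record; drops the leading "@", any comment after the
# first whitespace, and a trailing /1 or /2 mate suffix.
read_names() {
    awk 'NR % 4 == 1 { sub(/^@/, ""); sub(/[ \t].*$/, ""); sub(/\/[12]$/, ""); print }' "$1"
}

# Hypothetical helper: succeed only if both files list the same read names
# in the same order, i.e. record N in file 1 is the mate of record N in file 2.
fastq_pairs_in_sync() {
    n1=$(mktemp) || return 2
    n2=$(mktemp) || { rm -f "$n1"; return 2; }
    read_names "$1" > "$n1"
    read_names "$2" > "$n2"
    cmp -s "$n1" "$n2"
    status=$?
    rm -f "$n1" "$n2"
    return "$status"
}
```

For example, `fastq_pairs_in_sync NA12878_1.fastq NA12878_2.fastq && echo "pairs in sync"` prints the message only when the two files agree record by record.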
Can you upload the sampled BAM files rather than the fastqs? Presumably they shouldn't be large if they're just 1x.
done.
Closing as the original problem is fixed in develop. Please open a new issue for the other problem if it persists. I'm thinking that it may be a memory issue. I'm trying to reproduce it and will open an issue if I can.
Describe the bug I am trying to joint genotype two files for a test (they are both ~1x). I get the following output:
Version
Command Command line to run octopus: