Closed Yiguan closed 3 years ago
Hi, which version of Octopus are you using?
Hi, which version of Octopus are you using?
I think it the latest version:
octopus version 0.6.3-beta
Target: x86_64 Linux 5.4.0-1041-azure
Compiler: GNU 9.3.0
Boost: 1_74
Thanks!
v0.6.3b is actually very old. I'm assuming you're installing via Conda? I'm trying to get the Bioconda version updated to the latest version which has a number of improvements for de novo calling. Hopefully this will be available in the next few days. Otherwise you can try installing manually.
v0.6.3b is actually very old. I'm assuming you're installing via Conda? I'm trying to get the Bioconda version updated to the latest version which has a number of improvements for de novo calling. Hopefully this will be available in the next few days. Otherwise you can try installing manually.
Yes, I installed it via conda
. I will try to install the latest version manually and test it again. Thanks!
v0.6.3b is actually very old. I'm assuming you're installing via Conda? I'm trying to get the Bioconda version updated to the latest version which has a number of improvements for de novo calling. Hopefully this will be available in the next few days. Otherwise you can try installing manually.
Hi, Following your instructions , I tried to install it via github:
$ git clone -b master https://github.com/luntergroup/octopus.git
$ octopus/scripts/install.py --dependencies --forests
but encountered an error
==> ../configure --prefix=/data/home/wwang1111/octopus/build/brew/Cellar/gcc/10.2.0_4 --libdir=/data/home/wwang1111/octopus/build/brew/Cellar/gcc/10.2.0_4/lib/gcc/10 --di
==> make
Last 15 lines from /data/home/wwang1111/.cache/Homebrew/Logs/gcc/02.make:
checking whether /usr/bin/make sets $(MAKE)... yes
checking whether /usr/bin/make supports nested variables... yes
checking for x86_64-pc-linux-gnu-gcc... /tmp/gcc-20210407-4531-8dqqpx/gcc-10.2.0/build/./gcc/xgcc -B/tmp/gcc-20210407-4531-8dqqpx/gcc-10.2.0/build/./gcc/ -B/data/home/wwang1111/octopus/build/brew/Cellar/gcc/10.2.0_4/x86_64-pc-linux-gnu/bin/ -B/data/home/wwang1111/octopus/build/brew/Cellar/gcc/10.2.0_4/x86_64-pc-linux-gnu/lib/ -isystem /data/home/wwang1111/octopus/build/brew/Cellar/gcc/10.2.0_4/x86_64-pc-linux-gnu/include -isystem /data/home/wwang1111/octopus/build/brew/Cellar/gcc/10.2.0_4/x86_64-pc-linux-gnu/sys-include -fno-checking
checking whether the C compiler works... yes
checking for C compiler default output file name... a.out
checking for suffix of executables...
checking whether we are cross compiling... configure: error: in `/tmp/gcc-20210407-4531-8dqqpx/gcc-10.2.0/build/x86_64-pc-linux-gnu/libgomp':
configure: error: cannot run C compiled programs.
If you meant to cross compile, use `--host'.
See `config.log' for more details
make[2]: *** [configure-stage1-target-libgomp] Error 1
make[2]: Leaving directory `/tmp/gcc-20210407-4531-8dqqpx/gcc-10.2.0/build'
make[1]: *** [stage1-bubble] Error 2
make[1]: Leaving directory `/tmp/gcc-20210407-4531-8dqqpx/gcc-10.2.0/build'
make: *** [all] Error 2
READ THIS: https://docs.brew.sh/Troubleshooting
These open issues may also help:
fibjs: build with gcc https://github.com/Homebrew/linuxbrew-core/pull/22696
emscripten: use newer gcc https://github.com/Homebrew/linuxbrew-core/pull/22455
lc0: use newer GCC https://github.com/Homebrew/linuxbrew-core/pull/22550
hbase: build with system gcc https://github.com/Homebrew/linuxbrew-core/pull/22685
gdb: use system gcc, remove workaround https://github.com/Homebrew/linuxbrew-core/pull/22698
Bootstrapped gcc@5 has broken symlinks https://github.com/Homebrew/linuxbrew-core/issues/22511
gcc@10 fails because of outdated zstd on Ubuntu 16.04 https://github.com/Homebrew/linuxbrew-core/issues/22800
Traceback (most recent call last):
File "octopus/scripts/install.py", line 515, in <module>
main(args)
File "octopus/scripts/install.py", line 422, in main
ret = call(["cmake3"] + cmake_options + [".."])
File "/usr/lib64/python3.6/subprocess.py", line 287, in call
with Popen(*popenargs, **kwargs) as p:
File "/usr/lib64/python3.6/subprocess.py", line 729, in __init__
restore_signals, start_new_session)
File "/usr/lib64/python3.6/subprocess.py", line 1364, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)
PermissionError: [Errno 13] Permission denied: 'cmake3'
Is it trying to install gcc_10.2 but failed?
I actually have updated my gcc to the latest version manually.
$ gcc --version
gcc (GCC) 10.2.0
Copyright (C) 2020 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
A follow up about the mutation detected using different version of octopus
.
Here is the comparison on the same test data using two different octopus
versions.
For version 0.6.3-beta
installed via conda
, the inferred de novo
mutations are:
CHR POS REF ALT
4 12050 A T
4 25522 T G
4 25907 T C
4 147101 AAAAGGTAAATTTATTATCTGTTCCGAATCC A
4 147133 TATC T
4 147145 GCAATAGATT G
4 147174 T A
4 147217 C T
4 147222 A T
4 637858 G A
4 792408 T C
4 1267507 AT A,*
4 1280572 G A
4 1293153 G A
4 1305360 T A
4 1316559 G A
For version 0.7.2
, the inferred de novo
mutations are:
CHR POS REF ALT
4 12050 A T
4 25522 T G
4 25907 T C
4 225526 T C
4 225532 A T
4 637858 G A
4 786848 A C
4 791262 G T
4 792408 T C
4 1200133 C A
4 1200140 C A
4 1200147 C A
4 1200495 T G
4 1218262 A T
4 1218272 A T
4 1280572 G A
4 1293153 G A
4 1295375 G A
4 1305360 T A
Although there are some consensus sites, the two sets results are quite different. The newer version octopus
seems to report more mutations.
I'd suggest
--snp-heterozygosity
and --indel-heterozygosity
parameters to values appropriate for fruit fly. --denovo-snv-prior
and --denovo-indel-prior
, if these are different substantially from human in fruit fly.--max-haplotypes
.I'd also recommend against using the --denovos-only
option. This prevents you from re-filtering the call set and you loose haplotype information from germline variants. You can easily extract de novo calls from the default output as they are flagged with DENOVO
in the INFO
column:
$ bcftools view -i 'DENOVO=1' trio.vcf > trio.denovo.vcf
Finally, I'd recommend using the --bamout
option to make evidence BAMs. So in summary, try
$ octopus -R my.fa \
-I sample_43.sort.bam sample_32.sort.bam sample_62.sort.bam \
-F sample_43 \
-M sample_32 \
--snp-heterozygosity XXX \
--indel-heterozygosity YYY \
--max-haplotypes 300 \
--regions 4 \
--forest germline.v0.7.2.forest \
--thread 8 \
-o trio.vcf \
--bamout evidence
If you're still finding lots of PASS denovo calls then take a look at the evidence BAMs to see what's going on. I'm also happy to take a look at these if you'd like to post them along with the VCF.
Hi, I re-ran the test with the parameters you suggested as following:
octopus -R chr4.fa \
-I sample_43_chr4.bam sample_32_chr4.bam sample_62_chr4.bam \
-F sample_43 \
-M sample_32 \
--denovo-snv-prior 2.8e-9 \
--denovo-indel-prior 1.2e-9 \
--snp-heterozygosity 0.005 \
--indel-heterozygosity 0.0005 \
--max-haplotypes 300 \
--regions 4 \
--forest germline.v0.7.2.forest.gz \
--thread 8 \
-o trio_V0.7_ref6.36_s62.vcf \
--bamout evidence
But unfortunately, it encountered an error:
...
[2021-04-09 23:16:55] <INFO> 4:1301839 96.3% 54m 45s 2m 7s
[2021-04-09 23:19:56] <INFO> 4:1313994 97.5% 57m 46s 1m 32s
[2021-04-09 23:21:15] <INFO> 4:1333424 98.9% 59m 5s 37s
[2021-04-09 23:21:16] <INFO> - 100% 59m 6s -
[2021-04-09 23:21:16] <INFO> Starting Call Set Refinement (CSR) filtering
[2021-04-09 23:21:16] <INFO> Removed 4 temporary files
[2021-04-09 23:21:16] <EROR> A program error has occurred:
[2021-04-09 23:21:16] <EROR>
[2021-04-09 23:21:16] <EROR> Encountered an exception during calling 'std::bad_alloc'. This means
[2021-04-09 23:21:16] <EROR> there is a bug and your results are untrustworthy.
[2021-04-09 23:21:16] <EROR>
[2021-04-09 23:21:16] <EROR> To help resolve this error run in debug mode and send the log file to
[2021-04-09 23:21:16] <EROR> https://github.com/luntergroup/octopus/issues.
[2021-04-09 23:21:16] <INFO> ------------------------------------------------------------------------
I tested it many times and always got the same error. It's unlikely to be caused by the memory issue since we have ENOUGH available memory on our workstation.
I am not sure if it's a bug in the octopus
or somewhere goes wrong in my Linux environment. I shared the data on GoogleDrive: 3 bam files and a reference genome file.
https://drive.google.com/drive/folders/1dySSlXnKJTotUNndxOkT-Bw1PErpHP9V?usp=sharing
Thanks!
Thanks for the data. Unfortunately I can't replicate the exception on my system - the run completes successfully. What OS are you running? I can try to replicate in a Docker container.
Regarding the de novos, these do appear to be false positives and unfortunately the random forest doesn't filter them. I'm going to include more trio training data in the next release forest so that should improve. For now, there are other reported annotations that strongly point to these being false positives: MP
, PP
, and GQ
. At least one of these is very low in all of the called DENOVO
mutations. For example, using the filter
$ bcftools view -f PASS -i 'DENOVO=1&&MP>10&&PP>10&&MIN(GQ)>10' trio.vcf > trio.denovo.highconf.vcf
results in zero de novo calls. Although I usually don't like hard filtering, these are very relaxed thresholds for these annotations, so it's unlikely to affect sensitivity much.
Thanks so much! I think it's probably essential to do some hard filtering for my data. Reads simulation may be helpful to determine a threshold for the annotation. I will try that.
Also, looking forward your upgrading the octopus
on conda.
The exception is due to the forest files being proved in compressed form - they need to be decompressed.
Hi!
It's an excellent tool! Also thanks for providing such detailed manual. Impressive!
I have some Illunima data from fruit flies and trying to detect the de novo mutations on germline. This is the command I am testing:
There are dozens of variants reported in the
trio_denovosOnly.vcf
file. In fact, thechromosome 4
in fruit fly is only about 1.3Mb, and there are unlikely to have new mutations in one generation on this chromosome. These reported variants are likely to be false positives.My question is which arguments/parameters in
octopus
could be used to minimise the detection of false positives even if at the cost of running time and memory.I currently only use the default parameters and I appreciate it if you could offer some suggestion!
Thanks!