Closed. virenar closed this issue 4 years ago.
Hi Virenar,
Thank you for your interest in Bamgineer. I haven't tried Bamgineer on macOS. My first guess is that the "sed" and "awk" commands behave slightly differently on macOS vs other Unix-based platforms (see for instance: https://unix.stackexchange.com/questions/13711/differences-between-sed-on-mac-osx-and-other-standard-sed).
If that happens to be the issue, you should modify the sed (and possibly awk) commands in "utils.py" and "methods.py". You could also take a look at the "/logs/debug.log" file for more info. Hope it helps.
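For example, a minimal illustration of the BSD/GNU incompatibility linked above (this is a generic sketch, not Bamgineer's actual commands): in-place editing with `sed -i` takes no argument on GNU sed (Linux) but requires an explicit backup suffix on BSD sed (macOS).

```shell
# Create a demo file.
printf 'foo\n' > /tmp/demo.txt

# GNU sed (Linux):  sed -i 's/foo/bar/' /tmp/demo.txt
# BSD sed (macOS):  sed -i '' 's/foo/bar/' /tmp/demo.txt
# Passing the GNU form to BSD sed fails with "invalid command code".

# A portable alternative that works identically on both platforms:
sed 's/foo/bar/' /tmp/demo.txt > /tmp/demo.txt.new && mv /tmp/demo.txt.new /tmp/demo.txt
cat /tmp/demo.txt   # prints "bar"
```

The portable redirect-and-rename form avoids the `-i` flag entirely, which is why moving such commands into pandas (as done later in this thread) sidesteps the problem altogether.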
Soroush
Hi Virenar,
I have made changes to the code so that it uses pandas instead of platform-dependent commands (sed/awk). Please pull the latest version, give it a try, and let me know if the error persists. Hope it helps.
Soroush
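As a hypothetical sketch of the kind of change described above (this is not the actual Bamgineer code; the function name and column choice are illustrative): a platform-dependent column extraction such as `awk '{print $1"\t"$2"\t"$3}' in.bed > out.bed` can be replaced with pandas, which behaves identically on Linux and macOS.

```python
import pandas as pd

def filter_bed_columns(in_bed, out_bed, columns=(0, 1, 2)):
    """Keep only the requested columns of a tab-separated BED file.

    Equivalent to an awk one-liner printing selected fields, but
    without any dependence on the platform's awk/sed behavior.
    """
    df = pd.read_csv(in_bed, sep="\t", header=None)
    df[list(columns)].to_csv(out_bed, sep="\t", header=False, index=False)
```

Called as `filter_bed_columns("gain_tmp.bed", "gain_filtered.bed")`, this mirrors the "filtering bed file columns" step that appears in the log output below.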
Thanks Soroush for following up. I tried with the latest version but now I am getting a different error.
Exception in thread Thread-1:
Traceback (most recent call last):
File "/anaconda3/lib/python2.7/threading.py", line 801, in __bootstrap_inner
self.run()
File "/anaconda3/lib/python2.7/threading.py", line 754, in run
self.__target(*self.__args, **self.__kwargs)
File "/Users/vamin/projects/CNV/bamgineer2/bamgineer/src/helpers/handlers.py", line 76, in receive
record = self.queue.get(True, self.polltime)
File "/anaconda3/lib/python2.7/multiprocessing/queues.py", line 135, in get
res = self._recv()
TypeError: __init__() takes exactly 2 arguments (1 given)
Full output:
beagle.09Nov15.d2a.jar
Copyright (C) 2014-2015 Brian L. Browning
Enter "java -jar beagle.jar" for a summary of command line arguments.
Start time: 01:27 PM EDT on 23 Apr 2018
Command line: java -Xmx3641m -jar beagle.jar
gt=normal_het.vcf.gz
out=outputs/phasedvcfdir/normal_het_phased
No genetic map is specified: using 1 cM = 1 Mb
reference samples: 0
target samples: 1
Window 1 [ chr21:9414112-48119669 ]
target markers: 10225
Starting burn-in iterations
Window=1 Iteration=1
Time for building model: 0 seconds
Time for sampling (singles): 0 seconds
DAG statistics
mean edges/level: 2 max edges/level: 2
mean edges/node: 2.000 mean count/edge: 1
[... Window=1 iterations 2-10 repeat with identical DAG statistics ...]
Starting phasing iterations
Window=1 Iteration=11
Time for building model: 0 seconds
Time for sampling (singles): 0 seconds
DAG statistics
mean edges/level: 2 max edges/level: 2
mean edges/node: 2.000 mean count/edge: 1
states/marker: 1.0
[... Window=1 iterations 12-15 repeat with identical DAG statistics ...]
Window 2 [ chr22:16066867-51239065 ]
target markers: 3455
Starting burn-in iterations
Window=2 Iteration=1
Time for building model: 0 seconds
Time for sampling (singles): 0 seconds
DAG statistics
mean edges/level: 2 max edges/level: 2
mean edges/node: 1.999 mean count/edge: 1
[... Window=2 iterations 2-10 repeat with identical DAG statistics ...]
Starting phasing iterations
Window=2 Iteration=11
Time for building model: 0 seconds
Time for sampling (singles): 0 seconds
DAG statistics
mean edges/level: 2 max edges/level: 2
mean edges/node: 1.999 mean count/edge: 1
states/marker: 1.0
[... Window=2 iterations 12-15 repeat with identical DAG statistics ...]
Number of markers: 13680
Total time for building model: 0 seconds
Total time for sampling: 1 second
Total run time: 1 second
End time: 01:29 PM EDT on 23 Apr 2018
beagle.09Nov15.d2a.jar finished
VCFtools - 0.1.15
(C) Adam Auton and Anthony Marcketta 2009
Parameters as interpreted:
--vcf outputs/phasedvcfdir/hap1_het.vcf
--thin 50
--out outputs/phasedvcfdir/hap1_het_filtered
--recode
After filtering, kept 0 out of 0 Individuals
Outputting VCF file...
After filtering, kept 5324 out of a possible 6773 Sites
Run Time = 0.00 seconds
VCFtools - 0.1.15
(C) Adam Auton and Anthony Marcketta 2009
Parameters as interpreted:
--vcf outputs/phasedvcfdir/hap2_het.vcf
--thin 50
--out outputs/phasedvcfdir/hap2_het_filtered
--recode
After filtering, kept 0 out of 0 Individuals
Outputting VCF file...
After filtering, kept 5424 out of a possible 6905 Sites
Run Time = 0.00 seconds
___ generating phased Bed ___
___ filtering bed file columns for ___ gain_tmp.bed
___ filtering bed file columns for ___ gain_tmp.bed
Exception in thread Thread-1:
Traceback (most recent call last):
File "/anaconda3/lib/python2.7/threading.py", line 801, in __bootstrap_inner
self.run()
File "/anaconda3/lib/python2.7/threading.py", line 754, in run
self.__target(*self.__args, **self.__kwargs)
File "/Users/vamin/projects/CNV/bamgineer2/bamgineer/src/helpers/handlers.py", line 76, in receive
record = self.queue.get(True, self.polltime)
File "/anaconda3/lib/python2.7/multiprocessing/queues.py", line 135, in get
res = self._recv()
TypeError: __init__() takes exactly 2 arguments (1 given)
Can you check the "/logs/debug.log" file please? Also, could you send the script you use to run the code?
To test, I copied run_example1.sh to bamgineer/examples/inputs and ran the bash script.
Thanks Virenar. So is your outputs directory empty after it fails? Do you see new folders (such as haplotypedir, logs, tmpbams, etc.) created in the output? If yes, could you also attach your "/logs/debug.log" file?
I think I found your "debug.log" file. It's generated in your inputs. I'll have a look and get back to you.
Looks like it's a problem with the "multiprocessing" module version. It seems to be a rather common issue (see for instance: https://github.com/spotify/luigi/issues/1227). Could you double-check the version of your multiprocessing module? I checked mine and it's 0.70.4 (I suspect yours is higher). You can revert to the old version and try again.
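A quick way to check which multiprocessing implementation is actually on your path, and its version (a diagnostic sketch; note that the stdlib `multiprocessing` carries no version of its own, while the third-party `multiprocess` fork exposes a `__version__` such as 0.70.x):

```python
import multiprocessing

# Shows whether "multiprocessing" resolves to the stdlib copy
# (e.g. .../lib/python2.7/multiprocessing/__init__.py) or to a
# third-party shim installed into site-packages.
print(multiprocessing.__file__)

try:
    import multiprocess  # the fork used by pathos; versioned 0.70.x
    print(multiprocess.__version__)
except ImportError:
    print("third-party 'multiprocess' fork not installed")
```

If the traceback references `site-packages/multiprocess/queues.py` (as in a later comment in this thread) rather than the stdlib path, the fork is the one being exercised and is the one to pin or downgrade.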
I created a Dockerfile for Bamgineer with all dependencies shown in the README file, but I am still having issues running the example. See the error message below. Could you tweak the Dockerfile to give us a stable working environment to run Bamgineer?
Error message
Traceback (most recent call last):
File "../../src/simulate.py", line 69, in <module>
main(args)
File "../../src/simulate.py", line 39, in main
run_pipeline(results_path)
File "/bamgineer/src/methods.py", line 575, in run_pipeline
initialize0(phase_path, cancer_dir_path)
File "/bamgineer/src/methods.py", line 45, in initialize0
getVCFHaplotypes(phasedvcf, hap1vcf, hap2vcf)
File "/bamgineer/src/utils.py", line 107, in getVCFHaplotypes
vcfh = gzip.GzipFile(phasedvcf, 'rb')
File "/usr/lib/python2.7/gzip.py", line 94, in __init__
fileobj = self.myfileobj = __builtin__.open(filename, mode or 'rb')
IOError: [Errno 2] No such file or directory: '../outputs/phasedvcfdir/normal_het_phased.vcf.gz'
Exception in thread Thread-1:
Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
self.run()
File "/usr/lib/python2.7/threading.py", line 754, in run
self.__target(*self.__args, **self.__kwargs)
File "/bamgineer/src/helpers/handlers.py", line 76, in receive
record = self.queue.get(True, self.polltime)
File "/usr/lib/python2.7/multiprocessing/queues.py", line 135, in get
res = self._recv()
EOFError
Instructions for setting up the Bamgineer docker image:
docker build -t <docker username>/bamgineer:latest .
docker run -it -v /path/to/reference/genome/hg19:/tmp/data <docker username>/bamgineer:latest
# mount the hg19 directory containing the hg19 reference genome into /tmp/data
Config file: bamgineer/examples/inputs/config.cfg
Then, from bamgineer/examples/scripts/, run:
./run_example1.sh
Hope this helps. If you could update the Dockerfile to recreate the environment that you have been using, that would be fantastic.
Looking forward to hearing from you.
Viren
Hi Viren,
We're currently updating the tool to address many of these issues and will aim to deploy the latest stable version in a week or two. We'll update the dockerfile and aim to have it deployed and tested in other environments by that time as well.
Rene
Thanks Rene. Let me know when you have updated the dockerfile.
Viren
Hi Rene,
Have you had any chance to update the Dockerfile?
Viren
Hey Viren,
Sorry for the late reply, we have been benchmarking the new algorithm for the last couple of weeks; your original error was regarding this line (line 17) in utils.py:
"from pathos.multiprocessing import ProcessingPool"
If you comment this line out it should run, but we have since improved the algorithm with the ability to specify your desired allelic ratio in the cnv.bed (e.g. AAB, AABB, AAABB, etc.) as well as the ability to re-pair low-coverage bams. Bamgineer was originally designed and tested for high-coverage bams.
Bamgineer V2 will be deployed to the master branch by this weekend or sometime next week at the latest. I will gladly be your point of contact regarding debugging. We will have to build the docker image locally since we are unable to use docker on our cluster for security reasons but I will also update and organize the documentation to the exact tools and package versions needed.
Suluxan
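To make the allelic-ratio notation concrete: the exact cnv.bed column layout is not shown in this thread, so the lines below are purely illustrative of the idea (a gained region annotated with a target genotype such as AAB), not the documented Bamgineer v2 format. The coordinates are borrowed from the logs above.

```
# chrom   start      end        type   allelic_ratio    <- hypothetical column layout
chr21     30227447   31000000   gain   AAB
chr22     16066867   17000000   gain   AABB
```

Consult the repository docs for the authoritative cnv.bed specification before preparing real input.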
Thank you for following up on this. We can look into V2. If you can't provide Docker, can you provide an anaconda env file to re-create the environment? Looking forward to trying out V2.
Hey Viren,
I forgot to update you, but I put up a Dockerfile as well as a docker-examples folder that you can run Bamgineer straight from (I've updated the config file to work within the docker image). I will be testing the Docker image this week to make sure there are no discrepancies, but hopefully it will be useful for getting you up and running quickly. Just follow steps 1-3 in the input prep found under /docs.
Suluxan
Hi,
I am trying to run the Bamgineer example data, but every time I get the following error:
beagle.09Nov15.d2a.jar finished
___ generating phased bed ___
Usage: sambamba-merge [options] <output.bam> <input1.bam> <input2.bam> [...]
Options: -t, --nthreads=NTHREADS
number of threads to use for compression/decompression
-l, --compression-level=COMPRESSION_LEVEL
level of compression for merged BAM file, number from 0 to 9
-H, --header
output merged header to stdout in SAM format, other options are ignored; mainly for debug purposes
-p, --show-progress
show progress bar in STDERR
-F, --filter=FILTER
keep only reads that satisfy FILTER
___ removing merged duplicates near breakpoints ___
sambamba-markdup: Cannot open or create file '/home/vivekr/bamgineer/examples/inputs/CN1.bam' : No such file or directory
Traceback (most recent call last):
File "/home/vivekr/bamgineer/src/simulate.py", line 71, in <module>
main(args)
File "/home/vivekr/bamgineer/src/simulate.py", line 45, in main
run_pipeline(results_path)
File "/home/vivekr/bamgineer/src/methods.py", line 1021, in run_pipeline
merge_final(outbamfn, finalbams_path)
File "/home/vivekr/bamgineer/src/utils.py", line 332, in merge_final
os.remove(mergefn)
OSError: [Errno 2] No such file or directory: '/home/vivekr/bamgineer/examples/inputs/CN1.bam'
Exception in thread Thread-1:
Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
self.run()
File "/usr/lib/python2.7/threading.py", line 754, in run
self.__target(*self.__args, **self.__kwargs)
File "/home/vivekr/bamgineer/src/helpers/handlers.py", line 76, in receive
record = self.queue.get(True, self.polltime)
File "/home/vivekr/.local/lib/python2.7/site-packages/multiprocess/queues.py", line 138, in get
res = self._recv()
EOFError
I thought it might be due to the multiprocessing package in Python, so I reinstalled it, but I am still getting the same error. Can you please suggest why sambamba is not able to generate the output .bam file?
I have attached the complete output log along with error.
error_log.txt
Thanks.
The multiprocessing package has been deprecated; it is now multiprocess/0.70.7 (pip install multiprocess==0.70.7). I believe your sambamba executable file has no execute permission (fix with chmod).
Also, the dependency issues are solved by using the docker image, or by pulling the Bamgineer docker image from Docker Hub into Singularity if you're on HPC. The image is here: https://cloud.docker.com/u/suluxan/repository/docker/suluxan/bamgineer and you can use: docker pull suluxan/bamgineer:initial or singularity build bamgineer.simg docker://suluxan/bamgineer:initial
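The missing-execute-bit failure mode can be demonstrated with a stand-in script (the file path and script are illustrative; the real fix is the same `chmod +x` applied to the downloaded sambamba binary):

```shell
# A freshly written file is created without the execute bit
# (typical umask yields mode 644), so invoking it fails.
printf '#!/bin/sh\necho ok\n' > /tmp/fake_sambamba
/tmp/fake_sambamba 2>/dev/null || echo "fails without +x (Permission denied)"

# Grant execute permission, after which it runs normally.
chmod +x /tmp/fake_sambamba
/tmp/fake_sambamba   # prints "ok"
```

Applying the same `chmod +x` to the sambamba binary (and confirming with `sambamba --version`) rules this cause out before digging into the pipeline itself.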
Hi,
I tried Docker as you suggested, but I am getting the following error:
Command:
docker pull suluxan/bamgineer
Error:
Using default tag: latest
Error response from daemon: manifest for suluxan/bamgineer:latest not found
It's the tag: docker pull suluxan/bamgineer:initial
Hi,
I tried to run the example code through Docker as well, but it failed again. Here is what I tried:
docker pull suluxan/bamgineer:initial
After successfully pulling the image, I ran it as
docker run -it suluxan/bamgineer:initial
After that:
cd bamgineer
cd docker-example/scripts
bash run1.sh
And after some time I got the following error
[E::hts_open] fail to open file '/bamgineer/docker-example/outputs/LUAC1/tmpbams/chr21_roigainAAB30227447.bam'
I have attached the complete error log. Have you tested the Docker image? bamgineer_error_log.txt
Well, it has failed to open the tmp file because there is no BAM in the docker image... it would be too inflated to include the BAM file.
You would still need to follow steps 1-3 in bamgineer/docs/input_preparation. This involves creating a splitbams directory and downloading and sorting the BAM file (alternatively, use the docker cp command to move the BAM you were using into the container; move the filtered VCF as well if this is a different file from the PGP example).
The other steps are done, vcf/bed files are ready to go, and config file is already mapped to the docker container's directories.
Also, on a side note: the previous version of Bamgineer did include phasing, but the beagle tool needs a population file in order to properly phase. It is recommended that you start with a phased VCF for accurate simulations. If the haplotypes are phased, VCF preparation is not needed (steps 6-9), and you can omit the -phase flag from the run.sh script.
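A quick way to tell whether an input VCF still needs the phasing step: per the VCF specification, the GT field uses `/` for unphased genotypes and `|` for phased ones. The records below are illustrative (position and alleles are made up):

```
#CHROM  POS      ID  REF  ALT  QUAL  FILTER  INFO  FORMAT  SAMPLE
chr21   9414112  .   A    G    .     PASS    .     GT      0/1
chr21   9414112  .   A    G    .     PASS    .     GT      0|1
```

The first record (`0/1`) is unphased and would need Beagle (keep the -phase flag); the second (`0|1`) is already phased, so the VCF preparation steps and the -phase flag can be skipped as described above.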
Ok, I'll look into it. I think the input_preparation steps should also be automated in the run1.sh script, which would save a lot of time to just reproduce the example data and see how it works.
I installed Bamgineer and all its dependencies on macOS (v10.13.4). I get the following error when I run the example.
Here is the full output: