pughlab / bamgineer

Bamgineer: Introduction of simulated allele-specific copy number variants into exome and targeted sequence data sets
Apache License 2.0

Example failed #8

Closed virenar closed 4 years ago

virenar commented 6 years ago

I installed bamgineer and all its dependencies on macOS (v10.13.4). I get the following error when I run the example:

sed: 1: "outputs/phasedvcfdir/ha ...": invalid command code o
sed: 1: "outputs/phasedvcfdir/ha ...": invalid command code o
awk: syntax error at source line 1
 context is
    ($1 ~ "chr"){print $0 >> $1 >>>  "_exons_in_roigain" <<< 
awk: illegal statement at source line 1
awk: illegal statement at source line 1
sh: outputs/haplotypedir/gain_tmp.bed: No such file or directory
sh: outputs/haplotypedir/het_snpgain.bed: No such file or directory
Traceback (most recent call last):
  File "../../src/simulate.py", line 69, in <module>
    main(args)
  File "../../src/simulate.py", line 39, in main
    run_pipeline(results_path)
  File "/Users/vamin/projects/CNV/bamgineer/src/methods.py", line 587, in run_pipeline
    initialize_pipeline(phase_path, haplotype_path, cnv_path)
  File "/Users/vamin/projects/CNV/bamgineer/src/methods.py", line 98, in initialize_pipeline
    splitBed(hetsnpbed, '_het_snp' + str(event))
  File "/Users/vamin/projects/CNV/bamgineer/src/utils.py", line 296, in splitBed
    os.chdir(path)
OSError: [Errno 2] No such file or directory: 'outputs/haplotypedir'
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/anaconda3/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/anaconda3/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/Users/vamin/projects/CNV/bamgineer/src/helpers/handlers.py", line 76, in receive
    record = self.queue.get(True, self.polltime)
  File "/anaconda3/lib/python2.7/multiprocessing/queues.py", line 135, in get
    res = self._recv()
EOFError

Here is the full output

outputs
 ___ phasing vcf file ___ 
beagle.09Nov15.d2a.jar
Copyright (C) 2014-2015 Brian L. Browning
Enter "java -jar beagle.jar" for a summary of command line arguments.
Start time: 01:25 PM EDT on 16 Apr 2018

Command line: java -Xmx3641m -jar beagle.jar
  gt=normal_het.vcf.gz
  out=outputs/phasedvcfdir/normal_het_phased

No genetic map is specified: using 1 cM = 1 Mb

reference samples:       0
target samples:          1

Window 1 [ chr21:9414112-48119669 ]
target markers:      10225

Starting burn-in iterations

Window=1 Iteration=1
Time for building model:         0 seconds
Time for sampling (singles):     0 seconds
DAG statistics
mean edges/level: 2      max edges/level: 2
mean edges/node:  2.000  mean count/edge: 1

Window=1 Iteration=2
Time for building model:         0 seconds
Time for sampling (singles):     0 seconds
DAG statistics
mean edges/level: 2      max edges/level: 2
mean edges/node:  2.000  mean count/edge: 1

Window=1 Iteration=3
Time for building model:         0 seconds
Time for sampling (singles):     0 seconds
DAG statistics
mean edges/level: 2      max edges/level: 2
mean edges/node:  2.000  mean count/edge: 1

Window=1 Iteration=4
Time for building model:         0 seconds
Time for sampling (singles):     0 seconds
DAG statistics
mean edges/level: 2      max edges/level: 2
mean edges/node:  2.000  mean count/edge: 1

Window=1 Iteration=5
Time for building model:         0 seconds
Time for sampling (singles):     0 seconds
DAG statistics
mean edges/level: 2      max edges/level: 2
mean edges/node:  2.000  mean count/edge: 1

Window=1 Iteration=6
Time for building model:         0 seconds
Time for sampling (singles):     0 seconds
DAG statistics
mean edges/level: 2      max edges/level: 2
mean edges/node:  2.000  mean count/edge: 1

Window=1 Iteration=7
Time for building model:         0 seconds
Time for sampling (singles):     0 seconds
DAG statistics
mean edges/level: 2      max edges/level: 2
mean edges/node:  2.000  mean count/edge: 1

Window=1 Iteration=8
Time for building model:         0 seconds
Time for sampling (singles):     0 seconds
DAG statistics
mean edges/level: 2      max edges/level: 2
mean edges/node:  2.000  mean count/edge: 1

Window=1 Iteration=9
Time for building model:         0 seconds
Time for sampling (singles):     0 seconds
DAG statistics
mean edges/level: 2      max edges/level: 2
mean edges/node:  2.000  mean count/edge: 1

Window=1 Iteration=10
Time for building model:         0 seconds
Time for sampling (singles):     0 seconds
DAG statistics
mean edges/level: 2      max edges/level: 2
mean edges/node:  2.000  mean count/edge: 1

Starting phasing iterations

Window=1 Iteration=11
Time for building model:         0 seconds
Time for sampling (singles):     0 seconds
DAG statistics
mean edges/level: 2      max edges/level: 2
mean edges/node:  2.000  mean count/edge: 1

states/marker:    1.0

Window=1 Iteration=12
Time for building model:         0 seconds
Time for sampling (singles):     0 seconds
DAG statistics
mean edges/level: 2      max edges/level: 2
mean edges/node:  2.000  mean count/edge: 1

states/marker:    1.0

Window=1 Iteration=13
Time for building model:         0 seconds
Time for sampling (singles):     0 seconds
DAG statistics
mean edges/level: 2      max edges/level: 2
mean edges/node:  2.000  mean count/edge: 1

states/marker:    1.0

Window=1 Iteration=14
Time for building model:         0 seconds
Time for sampling (singles):     0 seconds
DAG statistics
mean edges/level: 2      max edges/level: 2
mean edges/node:  2.000  mean count/edge: 1

states/marker:    1.0

Window=1 Iteration=15
Time for building model:         0 seconds
Time for sampling (singles):     0 seconds
DAG statistics
mean edges/level: 2      max edges/level: 2
mean edges/node:  2.000  mean count/edge: 1

states/marker:    1.0

Window 2 [ chr22:16066867-51239065 ]
target markers:       3455

Starting burn-in iterations

Window=2 Iteration=1
Time for building model:         0 seconds
Time for sampling (singles):     0 seconds
DAG statistics
mean edges/level: 2      max edges/level: 2
mean edges/node:  1.999  mean count/edge: 1

Window=2 Iteration=2
Time for building model:         0 seconds
Time for sampling (singles):     0 seconds
DAG statistics
mean edges/level: 2      max edges/level: 2
mean edges/node:  1.999  mean count/edge: 1

Window=2 Iteration=3
Time for building model:         0 seconds
Time for sampling (singles):     0 seconds
DAG statistics
mean edges/level: 2      max edges/level: 2
mean edges/node:  1.999  mean count/edge: 1

Window=2 Iteration=4
Time for building model:         0 seconds
Time for sampling (singles):     0 seconds
DAG statistics
mean edges/level: 2      max edges/level: 2
mean edges/node:  1.999  mean count/edge: 1

Window=2 Iteration=5
Time for building model:         0 seconds
Time for sampling (singles):     0 seconds
DAG statistics
mean edges/level: 2      max edges/level: 2
mean edges/node:  1.999  mean count/edge: 1

Window=2 Iteration=6
Time for building model:         0 seconds
Time for sampling (singles):     0 seconds
DAG statistics
mean edges/level: 2      max edges/level: 2
mean edges/node:  1.999  mean count/edge: 1

Window=2 Iteration=7
Time for building model:         0 seconds
Time for sampling (singles):     0 seconds
DAG statistics
mean edges/level: 2      max edges/level: 2
mean edges/node:  1.999  mean count/edge: 1

Window=2 Iteration=8
Time for building model:         0 seconds
Time for sampling (singles):     0 seconds
DAG statistics
mean edges/level: 2      max edges/level: 2
mean edges/node:  1.999  mean count/edge: 1

Window=2 Iteration=9
Time for building model:         0 seconds
Time for sampling (singles):     0 seconds
DAG statistics
mean edges/level: 2      max edges/level: 2
mean edges/node:  1.999  mean count/edge: 1

Window=2 Iteration=10
Time for building model:         0 seconds
Time for sampling (singles):     0 seconds
DAG statistics
mean edges/level: 2      max edges/level: 2
mean edges/node:  1.999  mean count/edge: 1

Starting phasing iterations

Window=2 Iteration=11
Time for building model:         0 seconds
Time for sampling (singles):     0 seconds
DAG statistics
mean edges/level: 2      max edges/level: 2
mean edges/node:  1.999  mean count/edge: 1

states/marker:    1.0

Window=2 Iteration=12
Time for building model:         0 seconds
Time for sampling (singles):     0 seconds
DAG statistics
mean edges/level: 2      max edges/level: 2
mean edges/node:  1.999  mean count/edge: 1

states/marker:    1.0

Window=2 Iteration=13
Time for building model:         0 seconds
Time for sampling (singles):     0 seconds
DAG statistics
mean edges/level: 2      max edges/level: 2
mean edges/node:  1.999  mean count/edge: 1

states/marker:    1.0

Window=2 Iteration=14
Time for building model:         0 seconds
Time for sampling (singles):     0 seconds
DAG statistics
mean edges/level: 2      max edges/level: 2
mean edges/node:  1.999  mean count/edge: 1

states/marker:    1.0

Window=2 Iteration=15
Time for building model:         0 seconds
Time for sampling (singles):     0 seconds
DAG statistics
mean edges/level: 2      max edges/level: 2
mean edges/node:  1.999  mean count/edge: 1

states/marker:    1.0

Number of markers:               13680
Total time for building model: 1 second
Total time for sampling:       1 second
Total run time:                1 second

End time: 01:25 PM EDT on 16 Apr 2018
beagle.09Nov15.d2a.jar finished

VCFtools - 0.1.15
(C) Adam Auton and Anthony Marcketta 2009

Parameters as interpreted:
    --vcf outputs/phasedvcfdir/hap1_het.vcf
    --thin 50
    --out outputs/phasedvcfdir/hap1_het_filtered
    --recode

After filtering, kept 0 out of 0 Individuals
Outputting VCF file...
After filtering, kept 5324 out of a possible 6773 Sites
Run Time = 0.00 seconds

VCFtools - 0.1.15
(C) Adam Auton and Anthony Marcketta 2009

Parameters as interpreted:
    --vcf outputs/phasedvcfdir/hap2_het.vcf
    --thin 50
    --out outputs/phasedvcfdir/hap2_het_filtered
    --recode

After filtering, kept 0 out of 0 Individuals
Outputting VCF file...
After filtering, kept 5424 out of a possible 6905 Sites
Run Time = 0.00 seconds
sed: 1: "outputs/phasedvcfdir/ha ...": invalid command code o
sed: 1: "outputs/phasedvcfdir/ha ...": invalid command code o
awk: syntax error at source line 1
 context is
    ($1 ~ "chr"){print $0 >> $1 >>>  "_exons_in_roigain" <<< 
awk: illegal statement at source line 1
awk: illegal statement at source line 1
sh: outputs/haplotypedir/gain_tmp.bed: No such file or directory
sh: outputs/haplotypedir/het_snpgain.bed: No such file or directory
Traceback (most recent call last):
  File "../../src/simulate.py", line 69, in <module>
    main(args)
  File "../../src/simulate.py", line 39, in main
    run_pipeline(results_path)
  File "/Users/vamin/projects/CNV/bamgineer/src/methods.py", line 587, in run_pipeline
    initialize_pipeline(phase_path, haplotype_path, cnv_path)
  File "/Users/vamin/projects/CNV/bamgineer/src/methods.py", line 98, in initialize_pipeline
    splitBed(hetsnpbed, '_het_snp' + str(event))
  File "/Users/vamin/projects/CNV/bamgineer/src/utils.py", line 296, in splitBed
    os.chdir(path)
OSError: [Errno 2] No such file or directory: 'outputs/haplotypedir'
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/anaconda3/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/anaconda3/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/Users/vamin/projects/CNV/bamgineer/src/helpers/handlers.py", line 76, in receive
    record = self.queue.get(True, self.polltime)
  File "/anaconda3/lib/python2.7/multiprocessing/queues.py", line 135, in get
    res = self._recv()
EOFError
soroushsamadian commented 6 years ago

Hi Virenar,

Thank you for your interest in Bamgineer. I haven't tried Bamgineer on macOS. My first guess is that the "sed" and "awk" commands behave slightly differently on macOS than on other Unix-based platforms (see for instance: https://unix.stackexchange.com/questions/13711/differences-between-sed-on-mac-osx-and-other-standard-sed).

If that happens to be the issue, you should modify the sed (and possibly awk) commands in "utils.py" and "methods.py". You could also take a look at the "/logs/debug.log" file for more info. Hope it helps.

Soroush
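As an illustration of the portability point: in-place sed edits are a common source of this failure, because BSD/macOS sed requires a backup-suffix argument after -i while GNU sed does not. The same edit can be done in a few lines of Python that behave identically on both platforms. The file name and substitution below are purely illustrative, not Bamgineer's actual sed command:

```python
import fileinput

# Create a tiny stand-in input file (name and contents are illustrative).
with open("hap_demo.vcf", "w") as fh:
    fh.write("chr21\t100\tA\tG\n")

# Rewrite the file in place. Unlike "sed -i", which needs a backup-suffix
# argument on BSD/macOS sed but not on GNU sed, this works the same everywhere.
for line in fileinput.input("hap_demo.vcf", inplace=True):
    print(line.replace("chr", ""), end="")
```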

soroushsamadian commented 6 years ago

Hi Virenar,

I have made changes to the code so that it uses pandas instead of platform-dependent commands (sed/awk). Please pull the latest version, give it a try, and let me know if the error persists. Hope it helps.

Soroush
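For reference, the awk one-liner that failed was splitting a BED file into one file per chromosome (appending each row to "<chrom>_exons_in_roigain"). A minimal pandas sketch of that operation, with illustrative file names and columns rather than Bamgineer's exact code:

```python
import pandas as pd

# Toy BED-like table; in the pipeline this would be read from a .bed file,
# e.g. pd.read_csv(path, sep="\t", header=None).
bed = pd.DataFrame(
    [["chr21", 100, 200], ["chr21", 500, 900], ["chr22", 50, 80]],
    columns=["chrom", "start", "end"],
)

# One output file per chromosome, mirroring awk's
#   ($1 ~ "chr"){print $0 >> $1"_exons_in_roigain"}
for chrom, rows in bed.groupby("chrom"):
    rows.to_csv(chrom + "_exons_in_roigain", sep="\t", header=False, index=False)
```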

virenar commented 6 years ago

Thanks, Soroush, for following up. I tried the latest version, but now I am getting a different error.

Exception in thread Thread-1:                                                                                                           
Traceback (most recent call last):                                                                                                      
  File "/anaconda3/lib/python2.7/threading.py", line 801, in __bootstrap_inner                                                          
    self.run()                                                                                                                          
  File "/anaconda3/lib/python2.7/threading.py", line 754, in run                                                                        
    self.__target(*self.__args, **self.__kwargs)                                                                                        
  File "/Users/vamin/projects/CNV/bamgineer2/bamgineer/src/helpers/handlers.py", line 76, in receive                                    
    record = self.queue.get(True, self.polltime)                                                                                        
  File "/anaconda3/lib/python2.7/multiprocessing/queues.py", line 135, in get                                                           
    res = self._recv()                                                                                                                  
TypeError: __init__() takes exactly 2 arguments (1 given) 

full output

beagle.09Nov15.d2a.jar
Copyright (C) 2014-2015 Brian L. Browning
Enter "java -jar beagle.jar" for a summary of command line arguments.
Start time: 01:27 PM EDT on 23 Apr 2018

Command line: java -Xmx3641m -jar beagle.jar
  gt=normal_het.vcf.gz
  out=outputs/phasedvcfdir/normal_het_phased

No genetic map is specified: using 1 cM = 1 Mb

reference samples:       0
target samples:          1

Window 1 [ chr21:9414112-48119669 ]
target markers:      10225

Starting burn-in iterations

Window=1 Iteration=1
Time for building model:         0 seconds
Time for sampling (singles):     0 seconds
DAG statistics
mean edges/level: 2      max edges/level: 2
mean edges/node:  2.000  mean count/edge: 1

Window=1 Iteration=2
Time for building model:         0 seconds
Time for sampling (singles):     0 seconds
DAG statistics
mean edges/level: 2      max edges/level: 2
mean edges/node:  2.000  mean count/edge: 1

Window=1 Iteration=3
Time for building model:         0 seconds
Time for sampling (singles):     0 seconds
DAG statistics
mean edges/level: 2      max edges/level: 2
mean edges/node:  2.000  mean count/edge: 1

Window=1 Iteration=4
Time for building model:         0 seconds
Time for sampling (singles):     0 seconds
DAG statistics
mean edges/level: 2      max edges/level: 2
mean edges/node:  2.000  mean count/edge: 1

Window=1 Iteration=5
Time for building model:         0 seconds
Time for sampling (singles):     0 seconds
DAG statistics
mean edges/level: 2      max edges/level: 2
mean edges/node:  2.000  mean count/edge: 1

Window=1 Iteration=6
Time for building model:         0 seconds
Time for sampling (singles):     0 seconds
DAG statistics
mean edges/level: 2      max edges/level: 2
mean edges/node:  2.000  mean count/edge: 1

Window=1 Iteration=7
Time for building model:         0 seconds
Time for sampling (singles):     0 seconds
DAG statistics
mean edges/level: 2      max edges/level: 2
mean edges/node:  2.000  mean count/edge: 1

Window=1 Iteration=8
Time for building model:         0 seconds
Time for sampling (singles):     0 seconds
DAG statistics
mean edges/level: 2      max edges/level: 2
mean edges/node:  2.000  mean count/edge: 1

Window=1 Iteration=9
Time for building model:         0 seconds
Time for sampling (singles):     0 seconds
DAG statistics
mean edges/level: 2      max edges/level: 2
mean edges/node:  2.000  mean count/edge: 1

Window=1 Iteration=10
Time for building model:         0 seconds
Time for sampling (singles):     0 seconds
DAG statistics
mean edges/level: 2      max edges/level: 2
mean edges/node:  2.000  mean count/edge: 1

Starting phasing iterations

Window=1 Iteration=11
Time for building model:         0 seconds
Time for sampling (singles):     0 seconds
DAG statistics
mean edges/level: 2      max edges/level: 2
mean edges/node:  2.000  mean count/edge: 1

states/marker:    1.0

Window=1 Iteration=12
Time for building model:         0 seconds
Time for sampling (singles):     0 seconds
DAG statistics
mean edges/level: 2      max edges/level: 2
mean edges/node:  2.000  mean count/edge: 1

states/marker:    1.0

Window=1 Iteration=13
Time for building model:         0 seconds
Time for sampling (singles):     0 seconds
DAG statistics
mean edges/level: 2      max edges/level: 2
mean edges/node:  2.000  mean count/edge: 1

states/marker:    1.0

Window=1 Iteration=14
Time for building model:         0 seconds
Time for sampling (singles):     0 seconds
DAG statistics
mean edges/level: 2      max edges/level: 2
mean edges/node:  2.000  mean count/edge: 1

states/marker:    1.0

Window=1 Iteration=15
Time for building model:         0 seconds
Time for sampling (singles):     0 seconds
DAG statistics
mean edges/level: 2      max edges/level: 2
mean edges/node:  2.000  mean count/edge: 1

states/marker:    1.0

Window 2 [ chr22:16066867-51239065 ]
target markers:       3455

Starting burn-in iterations

Window=2 Iteration=1
Time for building model:         0 seconds
Time for sampling (singles):     0 seconds
DAG statistics
mean edges/level: 2      max edges/level: 2
mean edges/node:  1.999  mean count/edge: 1

Window=2 Iteration=2
Time for building model:         0 seconds
Time for sampling (singles):     0 seconds
DAG statistics
mean edges/level: 2      max edges/level: 2
mean edges/node:  1.999  mean count/edge: 1

Window=2 Iteration=3
Time for building model:         0 seconds
Time for sampling (singles):     0 seconds
DAG statistics
mean edges/level: 2      max edges/level: 2
mean edges/node:  1.999  mean count/edge: 1

Window=2 Iteration=4
Time for building model:         0 seconds
Time for sampling (singles):     0 seconds
DAG statistics
mean edges/level: 2      max edges/level: 2
mean edges/node:  1.999  mean count/edge: 1

Window=2 Iteration=5
Time for building model:         0 seconds
Time for sampling (singles):     0 seconds
DAG statistics
mean edges/level: 2      max edges/level: 2
mean edges/node:  1.999  mean count/edge: 1

Window=2 Iteration=6
Time for building model:         0 seconds
Time for sampling (singles):     0 seconds
DAG statistics
mean edges/level: 2      max edges/level: 2
mean edges/node:  1.999  mean count/edge: 1

Window=2 Iteration=7
Time for building model:         0 seconds
Time for sampling (singles):     0 seconds
DAG statistics
mean edges/level: 2      max edges/level: 2
mean edges/node:  1.999  mean count/edge: 1

Window=2 Iteration=8
Time for building model:         0 seconds
Time for sampling (singles):     0 seconds
DAG statistics
mean edges/level: 2      max edges/level: 2
mean edges/node:  1.999  mean count/edge: 1

Window=2 Iteration=9
Time for building model:         0 seconds
Time for sampling (singles):     0 seconds
DAG statistics
mean edges/level: 2      max edges/level: 2
mean edges/node:  1.999  mean count/edge: 1

Window=2 Iteration=10
Time for building model:         0 seconds
Time for sampling (singles):     0 seconds
DAG statistics
mean edges/level: 2      max edges/level: 2
mean edges/node:  1.999  mean count/edge: 1

Starting phasing iterations

Window=2 Iteration=11
Time for building model:         0 seconds
Time for sampling (singles):     0 seconds
DAG statistics
mean edges/level: 2      max edges/level: 2
mean edges/node:  1.999  mean count/edge: 1

states/marker:    1.0

Window=2 Iteration=12
Time for building model:         0 seconds
Time for sampling (singles):     0 seconds
DAG statistics
mean edges/level: 2      max edges/level: 2
mean edges/node:  1.999  mean count/edge: 1

states/marker:    1.0

Window=2 Iteration=13
Time for building model:         0 seconds
Time for sampling (singles):     0 seconds
DAG statistics
mean edges/level: 2      max edges/level: 2
mean edges/node:  1.999  mean count/edge: 1

states/marker:    1.0

Window=2 Iteration=14
Time for building model:         0 seconds
Time for sampling (singles):     0 seconds
DAG statistics
mean edges/level: 2      max edges/level: 2
mean edges/node:  1.999  mean count/edge: 1

states/marker:    1.0

Window=2 Iteration=15
Time for building model:         0 seconds
Time for sampling (singles):     0 seconds
DAG statistics
mean edges/level: 2      max edges/level: 2
mean edges/node:  1.999  mean count/edge: 1

states/marker:    1.0

Number of markers:               13680
Total time for building model: 0 seconds
Total time for sampling:       1 second
Total run time:                1 second

End time: 01:29 PM EDT on 23 Apr 2018
beagle.09Nov15.d2a.jar finished

VCFtools - 0.1.15
(C) Adam Auton and Anthony Marcketta 2009

Parameters as interpreted:
        --vcf outputs/phasedvcfdir/hap1_het.vcf
        --thin 50
        --out outputs/phasedvcfdir/hap1_het_filtered
        --recode

After filtering, kept 0 out of 0 Individuals
Outputting VCF file...
After filtering, kept 5324 out of a possible 6773 Sites
Run Time = 0.00 seconds

VCFtools - 0.1.15
(C) Adam Auton and Anthony Marcketta 2009

Parameters as interpreted:
        --vcf outputs/phasedvcfdir/hap2_het.vcf
        --thin 50
        --out outputs/phasedvcfdir/hap2_het_filtered
        --recode

After filtering, kept 0 out of 0 Individuals
Outputting VCF file...
After filtering, kept 5424 out of a possible 6905 Sites
Run Time = 0.00 seconds
 ___ generating phased Bed ___ 
 ___ filtering bed file columns for ___ gain_tmp.bed
 ___ filtering bed file columns for ___ gain_tmp.bed
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/anaconda3/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/anaconda3/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/Users/vamin/projects/CNV/bamgineer2/bamgineer/src/helpers/handlers.py", line 76, in receive
    record = self.queue.get(True, self.polltime)
  File "/anaconda3/lib/python2.7/multiprocessing/queues.py", line 135, in get
    res = self._recv()
TypeError: __init__() takes exactly 2 arguments (1 given)
soroushsamadian commented 6 years ago

Can you check the /logs/debug.log file please? Also, could you send the script you use to run the code?

virenar commented 6 years ago

To test, I copied run_example1.sh to bamgineer/examples/inputs and ran the bash script.

bamgineer.zip

soroushsamadian commented 6 years ago

Thanks Virenar. So, is your outputs directory empty after it fails? Do you see new folders (such as haplotypedir, logs, tmpbams, etc.) created in the output? If yes, could you also attach your "/logs/debug.log" file?

I think I found your "debug.log" file. It's generated in your inputs. I'll have a look and get back to you.

soroushsamadian commented 6 years ago

Looks like it's a problem with the "multiprocessing" module version. It seems to be a rather common issue (see for instance: https://github.com/spotify/luigi/issues/1227). Could you double-check the version of your multiprocessing module? I checked mine and it's 70.4 (I suspect yours is higher). You can revert to the old version and try again.
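One way to run that version check (an assumption here: the version being discussed is that of the pip-installable "multiprocess" fork used by pathos, since the stdlib "multiprocessing" module is versioned with Python itself; the 0.70.x numbering matches the fork):

```python
# Report which multiprocessing implementation is importable and what version
# it carries. Only the "multiprocess" fork exposes its own __version__.
def mp_version_report():
    try:
        import multiprocess
        return "multiprocess %s" % multiprocess.__version__
    except ImportError:
        return "multiprocess not installed (stdlib multiprocessing only)"

print(mp_version_report())
```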

virenar commented 6 years ago

I created a Dockerfile for bamgineer with all the dependencies shown in the readme file, but I am still having issues running the example. See the error message below. Could you tweak the Dockerfile to give us a stable working environment to run bamgineer?

Error message

Traceback (most recent call last):
  File "../../src/simulate.py", line 69, in <module>
    main(args)
  File "../../src/simulate.py", line 39, in main
    run_pipeline(results_path)
  File "/bamgineer/src/methods.py", line 575, in run_pipeline
    initialize0(phase_path, cancer_dir_path)
  File "/bamgineer/src/methods.py", line 45, in initialize0
    getVCFHaplotypes(phasedvcf, hap1vcf, hap2vcf)
  File "/bamgineer/src/utils.py", line 107, in getVCFHaplotypes
    vcfh = gzip.GzipFile(phasedvcf, 'rb')
  File "/usr/lib/python2.7/gzip.py", line 94, in __init__
    fileobj = self.myfileobj = __builtin__.open(filename, mode or 'rb')
IOError: [Errno 2] No such file or directory: '../outputs/phasedvcfdir/normal_het_phased.vcf.gz'
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/bamgineer/src/helpers/handlers.py", line 76, in receive
    record = self.queue.get(True, self.polltime)
  File "/usr/lib/python2.7/multiprocessing/queues.py", line 135, in get
    res = self._recv()
EOFError

Instructions for setting up the bamgineer docker image:

  1. Install Docker.
  2. Build the image: docker build -t <docker username>/bamgineer:latest .
  3. Run the image, mounting the directory containing the hg19 reference genome into /tmp/data: docker run -it -v /path/to/reference/genome/hg19:/tmp/data <docker username>/bamgineer:latest
  4. Change the necessary configuration in bamgineer/examples/inputs/config.cfg
  5. Run ./run_example1.sh in bamgineer/examples/scripts/

Hope this helps. If you could update the Dockerfile to recreate the environment you have been using, that would be fantastic.

Looking forward to hearing from you.

Viren

quevedor2 commented 6 years ago

Hi Viren,

We're currently updating the tool to address many of these issues and will aim to deploy the latest stable version in a week or two. We'll update the dockerfile and aim to have it deployed and tested in other environments by that time as well.

Rene

virenar commented 6 years ago

Thanks Rene. Let me know when you have updated the dockerfile.

Viren

virenar commented 6 years ago

Hi Rene,

Did you have a chance to update the dockerfile?

Viren

suluxan commented 6 years ago

Hey Viren,

Sorry for the late reply; we have been benchmarking the new algorithm for the last couple of weeks. Your original error was caused by this line (line 17) in utils.py:

"from pathos.multiprocessing import ProcessingPool"

If you comment this line out it should run, but we have since improved the algorithm with the ability to specify your desired allelic ratio in the cnv.bed (e.g. AAB, AABB, AAABB, etc.) as well as the ability to re-pair low-coverage bams. Bamgineer was originally designed and tested for high-coverage bams.

Bamgineer V2 will be deployed to the master branch by this weekend or sometime next week at the latest. I will gladly be your point of contact regarding debugging. We will have to build the docker image locally since we are unable to use docker on our cluster for security reasons but I will also update and organize the documentation to the exact tools and package versions needed.

Suluxan
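For context on that import: pathos's ProcessingPool exposes roughly the same map interface as the standard library's multiprocessing.Pool (its main addition is dill-based serialization, which can pickle lambdas and closures). A stdlib-only sketch of the equivalent pattern, illustrative rather than Bamgineer's actual code:

```python
from multiprocessing import Pool

def square(x):
    return x * x

if __name__ == "__main__":
    # pathos's ProcessingPool(nodes=2).map(square, ...) corresponds roughly to:
    with Pool(processes=2) as pool:
        print(pool.map(square, [1, 2, 3]))  # [1, 4, 9]
```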

virenar commented 6 years ago

Thank you for following up on this. We can look into V2. If you can't provide Docker, could you provide an Anaconda environment file to re-create the environment? Looking forward to trying out V2.


suluxan commented 6 years ago

Hey Viren,

I forgot to update you, but I put up a Dockerfile as well as a docker-examples folder that you can run bamgineer straight from (I've updated the config file to work within the Docker image). I will be testing the Docker image this week to make sure there are no discrepancies, but hopefully it will get you up and running quickly. Just follow steps 1-3 in the input prep found under /docs.

Suluxan

vivekruhela commented 5 years ago

Hi,

I am trying to run the example data of bamgineer, but every time I get the following error:

beagle.09Nov15.d2a.jar finished
 ___ generating phased bed ___ 
Usage: sambamba-merge [options] <output.bam> <input1.bam> <input2.bam> [...]

Options:
  -t, --nthreads=NTHREADS
        number of threads to use for compression/decompression
  -l, --compression-level=COMPRESSION_LEVEL
        level of compression for merged BAM file, number from 0 to 9
  -H, --header
        output merged header to stdout in SAM format, other options are ignored; mainly for debug purposes
  -p, --show-progress
        show progress bar in STDERR
  -F, --filter=FILTER
        keep only reads that satisfy FILTER

 ___ removing merged duplicates near breakpoints ___ 
sambamba-markdup: Cannot open or create file '/home/vivekr/bamgineer/examples/inputs/CN1.bam' : No such file or directory
Traceback (most recent call last):
  File "/home/vivekr/bamgineer/src/simulate.py", line 71, in <module>
    main(args)
  File "/home/vivekr/bamgineer/src/simulate.py", line 45, in main
    run_pipeline(results_path)
  File "/home/vivekr/bamgineer/src/methods.py", line 1021, in run_pipeline
    merge_final(outbamfn, finalbams_path)
  File "/home/vivekr/bamgineer/src/utils.py", line 332, in merge_final
    os.remove(mergefn)
OSError: [Errno 2] No such file or directory: '/home/vivekr/bamgineer/examples/inputs/CN1.bam'
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/home/vivekr/bamgineer/src/helpers/handlers.py", line 76, in receive
    record = self.queue.get(True, self.polltime)
  File "/home/vivekr/.local/lib/python2.7/site-packages/multiprocess/queues.py", line 138, in get
    res = self._recv()
EOFError

I thought it might be due to the multiprocessing tool in Python, so I reinstalled it, but I am still getting the same error. Can you please suggest why sambamba is not able to generate the output .bam file? I have attached the complete output log along with the error. error_log.txt

Thanks.

suluxan commented 5 years ago

The multiprocessing package has been deprecated; it is now multiprocess/0.70.7 (pip install multiprocess==0.70.7). I believe your sambamba executable file has no executable permissions (chmod).

Also, the dependency issues are solved by using the docker image, or by pulling the bamgineer docker image from Docker Hub into Singularity if you're on HPC. The image is here: https://cloud.docker.com/u/suluxan/repository/docker/suluxan/bamgineer

You can use:

docker pull suluxan/bamgineer:initial

or:

singularity build bamgineer.simg docker://suluxan/bamgineer:initial

vivekruhela commented 5 years ago

Hi, I tried Docker as you suggested, but I am getting the following error:

Command: docker pull suluxan/bamgineer

Error:

Using default tag: latest
Error response from daemon: manifest for suluxan/bamgineer:latest not found

suluxan commented 5 years ago

It's the tag: docker pull suluxan/bamgineer:initial

vivekruhela commented 5 years ago

Hi,

I tried to run the example code through Docker as well, but it failed again. Here is what I tried:

docker pull suluxan/bamgineer:initial

After successfully pulling the image, I ran it with:

docker run -it suluxan/bamgineer:initial

and then:

cd bamgineer
cd docker-example/scripts
bash run1.sh

After some time I got the following error:

[E::hts_open] fail to open file '/bamgineer/docker-example/outputs/LUAC1/tmpbams/chr21_roigainAAB30227447.bam'

I have attached the complete error log. Have you tested the Docker image? bamgineer_error_log.txt

suluxan commented 5 years ago

Well, it has failed to open the tmp file because there is no bam in the docker image; the image would be too inflated if it included the bam file.

You would still need to follow steps 1-3 in bamgineer/docs/input_preparation. This involves creating a splitbams directory and downloading and sorting the bam file (alternatively, you can use docker cp to move the bam you were using into the container; move the filtered vcf as well if it is a different file from the PGP example).

The other steps are done, vcf/bed files are ready to go, and config file is already mapped to the docker container's directories.

Also, a side note: the previous version of bamgineer did include phasing, but the beagle tool needs a population file in order to phase properly. It is recommended you start with a phased VCF for accurate simulations. If the haplotypes are phased, VCF preparation is not needed (steps 6-9) and you can omit the -phase flag from the run.sh script.

vivekruhela commented 5 years ago

Ok, I'll look into it. I think the input_preparation steps should also be automated in the run1.sh script, which would save a lot of time when reproducing the example data to see how the tool works.