secastel / phaser

phasing and Allele Specific Expression from RNA-seq
GNU General Public License v3.0

unsupported operand type(s) for +=: 'int' and 'str' #15

Closed colindaven closed 7 years ago

colindaven commented 7 years ago

Hi Stephane,

Trying 0.9.2 for the first time today, I came across this new bug.

Best wishes, Colin

3. Identifying connected variants...

 calculating sequencing noise level...
 sequencing noise level estimated at 0.003902
 creating read sets...
 generating read connectivity map...
 testing variant connections versus noise...
 25551 variant connections dropped because of conflicting configurations (threshold = 0.010000)
 104943 variants covered by at least 1 read

4. Identifying haplotype blocks...

5. Phasing blocks...

6. Outputting haplotypes...

Traceback (most recent call last):
  File "/home/bioinformatics/NAS01/programs/phaser/phaser/phaser/phaser.py", line 2009, in <module>
    main();
  File "/home/bioinformatics/NAS01/programs/phaser/phaser/phaser/phaser.py", line 779, in main
    phase_support[0] += maf;
TypeError: unsupported operand type(s) for +=: 'int' and 'str'

colindaven commented 7 years ago

Perhaps the "maf" variable hasn't been set properly with my input data?

secastel commented 7 years ago

Hmm, I haven't changed anything related to that lately. Would you mind providing me with the run command you used when this crash happened?

colindaven commented 7 years ago

I don't have access to the run command from home, but it was the standard one, run via a script, that I had used successfully with all phaser versions prior to 0.7, so nothing new there.

It might be our data has changed of course.

I could circumvent the error by manually assigning maf=0.01 (we don't have valid maf values anyway) on line 779 and again about two lines later, on line 781, of phaser.py. I presume it is getting an empty string back, which cannot be added to an int.

After the hack phaser ran successfully.
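The suspected failure mode can be reproduced in isolation. The sketch below is not phASER's code: `parse_maf` and its `default` argument are hypothetical names used only for illustration, and it assumes the MAF value arrives as a string (possibly empty) read from the VCF.

```python
def parse_maf(raw, default=0.0):
    """Convert a raw MAF field to float, falling back to `default`.

    Hypothetical helper, not part of phASER. Assumes `raw` may be a
    number, a numeric string, an empty string, or None.
    """
    try:
        return float(raw)
    except (TypeError, ValueError):
        # float("") raises ValueError, float(None) raises TypeError
        return default


phase_support = [0, 0]

# Adding a string to an int raises the same kind of TypeError as in the report.
try:
    phase_support[0] += ""
except TypeError as err:
    print(err)

# Converting first keeps the accumulation numeric.
phase_support[0] += parse_maf("")       # falls back to the default
phase_support[0] += parse_maf("0.25")   # parsed as a float
print(phase_support[0])
```

Passing `default=0.01` would mirror the manual workaround described above, though which fallback value is appropriate depends on the data.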

secastel commented 7 years ago

I'm not exactly sure what is causing the problem; it could be something to do with your input VCF. I think I have put in some code that should prevent this error from occurring. If you have a chance, I would really appreciate it if you could try running the attached version of phaser with the data that was causing 0.9.3 to crash. Please let me know if it runs without error.

Thanks so much!

phaser_093_test.py.zip

colindaven commented 7 years ago

Hi,

unfortunately I couldn't get phaser 093 to work with the same data on the same machine.

Basically my run command is this (it has worked fine in more than 20 runs prior to 0.9.2 and 0.9.3):

python /home/bioinformatics/NAS01/programs/phaser/phaser/phaser/phaser.py --bam $bam --paired_end 1 --vcf $vcf --o $out --sample $sample --threads 1 --mapq 10 --baseq 10 --pass_only 0 --unique_ids 1 --unphased_vars 0 --gw_phase_method 1 --gw_phase_vcf 3 --min_cov 10 --temp_dir $tmp &

`bash runPhaser_TEST093.sh /home/bioinformatics/NAS10/ZR/ZR2016/ASE_Olaf/july2016/vcfs_oct_olaf/K505cms_Hybrid_coverage_5.vcf.gz K505xK506_vcfK505cms /mnt/scratch/colin/tmp/phaser1b/
bash runPhaser.sh VCF_IN_FULL_PATH OUTPREFIX TMPDIR
Enter VCF as arg1, outPrefix as arg2 and tmpDir as arg3 - important, see examples
bash runPhaser.sh /home/bioinformatics/NAS10/ZR/ZR2016/ASE_Olaf/july2016/vcfs_sept_olaf/K505_Hybrid_coverage_30.vcf K505xK506_vcfK505 /mnt/scratch/colin/tmp/phaser1/
bioinformatics@deei-bioinfcloud2:~/NAS01/programs/phaser/phaser_test_nov2016$
Warning message:
In local({ : bytecode version mismatch; using eval
Error in objects(db.pos, all.names = TRUE) :
  3 arguments passed to .Internal(ls) which requires 2

##################################################
Welcome to phASER v0.9.3
Author: Stephane Castel (scastel@nygenome.org)
##################################################

1. Loading heterozygous variants into intervals...

 processing VCF...
 creating variant mapping table...
      137152 heterozygous sites being used for phasing (0 filtered, 0 indels excluded, 0 unphased)

2. Retrieving reads that overlap heterozygous sites...

 file: /home/bioinformatics/NAS10/ZR/ZR2016/ASE_Olaf/july2016/K505xK506/pseudogenomes/star_het1/K505xK506_1BJ2576_allReps_R1_s.bam
      minimum mapq: 10
      mapping reads to variants...

Traceback (most recent call last):
  File "/home/bioinformatics/NAS01/programs/phaser/phaser_test_nov2016/phaser_093_test.py", line 2008, in <module>
    main();
  File "/home/bioinformatics/NAS01/programs/phaser/phaser_test_nov2016/phaser_093_test.py", line 325, in main
    result_files = parallelize(call_mapping_script, pool_input);
  File "/home/bioinformatics/NAS01/programs/phaser/phaser_test_nov2016/phaser_093_test.py", line 1756, in parallelize
    pool_output.append(function(input));
  File "/home/bioinformatics/NAS01/programs/phaser/phaser_test_nov2016/phaser_093_test.py", line 1076, in call_mapping_script
    raise RuntimeError("subprocess.call of read_variant_map.py exited with an error")
RuntimeError: subprocess.call of read_variant_map.py exited with an error`

############## Phaser 0_92 (hacked with my version as commented above) works fine: ###############

Normal run:

`bash runPhaser.sh /home/bioinformatics/NAS10/ZR/ZR2016/ASE_Olaf/july2016/vcfs_oct_olaf/K505cms_Hybrid_coverage_5.vcf.gz K505xK506_vcfK505cms /mnt/scratch/colin/tmp/phaser1b/
bash runPhaser.sh VCF_IN_FULL_PATH OUTPREFIX TMPDIR
Enter VCF as arg1, outPrefix as arg2 and tmpDir as arg3 - important, see examples
bash runPhaser.sh /home/bioinformatics/NAS10/ZR/ZR2016/ASE_Olaf/july2016/vcfs_sept_olaf/K505_Hybrid_coverage_30.vcf K505xK506_vcfK505 /mnt/scratch/colin/tmp/phaser1/
bioinformatics@deei-bioinfcloud2:~/NAS01/programs/phaser/phaser_test_nov2016$
Warning message:
In local({ : bytecode version mismatch; using eval
Error in objects(db.pos, all.names = TRUE) :
  3 arguments passed to .Internal(ls) which requires 2

##################################################
Welcome to phASER v0.9.2
Author: Stephane Castel (scastel@nygenome.org)
##################################################

1. Loading heterozygous variants into intervals...

 processing VCF...
 creating variant mapping table...
      137152 heterozygous sites being used for phasing (0 filtered, 0 indels excluded, 0 unphased)

2. Retrieving reads that overlap heterozygous sites...

 file: /home/bioinformatics/NAS10/ZR/ZR2016/ASE_Olaf/july2016/K505xK506/pseudogenomes/star_het1/K505xK506_1BJ2576_allReps_R1_s.bam
      minimum mapq: 10
      mapping reads to variants...`

I also tried this with and without parallelization (--threads 1). Do 0_92 and 0_93 work with that last test I provided you with?

Otherwise, it would be helpful to see the new code as a branch in github if you're a fan of that.

Sorry I don't have better news. Colin

secastel commented 7 years ago

Thanks for trying this out. One quick question before I go deeper into this: does the directory that you ran the 093 script from contain the previous "read_variant_map.py" script? The error that is coming up suggests that it is not able to call that script. To test the 093 script I sent, just put it in the same directory where the previous "phaser.py" script is located, and run it from there.

secastel commented 7 years ago

Or alternatively you can just copy the 092 "read_variant_map.py" file into what looks like is your phaser test directory "phaser_test_nov2016".
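A check along these lines would make a missing helper script fail with a clearer message than a generic subprocess error. This is only a sketch, written for Python 3, under the assumption that the helper is expected to sit next to the main script; `require_helper` is a hypothetical name, not a phASER function.

```python
import os
import tempfile


def require_helper(script_dir, helper_name="read_variant_map.py"):
    """Return the helper's path, or raise if it is not in `script_dir`.

    Hypothetical helper, not part of phASER. Assumes the helper script
    is expected in the same directory as the main script.
    """
    helper_path = os.path.join(script_dir, helper_name)
    if not os.path.isfile(helper_path):
        raise FileNotFoundError(
            "%s not found in %s; copy it next to the main script"
            % (helper_name, script_dir)
        )
    return helper_path


# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as demo_dir:
    try:
        require_helper(demo_dir)
    except FileNotFoundError as err:
        print(err)

    # Once the helper is present, the check passes.
    open(os.path.join(demo_dir, "read_variant_map.py"), "w").close()
    found = require_helper(demo_dir)
    print(os.path.basename(found))
```

In a real script, `script_dir` could be derived from `os.path.dirname(os.path.abspath(__file__))`, so the check would follow the script wherever it is copied.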

colindaven commented 7 years ago

ah .... sorry, I missed that one, it should have been obvious. I didn't want to overwrite or mix a dev version and main version of the scripts.

Ok, version 0.9.3 works fine now. Thanks.

secastel commented 7 years ago

No problem, thanks for testing! The changes have been pushed to the main branch.