zeeev / wham

Structural variant detection and association testing

Issue with parenthesis in print, using python 3 #30

Closed sahilseth closed 8 years ago

sahilseth commented 8 years ago

Here is an example error:

 File "utils/classify_WHAM_vcf.py", line 46
    print '##INFO=<ID=WC,Number=1,Type=String,Description="WHAM classifier variant type">'
                                                                                         ^
SyntaxError: Missing parentheses in call to 'print'
zeeev commented 8 years ago

Are you using WHAM for SV discovery? If so, I'd advise you to use WHAM-GRAPHENING -k.

sahilseth commented 8 years ago

I have a tumor normal pair, and would like to explore translocations.

Thanks, Sahil

zeeev commented 8 years ago

That'd be the correct use case for WHAM. Did the classifier run for you after changing the quote?

ejodude commented 8 years ago

@Sahil, your current set of errors stems from syntax incompatibilities between Python 2.x and Python 3. Unfortunately, I had not tested the code against any version of Python 3 (I just noted that we claim Python 3 is supported in the docs, so I apologize for that error).
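
As an illustration of the incompatibility, here is a minimal sketch using the header string from the traceback above (the variable name `info_header` is only illustrative and is not taken from the WHAM code). The Python 2 `print` statement is a syntax error on Python 3, while the parenthesized form with a single argument behaves the same on both:

```python
# Header text copied from the traceback; the variable name is illustrative.
info_header = '##INFO=<ID=WC,Number=1,Type=String,Description="WHAM classifier variant type">'

# Python 2 only; raises "SyntaxError: Missing parentheses in call to 'print'" on Python 3:
#     print info_header

# Works on both Python 2 and Python 3 when a single argument is printed:
print(info_header)
```

For calls that print several comma-separated values, adding `from __future__ import print_function` at the top of the script is the usual way to get identical behavior on Python 2.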

Do you happen to have a version of python 2.7 installed on your system? You can have multiple different versions of python on your machine and so running an instance of python2.7 is probably your easiest fix. I can also try and port the code over to python 3, but I expect that this will probably take up to a week to update.

My recommendation would be to install an Anaconda distribution of Python 2.7 from https://www.continuum.io/downloads; it will come with all of the packages you need to run the classifier and will not overwrite the default Python installed on your machine.

-EJ

sahilseth commented 8 years ago

Yes, I just created a new python2 env, and compiled again to be sure. Now it works, but shows a lot of warnings:

python2.7/site-packages/sklearn/utils/validation.py:386: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and willraise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
  DeprecationWarning)
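
For reference, the two reshapes the warning mentions look like this in NumPy. This is only a sketch with made-up values; which call in the classifier script triggers the warning is not shown in this thread, so the array here is an assumption for illustration:

```python
import numpy as np

# A 1-D array: the kind of input the warning complains about (values are made up).
sample = np.array([0.5, 1.0, 0.75])

# One sample with three features: shape (1, 3).
as_single_sample = sample.reshape(1, -1)

# Three samples with one feature each: shape (3, 1).
as_single_feature = sample.reshape(-1, 1)

print(as_single_sample.shape, as_single_feature.shape)
```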

I do see the file is getting created, here is the first call:

# wham bam
1       10004   .       N       AACCCCNANCCCNACCCCAACCCCCACCCCAN        .       .       LRT=0;WAF=1,0.500001,0.750001;GC=1,1;AT=1,0.0434783,0.0434783,0,0,0,0,0,0,0,0,0,0.0434783,0,0.475082;CF=0.173913;CISTART=9965,10041;CIEND=10203,10203;PU=3;SU=0;CU=13;RD=23;NC=3;MQ=13.4783;MQF=1;SP=1,0,0;CHR2=1;DI=f;END=10204;SVLEN=201      GT:GL:NR:NA:NS:RD       0/1:-49.7373,-13.8629,-19.7461:2:18:18:20       1/1:-255,-255,-0.374464:0:3:3:3
# classifier:
1       10004   .       N       AACCCCNANCCCNACCCCAACCCCCACCCCAN        .       .       LRT=0;WAF=1,0.500001,0.750001;GC=1,1;AT=1,0.0434783,0.0434783,0,0,0,0,0,0,0,0,0,0.0434783,0,0.475082;CF=0.173913;CISTART=9965,10041;CIEND=10203,10203;PU=3;SU=0;CU=13;RD=23;NC=3;MQ=13.4783;MQF=1;SP=1,0,0;CHR2=1;DI=f;END=10204;SVLEN=201;WC=INR;WP=0.254,0.158,0.372,0.216    GT:GL:NR:NA:NS:RD       0/1:-49.7373,-13.8629,-19.7461:2:18:18:20       1/1:-255,-255,-0.374464:0:3:3:3

Interpretation: WC=INR, which probably means insertion. WP=0.254,0.158,0.372,0.216: I am not sure of the order of the probabilities.

Sorting the last column of the training data (lexicographically), I get: DEL, DUP, INR, INV. In this example, the variant was classified as INR, with a probability of 0.372, which is the highest of the four values. So can I assume that the probabilities are also ordered DEL, DUP, INR, INV?

Info from the docs: "WP: The probabilities for each class label generated by the random forest classifier. The format field is comprised of six colon-delimited fields."

This is comma separated, and the number depends on training data supplied, am I getting this right?

thanks!

zeeev commented 8 years ago

@sahilseth That is correct.
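
A minimal sketch of that interpretation, assuming the WP values follow the lexicographically sorted class labels from the training data, as described above. The helper names `parse_info` and `label_probabilities` are hypothetical and not part of WHAM; the INFO string is shortened from the classifier output in this thread:

```python
def parse_info(info):
    """Split a VCF INFO string into a dict; flag-style keys map to True."""
    fields = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def label_probabilities(info, labels):
    """Pair each WP probability with a class label, labels sorted lexicographically."""
    probs = [float(p) for p in parse_info(info)["WP"].split(",")]
    ordered = sorted(labels)
    if len(probs) != len(ordered):
        raise ValueError("number of WP values does not match number of labels")
    return dict(zip(ordered, probs))


# Shortened INFO field from the classifier output quoted earlier in the thread.
info = "END=10204;SVLEN=201;WC=INR;WP=0.254,0.158,0.372,0.216"
probs = label_probabilities(info, ["INV", "DEL", "INR", "DUP"])
best = max(probs, key=probs.get)
print(best, probs[best])  # INR 0.372
```

The most probable label matches the WC value for this record, which is consistent with the ordering DEL, DUP, INR, INV.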

zeeev commented 8 years ago

@ejodude Any movement on this EJ?

ejodude commented 8 years ago

Thanks @sahilseth for the heads up. It looks like the code is running fine, but we will need to add an update before scikit-learn moves to v0.19. I've also changed the wiki to highlight the requirement for Python 2.7 and not 3.0+.