Tests on the breastcancer_data.csv

Hey,

I have a small question about the output of the second example (ex_02_advanced_options.py). After running it, we get a table of points for the features selected by the optimizer. For the breast cancer dataset, the feature values (the columns of the CSV) are not binary. How should we proceed in this case? Simply multiplying the points by these raw values can produce very large scores, which in turn give misleading values of P(Y = 1 | x) (almost 99% for all samples).

Here is the truncated output generated by the script:

Pr(Y = +1) = 1/(1 + exp(-17 - score))
+-----------------------------------------+------------------+-----------+
| ClumpThickness                          |         1 points |   + ..... |
| MarginalAdhesion                        |         1 points |   + ..... |
| BareNuclei                              |         1 points |   + ..... |
| BlandChromatin                          |         1 points |   + ..... |
| Mitoses                                 |         1 points |   + ..... |
+-----------------------------------------+------------------+-----------+
| ADD POINTS FROM ROWS 1 to 5             |            SCORE |   = ..... |
+-----------------------------------------+------------------+-----------+

How do I interpret this for non-binary data?
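Here is a minimal Python sketch that makes the concern concrete. It assumes the score is computed by multiplying each selected feature's points by that feature's raw value and summing the results. It then plugs the score into the printed formula, taken literally with the constant 17. The feature values in `sample` are hypothetical. The names `POINTS`, `OFFSET`, `risk_score` and `predicted_risk` are my own and are not part of risk-slim.

```python
import math

# Points per selected feature, copied from the truncated table (1 point each).
POINTS = {
    "ClumpThickness": 1,
    "MarginalAdhesion": 1,
    "BareNuclei": 1,
    "BlandChromatin": 1,
    "Mitoses": 1,
}

# Assumption: constant read literally from the printed line
# Pr(Y = +1) = 1/(1 + exp(-17 - score)).
OFFSET = 17


def risk_score(sample):
    """Sum of points times the raw (non-binary) feature values."""
    return sum(POINTS[name] * sample[name] for name in POINTS)


def predicted_risk(score):
    """Pr(Y = +1) using the printed formula, taken at face value."""
    return 1.0 / (1.0 + math.exp(-OFFSET - score))


# Hypothetical sample with integer-valued (non-binary) features.
sample = {
    "ClumpThickness": 5,
    "MarginalAdhesion": 1,
    "BareNuclei": 1,
    "BlandChromatin": 3,
    "Mitoses": 1,
}

score = risk_score(sample)
print(score, predicted_risk(score))

# Predicted risk at a score of 0, the smallest score possible
# when all points and feature values are nonnegative.
print(predicted_risk(0))
```

If the formula is read this way, any nonnegative score gives a predicted risk of at least 1/(1 + exp(-17)), which is above 0.9999999. That would explain why nearly every sample gets a probability close to 1.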

Also, in the same script (the ex_02...py file), P is never defined. Based on ex_03_constraints.py, I think the following line should be added right after the data is loaded:

N, P = data['X'].shape
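Here is a minimal sketch of what that line does. It uses a hypothetical stand-in for the loaded `data` dict; the real dict comes from the script's data-loading step. Only the `'X'` entry matters here: N is the number of rows (samples) and P is the number of columns (variables).

```python
import numpy as np

# Hypothetical stand-in for the dict loaded in the example script;
# rows are samples, columns are variables.
data = {"X": np.zeros((4, 3))}

# N = number of samples (rows), P = number of variables (columns).
N, P = data["X"].shape

# Illustration only: P is the length of a per-variable coefficient vector.
coefficients = np.zeros(P)
scores = data["X"].dot(coefficients)
print(N, P, scores.shape)
```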

ustunb / risk-slim, issue #5