square / pysurvival

Open source package for Survival Analysis modeling
https://www.pysurvival.io/
Apache License 2.0

Make pysurvival work with scikit-learn #15

Open pransito opened 4 years ago

pransito commented 4 years ago

I have noticed that PySurvival does not really follow the principles of scikit-learn, starting with the fact that you input X, T, E instead of X, y. Furthermore, GridSearchCV cannot be used, both because of the aforementioned problem and because the model objects have no set_params method. (See also scikit-learn's Pipeline, which only works after extensive reworking of many classes and functions in scikit-learn.) I think it is very unfortunate that this great package stays outside of sklearn. Is there any plan to fix this and make PySurvival connectable to scikit-learn? Or am I missing something?
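To make the request concrete, here is a minimal sketch of the kind of wrapper being asked for: it accepts y as (time, event) pairs, unpacks it into T and E, and exposes get_params and set_params. Both class names are hypothetical, the toy model is a stand-in rather than a PySurvival model, and this is not the implementation that later appears in the fork.

```python
class ToyDurationModel:
    """Hypothetical stand-in for a model with a fit(X, T, E) signature."""

    def __init__(self, l2_reg=0.0):
        self.l2_reg = l2_reg

    def fit(self, X, T, E):
        # Toy "training": remember the mean duration of the observed events.
        observed = [t for t, e in zip(T, E) if e == 1]
        self.mean_time_ = sum(observed) / len(observed)
        return self


class XyAdapter:
    """Hypothetical wrapper exposing fit(X, y), get_params and set_params."""

    def __init__(self, model_cls=ToyDurationModel, l2_reg=0.0):
        # Store the arguments unmodified, as scikit-learn expects of estimators.
        self.model_cls = model_cls
        self.l2_reg = l2_reg

    def get_params(self, deep=True):
        return {"model_cls": self.model_cls, "l2_reg": self.l2_reg}

    def set_params(self, **params):
        for name, value in params.items():
            if name not in self.get_params():
                raise ValueError(f"Invalid parameter {name!r}")
            setattr(self, name, value)
        return self

    def fit(self, X, y):
        # y is a sequence of (time, event) pairs; unpack it into T and E.
        T = [time for time, _ in y]
        E = [event for _, event in y]
        self.model_ = self.model_cls(l2_reg=self.l2_reg).fit(X, T, E)
        return self


X = [[0.1], [0.4], [0.9]]
y = [(5.0, 1), (8.0, 0), (3.0, 1)]
adapter = XyAdapter().set_params(l2_reg=0.1).fit(X, y)
print(adapter.model_.mean_time_)  # 4.0
```

With get_params and set_params in place, tools such as GridSearchCV have the hooks they need to reconfigure the estimator between fits.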

bacalfa commented 4 years ago

FYI, I'm working on a solution to this issue. I expect to have something in a few days.

bacalfa commented 4 years ago

I'm happy to announce that I think I have a clean solution to this issue. Please pull from master in my forked repository: https://github.com/bacalfa/pysurvival.

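If you installed it with setup.py, first uninstall the current version with:

  • python -m pip uninstall pysurvival

Then reinstall it:

  • python setup.py build_ext --inplace (to rebuild the package)
  • python setup.py install --user (to install the files to your local directories)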

Make sure to check out the new notebook explaining the new feature. Comments and feedback are welcome!

camferna commented 4 years ago

Omg thank you so much! hahaha

JCCKwong commented 4 years ago

Hi bacalfa,

Beginner coder here. I've been trying to follow your instructions above to install pysurvival on Windows 10. I've tried downloading the zip file and cloning it with git clone. I've also checked to make sure I have MSVC14. Each time I run into the following issue:

c1xx: fatal error C1083: Cannot open source file: 'pysurvival/cpp_extensions/_functions.cpp': No such file or directory

Any advice would be greatly appreciated. Really looking forward to trying this package out!

bacalfa commented 4 years ago

@JCCKwong, can you give more details on the steps you're taking and what happens after you execute them? Also, did you clone my forked repository (https://github.com/bacalfa/pysurvival) instead of the one from the original author (https://github.com/square/pysurvival)?

JCCKwong commented 4 years ago

@bacalfa yes, I cloned your forked repository. Here's a screenshot of what I did on Anaconda 3.

[Screenshot: Issue]

bacalfa commented 4 years ago

First change the current directory to C:\Users\Jethro\pysurvival.

cd C:\Users\Jethro\pysurvival

Then run the python commands described above in https://github.com/square/pysurvival/issues/15#issuecomment-579584083.

JCCKwong commented 4 years ago

It worked, thanks! Really appreciated your help!

DashengSong commented 4 years ago

Can it work with StratifiedKFold?

bacalfa commented 4 years ago

@DashengSong, have you tried it? I don't think I have.
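For context on the question: scikit-learn's StratifiedKFold stratifies on the labels passed as y to its split method, and for survival data a natural label to stratify on is the event indicator E. Whether the fork's adapter works with it is not settled in this thread. The dependency-free sketch below only illustrates the idea of stratifying on E; the helper name is hypothetical and this is not scikit-learn's algorithm.

```python
from collections import defaultdict


def stratified_fold_ids(events, n_splits):
    """Assign each sample a fold id so event rates are similar across folds.

    Hypothetical helper: samples are grouped by event indicator and dealt
    round-robin into folds, which is the basic idea behind stratification.
    """
    by_class = defaultdict(list)
    for index, event in enumerate(events):
        by_class[event].append(index)

    fold_ids = [None] * len(events)
    for indices in by_class.values():
        for position, index in enumerate(indices):
            fold_ids[index] = position % n_splits
    return fold_ids


E = [1, 0, 1, 1, 0, 0, 1, 0]
folds = stratified_fold_ids(E, n_splits=2)
print(folds)  # [0, 0, 1, 0, 1, 0, 1, 1]
```

Each of the two folds ends up with two observed events and two censored samples, so the censoring rate is the same in both.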

KaranMehta21 commented 4 years ago

Hi @bacalfa. Thanks for creating a package that can be installed on Windows. I'm trying to use the sklearn compatibility feature you've added. Does it work with the random survival forest estimator too?

bacalfa commented 4 years ago

@KaranMehta21, I think it does. But there may be a caveat: https://github.com/square/pysurvival/issues/17.

KaranMehta21 commented 4 years ago

@bacalfa OK, I'll try it out. Is the benefit of using it that I can implement cross-validation and hyperparameter tuning, and will that lead to higher c-indices? Currently, the RSF model I've trained has a c-index of 0.71. I'm looking for ways to bring it closer to 0.80. Any suggestions?

bacalfa commented 4 years ago

Honestly, I haven't used this package that much, so I'm not sure what to suggest. There are simpler and more complex models. It's a good habit to evaluate performance with a validation set (like in CV) and perform hyperparameter tuning. Difficult to know which algorithm will be the best a priori. So try (and tune) as many as you can, and make sure you make a fair comparison between them.
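Since the comparison above is made in terms of the c-index, a small worked example may help. This is a simplified sketch of Harrell's concordance index written from its textbook definition; the function name is hypothetical and it is not PySurvival's own concordance_index, which may handle ties and censoring differently.

```python
def harrell_c_index(times, events, risk_scores):
    """Fraction of comparable pairs that the risk scores order correctly."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored sample cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0  # higher risk failed earlier: concordant
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


T = [2.0, 4.0, 6.0, 8.0]
E = [1, 1, 0, 1]
print(harrell_c_index(T, E, [0.9, 0.7, 0.4, 0.1]))  # 1.0
print(harrell_c_index(T, E, [0.1, 0.4, 0.7, 0.9]))  # 0.0
```

A value of 0.5 corresponds to random ordering, which is why moving from 0.71 to 0.80 is a substantial gain and should be measured on held-out data.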

SurajitTest commented 4 years ago

Hi all, I would really appreciate it if anyone could help me. I have downloaded the package to C:\Users\User\Downloads\pysurvival-master, and Anaconda is installed at C:\Users\User. Below are the steps I think I need to follow; please guide me so that I can carry out the installation correctly.

Step-1: Create a directory: C:\Users\User\pysurvival (as Anaconda is installed in C:\Users\User)
Step-2: Copy all contents from C:\Users\User\Downloads\pysurvival-master to C:\Users\User\pysurvival (now setup.py is in this location)
Step-3: Navigate to C:\Users\User\pysurvival (using the command prompt)
Step-4: Run the two commands below:

  • python setup.py build_ext --inplace (to rebuild the package)
  • python setup.py install --user (to install the files to your local directories)

CoteDave commented 3 years ago

Hi @bacalfa ,

I've tried your fork with setup.py.

Unfortunately, it's still not working for me because of this line: extra_compile_args = ["/O2"]

The error occurs when building the 'pysurvival.utils._functions' extension:

gcc: error: /O2: No such file or directory
error: command 'C:\MinGW\bin\gcc.exe' failed with exit status 1

Thanks!

bacalfa commented 3 years ago

@CoteDave, I don't have MinGW installed on my Windows machine (and it's not easy to do so). The error seems to suggest that /O2 is an option for the MS C/C++ compiler, which isn't recognized by MinGW. If you change line 61 in setup.py to the same thing as in line 63, I think it'd work. Let me know.
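The underlying point is that /O2 is MSVC flag syntax while GCC-style compilers such as MinGW expect -O3 and similar. A minimal sketch of choosing flags by compiler rather than by operating system follows; the helper name is hypothetical, the contents of lines 61 and 63 of the fork's setup.py are not shown in this thread, and the GCC-style flags are the ones that appear in build logs elsewhere on this page.

```python
def pick_compile_args(compiler_type):
    """Return optimisation flags in the syntax the given compiler understands.

    compiler_type follows the names used by distutils/setuptools compilers,
    for example "msvc", "mingw32" or "unix".
    """
    if compiler_type == "msvc":
        return ["/O2"]  # MSVC-style flag quoted in the thread
    return ["-std=c++11", "-O3"]  # GCC/Clang-style flags seen in the build logs


print(pick_compile_args("msvc"))     # ['/O2']
print(pick_compile_args("mingw32"))  # ['-std=c++11', '-O3']
```

In a setup.py, a function like this could be called from a build_ext subclass, where the active compiler is available as self.compiler.compiler_type.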

CoteDave commented 3 years ago

Hi @bacalfa, I changed line 61.

No more /O2 error, but sadly, a new error occurs at the same place:

building 'pysurvival.utils._functions' extension
c:/mingw/bin/../lib/gcc/mingw32/9.2.0/../../../../mingw32/bin/ld.exe: C:...\Anaconda3\libs/libpython38.dll.a: error adding symbols: file format not recognized
collect2.exe: error: ld returned 1 exit status
error: command 'C:\MinGW\bin\g++.exe' failed with exit status 1

bacalfa commented 3 years ago

@CoteDave, that error looks similar to this one. See the suggestion there.

elopezfune commented 3 years ago

I would also like to make a suggestion. Could you please include a Lasso regularization term in the loss functions of the Linear Multi-Task Logistic Regression and Linear SVM models, so that, as in sklearn, one can do Ridge, Lasso or ElasticNet regularization? It would amount to adding a new parameter called "penalizer", such that line 191 of multi_task.py reads: loss += penalizer * (l2_reg * torch.sum(w * w) / 2. + (1.0 - l2_reg) * torch.sum(np.sqrt(w * w)))

Therefore, if l2_reg=1, one is doing Ridge regularization, if l2_reg=0 one is doing Lasso regularization, and when 0<l2_reg<1 one is doing ElasticNet.

bacalfa commented 3 years ago

@elopezfune, regarding your error, see if this helps.

I'll see if I can help with the regularization request and will let you know.

bacalfa commented 3 years ago

@elopezfune, I'd prefer to create a branch for this request. Let's call it elastic_net_loss.

For MTLR, I'd do:

loss += l2_reg * torch.sum(w * w) / 2. + (1.0 - l2_reg) * torch.sum(torch.abs(w))

For consistency, I should probably apply the same change to other models. For SVM, that would require modifying Cython code (file _svm.pyx). I'll need more time to make sure I understand what changes to make. Any help is welcome. I'm actually not a user of this package at the moment. Just trying to help maintain it for others. :)

elopezfune commented 3 years ago

Thanks for the quick answer. I believe ElasticNet will give the users more flexibility to optimize survival models.

elopezfune commented 3 years ago

Yes, a line of code like this is perfect! loss += l2_reg * torch.sum(w * w) / 2. + (1.0 - l2_reg) * torch.sum(torch.abs(w))

I tried once to change it manually on the local files, but I didn't have access to the optimization code (Cython), therefore, it didn't work.

elopezfune commented 3 years ago

Indeed, there is also a need to introduce a new parameter, named penalizer or something similar, which scales the whole penalty term. l2_reg would then be used to choose between Ridge, Lasso or ElasticNet.
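To check the arithmetic of the proposal, here is a plain-Python sketch of the penalty with both parameters: penalizer scales the whole term and l2_reg mixes the Ridge and Lasso parts. The function name is hypothetical and it works on a flat list of weights rather than on the PyTorch tensors used in multi_task.py.

```python
def elastic_net_penalty(w, l2_reg, penalizer=1.0):
    """penalizer * (l2_reg * sum(w^2) / 2 + (1 - l2_reg) * sum(|w|))."""
    if not 0.0 <= l2_reg <= 1.0:
        raise ValueError("l2_reg must lie in [0, 1]")
    ridge_term = sum(x * x for x in w) / 2.0
    lasso_term = sum(abs(x) for x in w)
    return penalizer * (l2_reg * ridge_term + (1.0 - l2_reg) * lasso_term)


w = [3.0, -4.0]
print(elastic_net_penalty(w, l2_reg=1.0))                 # 12.5 (pure Ridge)
print(elastic_net_penalty(w, l2_reg=0.0))                 # 7.0  (pure Lasso)
print(elastic_net_penalty(w, l2_reg=0.5, penalizer=2.0))  # 19.5 (ElasticNet)
```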

bacalfa commented 3 years ago

The following packages contain unfulfilled dependencies:
  python3-dev: Depends: libpython3-dev (= 3.8.2-0ubuntu2) but will not be installed
                Depends: python3.8-dev (>= 3.8.2-1~) but will not be installed
E: Unable to correct problems, bad packets are in "keep as is" mode.

What Python version do you have installed? It seems to be suggesting that you should have at least 3.8 to be able to install libpython3-dev.

These errors you're experiencing are specific to your Ubuntu system, not really to pysurvival. Once you have all the dependencies installed, you should be able to build pysurvival.

elopezfune commented 3 years ago

I have Python 3.8.6

bacalfa commented 3 years ago

You'll have to do some searching on the errors you're getting. I can't reproduce it because I currently don't have access to Ubuntu. See this.

elopezfune commented 3 years ago

Thanks, I'm trying to solve this problem; it is driving me crazy.

bacalfa commented 3 years ago

Adding support for l1 regularization to SVM isn't trivial. It requires modifications to Cython code (doable), but I can't find the reference for the formulation. And I don't have a lot of time to spend on this. If anyone would like to contribute or help, that'd be appreciated. SVM in this package doesn't use PyTorch (loss, gradient, and Hessian are manually implemented in Cython, so it's important to know the full formulation in order to modify it).

elopezfune commented 3 years ago

I could help with that! I just need a way to access the Cython code.

bacalfa commented 3 years ago

They're in cpp_extensions. For example: https://github.com/bacalfa/pysurvival/blob/master/pysurvival/cpp_extensions/_svm.pyx.

byronmamamoney commented 3 years ago

Hi, I've created an Ubuntu 18.04 AWS EC2 instance and installed Python 3.6, 3.7 and 3.8 (just to make sure it is not due to the Python version). I followed the installation steps as per https://square.github.io/pysurvival/installation.html. When running the "pip install pysurvival" command I get:

urvival/cpp_extensions/_functions.o -std=c++11 -O3
pysurvival/cpp_extensions/_functions.cpp:4:10: fatal error: Python.h: No such file or directory
 #include "Python.h"
          ^~~~~~~~~~
compilation terminated.
error: command '/usr/bin/gcc-8' failed with exit status 1
ERROR: Failed building wheel for pysurvival

Please can you assist with this?

Kind Regards Byron

bacalfa commented 3 years ago

@byronmamamoney, please see above the discussion with @elopezfune. You'll have to Google the errors that come up. I don't have a way to test it on Ubuntu.
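A "Python.h: No such file or directory" error means the compiler could not find the C headers for the interpreter doing the build; on Ubuntu these usually come from the python3-dev family of packages mentioned above. The sketch below is one quick way to check, using only the standard library. The two function names are hypothetical, and with several Python versions installed the result applies only to the interpreter that runs the script.

```python
import os
import sysconfig


def python_header_path():
    """Path where the running interpreter expects to find Python.h."""
    return os.path.join(sysconfig.get_paths()["include"], "Python.h")


def has_python_headers():
    """True if Python.h is present for the interpreter running this script."""
    return os.path.isfile(python_header_path())


print(python_header_path())
print("headers found" if has_python_headers() else "headers missing")
```

Running it with the same interpreter that pip uses shows whether the headers for that specific Python version are installed.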

byronmamamoney commented 3 years ago

Thanks @bacalfa I've got it working after removing the various python versions (except 3.6), the default for the Ubuntu box. Did a reboot and reinstalled the libraries.

Great work on the documentation here: https://square.github.io/pysurvival/index.html

Regards Byron

elopezfune commented 3 years ago

What does pct_importance mean ?? https://square.github.io/pysurvival/tutorials/churn.html#52-variables-importance

elopezfune commented 3 years ago

If I understand correctly, it is the percentage of each feature importance. It could be good to include the explanation of this feature in the documentation.

pransito commented 3 years ago

Hi, the issue comments seem to go a bit off topic. @bacalfa thanks for your work. Any chance to make a pull request to merge your version with the official repo? So that your sklearn add on will become available generally?

bacalfa commented 3 years ago

@pransito, unfortunately I think the original author isn't maintaining this package anymore. That's why I forked it and fixed a few issues. But you can try to reach out to him.

andreas-kaae commented 3 years ago

Awesome job @bacalfa, this is exactly what I was looking for!!

If there are other less experienced coders like me who need a bit more clarification, I can save you some time with these slightly more detailed steps, based on the explanations above.

1) Download the zip folder from https://github.com/bacalfa/pysurvival and unpack it to get the folder "pysurvival-master".
2) Copy the folder to your user path. For me, this is: C:\Users\Andreas
3) Open your "Anaconda Prompt". If you're not using Anaconda, I assume the normal command prompt might also work, but I have no idea.
4) If you have pysurvival installed, uninstall it with: pip uninstall pysurvival
5) Set the directory by typing: cd C:\Users\Andreas\pysurvival-master
6) Then reinstall by first running: python setup.py build_ext --inplace
7) Lastly, run: python setup.py install --user

The sklearn-adapter should now work.

dadekandrew2010 commented 2 years ago

With your modified version, I can make MultiTaskModel work fine with scikit-learn. However, with NeuralMultiTaskModel, I write my code as follows.

my coding

NMTLModel_skl = sklearn_adapter(NeuralMultiTaskModel, time_col='time', event_col='status',
                                predict_method="predict_survival", scoring_method=concordance_index)
mystructure = [{'activation': 'ReLU', 'num_units': 150}]
nmtlr_model_skl = NMTLModel_skl(structure=mystructure)
nmtlr_model_skl.fit(cli_train, ysur_train, init_method='orthogonal', optimizer='rprop',
                    lr=1e-3, num_epochs=500, bins=150)
nmtlr_model_score = nmtlr_model_skl.score(x_test, y_test)

from sklearn.model_selection import cross_val_score
scores_nmtlr_cli = cross_val_score(estimator=nmtlr_model_skl, fit_params={"l2_reg": 1E-1},
                                   X=x_test, y=y_test, cv=5)

and I get the following error. I do not know how to fix it.

the error

Cannot clone object SkLearnNeuralMultiTaskModel(auto_scaler=True, bins=150, structure=[{'activation': 'ReLU', 'num_units': 150}]), as the constructor either does not set or modifies parameter structure
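For background on this message: when scikit-learn clones an estimator, it rebuilds it from get_params() and then checks that every parameter stored on the new object is the very same object that was passed to the constructor. A constructor that copies or converts an argument fails that check. The sketch below mimics only that final check with two hypothetical classes; whether the adapter's handling of structure is actually the cause here is an assumption, not something confirmed in this thread.

```python
import copy


class KeepsParam:
    """Stores the constructor argument unmodified (what clone expects)."""

    def __init__(self, structure=None):
        self.structure = structure

    def get_params(self, deep=True):
        return {"structure": self.structure}


class CopiesParam(KeepsParam):
    """Stores a copy instead, so the stored object is no longer the input."""

    def __init__(self, structure=None):
        self.structure = copy.deepcopy(structure)


def passes_identity_check(estimator_cls, **params):
    """Mimic the final sanity check performed by sklearn.base.clone."""
    rebuilt = estimator_cls(**params)
    stored = rebuilt.get_params(deep=False)
    return all(stored[name] is value for name, value in params.items())


structure = [{"activation": "ReLU", "num_units": 150}]
print(passes_identity_check(KeepsParam, structure=structure))   # True
print(passes_identity_check(CopiesParam, structure=structure))  # False
```

The usual remedy is to keep __init__ free of any processing and to move copying or validation of parameters into fit.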

cynmasetto commented 1 year ago

Hey @bacalfa, thanks so much for all the answers. I recently came across the library, and after performing all the steps I am still not able to install it on my Windows computer. I get the following error:

...\pysurvival> python setup.py build_ext --inplace      
running build_ext
building 'pysurvival.models._non_parametric' extension
"C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.35.32215\bin\HostX86\x64\cl.exe" /c /nologo /O2 /W3 /GL /DNDEBUG /MD -IC:\Users\CMasetto\AppData\Local\anaconda3\include -IC:\Users\CMasetto\AppData\Local\anaconda3\Include "-IC:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.35.32215\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.35.32215\ATLMFC\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Auxiliary\VS\include" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.22000.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\um" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\shared" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\winrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\cppwinrt" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.8\include\um" /EHsc /Tppysurvival/cpp_extensions/_non_parametric.cpp /Fobuild\temp.win-amd64-cpython-310\Release\pysurvival/cpp_extensions/_non_parametric.obj -std=c++11 -O3
cl : Command line warning D9002 : ignoring unknown option '-std=c++11'
cl : Command line warning D9002 : ignoring unknown option '-O3'  
_non_parametric.cpp
pysurvival/cpp_extensions/_non_parametric.cpp(8246): error C2105: '++' needs l-value
pysurvival/cpp_extensions/_non_parametric.cpp(8248): error C2105: '--' needs l-value
pysurvival/cpp_extensions/_non_parametric.cpp(8510): error C2105: '++' needs l-value
pysurvival/cpp_extensions/_non_parametric.cpp(8512): error C2105: '--' needs l-value
pysurvival/cpp_extensions/_non_parametric.cpp(8947): error C2039: 'tp_print': is not a member of '_typeobject'
C:\Users\CMasetto\AppData\Local\anaconda3\include\cpython/object.h(191): note: see declaration of '_typeobject'
pysurvival/cpp_extensions/_non_parametric.cpp(8971): error C2039: 'tp_print': is not a member of '_typeobject'
C:\Users\CMasetto\AppData\Local\anaconda3\include\cpython/object.h(191): note: see declaration of '_typeobject'
pysurvival/cpp_extensions/_non_parametric.cpp(9661): warning C4996: '_PyUnicode_get_wstr_length': deprecated in 3.3
pysurvival/cpp_extensions/_non_parametric.cpp(9677): warning C4996: '_PyUnicode_get_wstr_length': deprecated in 3.3
error: command 'C:\\Program Files (x86)\\Microsoft Visual Studio\\2022\\BuildTools\\VC\\Tools\\MSVC\\14.35.32215\\bin\\HostX86\\x64\\cl.exe' failed with exit code 2

Am I missing anything? I've installed all the suggested libraries and compilers, and I don't know what else to do. I hope you can help me install it. Thanks
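One observation on the log above: the build directory name (cpython-310) indicates Python 3.10, and the tp_print slot was removed from PyTypeObject in Python 3.9, so C++ sources generated by an older Cython that still reference tp_print cannot compile against it. The sketch below only encodes that version check; the helper name is hypothetical, and the possible remedies (regenerating the .cpp files with a recent Cython, or building with an older Python) are assumptions that nobody in this thread has confirmed.

```python
import sys


def tp_print_available(version_info=sys.version_info):
    """tp_print was removed from PyTypeObject in Python 3.9."""
    return tuple(version_info[:2]) < (3, 9)


print(tp_print_available((3, 8, 0)))   # True
print(tp_print_available((3, 10, 0)))  # False
```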