maks-sh / scikit-uplift

:exclamation: uplift modeling in scikit-learn style in python :snake:
https://www.uplift-modeling.com
MIT License

Outdated data source links in RetailHero (RU+EN) notebooks. #117

Closed Ksyula closed 3 years ago

Ksyula commented 3 years ago

📚 Documentation

The QuickStart documentation page (https://www.uplift-modeling.com/en/latest/quick_start.html) links to the following notebooks:

- https://nbviewer.jupyter.org/github/maks-sh/scikit-uplift/blob/master/notebooks/RetailHero_EN.ipynb
- https://nbviewer.jupyter.org/github/maks-sh/scikit-uplift/blob/master/notebooks/RetailHero.ipynb
- https://colab.research.google.com/github/maks-sh/scikit-uplift/blob/master/notebooks/RetailHero_EN.ipynb
- https://colab.research.google.com/github/maks-sh/scikit-uplift/blob/master/notebooks/RetailHero.ipynb

These notebooks, which live in the /notebooks folder of the repo, contain an outdated dataset link.

The current link to the retailhero-uplift dataset (https://drive.google.com/u/0/uc?id=1fkxNmihuS15kk0PP0QcphL_Z3_z8LLeb&export=download) is outdated and returns a 404 error.

The new link to the same dataset is https://storage.yandexcloud.net/datasouls-ods/materials/9c6913e5/retailhero-uplift.zip

PR https://github.com/maks-sh/scikit-uplift/pull/116 fixes this issue for the notebooks in the project's /notebooks folder.
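Until the notebooks are updated, the archive can be fetched from the new link by hand. The following is only a minimal sketch, not the code used in the notebooks or in the PR: the helper names `extract_archive` and `fetch_and_extract` are hypothetical, and the layout of files inside the zip is not verified here.

```python
import urllib.request
import zipfile
from pathlib import Path

# New dataset URL quoted in this issue; it may itself go stale over time.
DATASET_URL = (
    "https://storage.yandexcloud.net/datasouls-ods/materials/"
    "9c6913e5/retailhero-uplift.zip"
)


def extract_archive(zip_path, target_dir):
    """Unpack zip_path into target_dir and return the archive member names."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
        return archive.namelist()


def fetch_and_extract(url, target_dir, zip_name="retailhero-uplift.zip"):
    """Download the archive at url (needs network access), then unpack it."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    zip_path = target / zip_name
    urllib.request.urlretrieve(url, str(zip_path))
    return extract_archive(zip_path, target)
```

Calling `fetch_and_extract(DATASET_URL, "/content")` in Colab would download and unpack the archive under `/content`; the returned member names show where the CSV files actually ended up, which is worth checking before reading them.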

Ksyula commented 3 years ago

The fix will be delivered together with the resolution of issue https://github.com/maks-sh/scikit-uplift/issues/101, in PR https://github.com/maks-sh/scikit-uplift/pull/121

lyutov89 commented 3 years ago

Hello all! I ran into the same problem when I used this tutorial as an example.

I agree, we need to change the link to the following: https://storage.yandexcloud.net/datasouls-ods/materials/9c6913e5/retailhero-uplift.zip

But we also need to change the "reading data" step in the next cell (the one that runs after the download):

```python
df_clients = pd.read_csv('/content/data/clients.csv', index_col='client_id')
df_train = pd.read_csv('/content/data/uplift_train.csv', index_col='client_id')
df_test = pd.read_csv('/content/data/uplift_test.csv', index_col='client_id')
```

Otherwise we will get an error again.
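Since the paths depend on where the archive gets unpacked, one way to avoid hardcoding them is to search for the files first. This is just a sketch under assumptions: `locate_csvs` is a hypothetical helper, not part of scikit-uplift, and it assumes the three file names from the cell above appear somewhere under the extraction directory.

```python
from pathlib import Path

# File names taken from the "reading data" cell quoted above.
EXPECTED_FILES = ("clients.csv", "uplift_train.csv", "uplift_test.csv")


def locate_csvs(root, names=EXPECTED_FILES):
    """Map each expected file name to its path under root (recursive search)."""
    root = Path(root)
    found = {}
    for name in names:
        matches = sorted(root.rglob(name))
        if not matches:
            # Fail early with a clear message instead of inside pd.read_csv.
            raise FileNotFoundError(f"{name} not found under {root}")
        found[name] = matches[0]
    return found
```

With `paths = locate_csvs('/content')`, the reading step becomes `pd.read_csv(paths['clients.csv'], index_col='client_id')`, and likewise for the train and test files.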

Ksyula commented 3 years ago

Hello @lyutov89, thanks for mentioning it. Feel free to check the PR, which includes the changes you described here: https://github.com/maks-sh/scikit-uplift/pull/121