Add Sklearn-like Early Stopping for NGBoostRegressor

kmedved commented 3 years ago

This is a pull request following up on the discussion here: #234, and the previous pull request here: , adding HistGradientBoostingRegressor-like early stopping functionality. I've consolidated into the existing NGBoostRegressor API here per @alejandroschuler's suggestion.

I apologize for the ~24 commits associated with this. I'm not very experienced with git, and it appears my local changes have been somehow all merged here. I tried deleting my local fork and restarting, but Github pulled them in anyway.

I have done some testing on this in Colab here, and it seems to work (matching the results from the existing code), but any additional testing is welcome.

ryan-wolbeck commented 3 years ago

thanks @kmedved I'll try to review later this week. Don't worry about the commit spam, when I do the merge I'll squash them

ryan-wolbeck commented 3 years ago

@kmedved can you run 'make lint' and fix any issues it highlights? Looks like some checks are failing

kmedved commented 3 years ago

I'm not sure how to run 'make lint' (very poor understanding of software development - I apologize). I'm working in VSCode, and I did run Flake8, Pylint, and Pylama, and fixed some problems it flagged. There were a couple problems I didn't fix:

1) Many of the lines are too long (mostly commented descriptions). This flagged over 60 lines, many of which were not written by me, so I didn't touch them.

2) Pylama complains that Ngboost.fit is now too complex a function, presumably having 9 parameters. I don't have a good fix for this.

I'm happy to run another linter, or handle this otherwise however @ryan-wolbeck if you can help me out.

alejandroschuler commented 3 years ago

I'm not sure how to run 'make lint' (very poor understanding of software development - I apologize). I'm working in VSCode, and I did run Flake8, Pylint, and Pylama, and fixed some problems it flagged. There were a couple problems I didn't fix:

Many of the lines are too long (mostly commented descriptions). This flagged over 60 lines, many of which were not written by me, so I didn't touch them.

Pylama complains that Ngboost.fit is now too complex a function, presumably having 9 parameters. I don't have a good fix for this.

I'm happy to run another linter, or handle this otherwise however @ryan-wolbeck if you can help me out.

For the long comments, just add a line break and continue the comments on the next line. I don't know about pylama so hopefully @ryan-wolbeck can help but in most linters you can add a comment to the function declaration like some code here... # pylint:ignore=too-many-variables and the linter will make an exception for that.

kmedved commented 3 years ago

@ryan-wolbeck - I could use help here with the linting changes required to get this past the necessary checks.

ryan-wolbeck commented 3 years ago

Hey @kmedved

Few comments so bear with me haha

Git wise, when you fork this repo you'll want to start creating a branch within our fork and making a PR of that branch (not master) to this repository. I tried to commit directly into this PR to fix the issue it has but I couldn't for permissions issuses. So my only option was to create a new PR based on your changes here https://github.com/stanfordmlgroup/ngboost/pull/268

That said, if you can fix the same issues I did then I'll just close that PR after it's ready to go here.

To run the linter you'll need to be in a conda enviornment and install the setup. Once your sourced in like below

97D65786-39F5-4D1B-AEA1-0018E02C0DC7

once your into the environment you'll need to do a poetry install by running make install This should flip through and install the poetry dependencies to run the linters

You'll then want to run the following (from the root of the project) E0A2E0F0-FB26-4432-8331-38DD0ABA7CB8

You can see above that when I ran that it failed, but also reformatted and fixed some of the issues.

The only issue you'll see still remaining is 6BB8C942-1CE8-4AD7-AFE8-12DE0AFA4293 For that all you'll need to do is to go into line 46 of ngboost/api.py and split that line into two like I've done in the PR

Once you've done that, re-run the linter again

15AF0280-A84B-44E1-A070-44891D81AF21

Everything should pass there then run make test, that'll run the tests locally before you commit, if those all pass then do

git commit -m 'updating linter and line issues' ngboost/api.py ngboost/ngboost.py

and we should be off to the races :)

alejandroschuler commented 3 years ago

Hey @kmedved

Few comments so bear with me haha

Git wise, when you fork this repo you'll want to start creating a branch within our fork and making a PR of that branch (not master) to this repository. I tried to commit directly into this PR to fix the issue it has but I couldn't for permissions issuses. So my only option was to create a new PR based on your changes here #268

That said, if you can fix the same issues I did then I'll just close that PR after it's ready to go here.

To run the linter you'll need to be in a conda enviornment and install the setup. Once your sourced in like below

once your into the environment you'll need to do a poetry install by running make install This should flip through and install the poetry dependencies to run the linters

You'll then want to run the following (from the root of the project)

You can see above that when I ran that it failed, but also reformatted and fixed some of the issues.

The only issue you'll see still remaining is For that all you'll need to do is to go into line 46 of ngboost/api.py and split that line into two like I've done in the PR

Once you've done that, re-run the linter again

Everything should pass there then run make test, that'll run the tests locally before you commit, if those all pass then do

git commit -m 'updating linter and line issues' ngboost/api.py ngboost/ngboost.py

and we should be off to the races :)

Just wanted to give a shout-out and huge thanks to @ryan-wolbeck for helping out with this!!

ryan-wolbeck commented 3 years ago

@alejandroschuler no problem! Took me a while to reply because the battery on my macbook pro expanded and I was without my daily driver haha. Side note, do you think we should have some sort of test related to this feature? I'm not sure how we'd devise one? I'm okay with merging it after the updates are made but just curious if you have any thoughts on who we could draft something up here

alejandroschuler commented 3 years ago

@alejandroschuler no problem! Took me a while to reply because the battery on my macbook pro expanded and I was without my daily driver haha. Side note, do you think we should have some sort of test related to this feature? I'm not sure how we'd devise one? I'm okay with merging it after the updates are made but just curious if you have any thoughts on who we could draft something up here

Yeah, good idea. I think it's fine for now because it's pretty simple functionality but I'll put a test for this on my plate and get back to you when I have something.

ryan-wolbeck commented 3 years ago

@alejandroschuler no problem! Took me a while to reply because the battery on my macbook pro expanded and I was without my daily driver haha. Side note, do you think we should have some sort of test related to this feature? I'm not sure how we'd devise one? I'm okay with merging it after the updates are made but just curious if you have any thoughts on who we could draft something up here

Yeah, good idea. I think it's fine for now because it's pretty simple functionality but I'll put a test for this on my plate and get back to you when I have something.

Yeah I agree this one is pretty small so probably minimal risk, generally I'm thinking as this project gets larger we should be adding something simple in each time with these features. I'm a bit worried that as our user base gets larger that we might merge something then have people panicking because it instituted a break we didn't anticipate haha

kmedved commented 3 years ago

Thanks for the detailed helped @ryan-wolbeck - that's super useful. I'll give that a shot in the next day or two and hopefully it works.

With respect to testing, I agree it would be helpful. I did some basic testing to make sure this worked, didn't break, and generated identical results for a bunch of regression tasks, but given I'm modifying the base NGBoost class, it's probably worth checking the Classifier and Survivorship functionalities as well.

kmedved commented 3 years ago

@ryan-wolbeck I tried to do the steps above, but I think this may be over my head for the time being (I corrupted my Conda environment in the process of doing this; for some reason running 'Make Install' resulted in downgrading my version of pandas which broke everything else). I appreciate the help, but I'm clearly making some mistake somewhere.

Can I make these changes manually to get these changes merged?

ryan-wolbeck commented 3 years ago

@kmedved yeah that's no problem, just make the changes in my other PR and commit them out in one commit

ryan-wolbeck commented 2 years ago

In the interest of timing (I merged the sparse matrix pr and merged it into my clone branch here) I'm going to close this guy and publish a new package with @kmedved's updates. Thanks for the work on this!

kmedved commented 2 years ago

Thanks so much @ryan-wolbeck - I got very busy with work and didn't have time to finish this. Thank you for your patience with me in getting this over the finish line.

stanfordmlgroup / ngboost

Add Sklearn-like Early Stopping for NGBoostRegressor #264