Closed gtduncan closed 1 year ago
We've detected an issue with your CI configuration that might affect the accuracy of this pull request's coverage report. To ensure accuracy in future PRs, please see these guidelines. A quick fix for this PR: rebase it; your next report should be accurate.
Totals | |
---|---|
Change from base Build 4317956189: | 0.0% |
Covered Lines: | 3228 |
Relevant Lines: | 3429 |
@gtduncan looks like you have a conflict on changelog.md. Please mark as ready for review when you're ready. Also, I can't see the guide via the link; please fix.
Docs building was failing and I found it was one of the cells timing out, specifically the cell describing forward selection in the 2.1 Forward Selection portion. I ran just that cell, found it was taking 90 seconds to run, and changed the execution time in conf.py to mitigate this. Let me know if there's something else you'd want to do about it, because it's a pretty large jump. I'm also confused about how the docs passed on @bbeat2782's CI. I also added @neelasha23's requested edits from the original PR.
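For reference, a minimal sketch of what a timeout change in conf.py can look like. This assumes the docs are executed by MyST-NB, which the thread does not confirm; `nb_execution_timeout` is the option name in recent MyST-NB releases (older releases used `execution_timeout`), and the value 180 is purely illustrative, not the value used in this PR.

```python
# conf.py (fragment)
# Assumption: notebooks are executed by MyST-NB during the Sphinx build.
# Maximum time, in seconds, that a single notebook cell may run before
# the build fails with a CellTimeoutError.
nb_execution_timeout = 180  # illustrative value, not taken from this PR
```

Raising this limit applies to every notebook in the build, so a slow cell elsewhere would also be allowed to run longer.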
docs build is failing @gtduncan
I think the problem is here:
nbclient.exceptions.CellTimeoutError: A cell timed out while it was being executed, after 120 seconds.
The message was: Cell execution timed out.
Here is a preview of the cell contents:
-------------------
['from sklearn.feature_selection import SequentialFeatureSelector', 'from sklearn.ensemble import RandomForestClassifier', 'from sklearn_evaluation import plot', 'rfc = RandomForestClassifier()', 'forward_select = SequentialFeatureSelector(']
...
[')', 'forward_select.fit(X_clf_train, y_clf_train)', 'features = forward_select.get_feature_names_out()', 'rfc.fit(X_clf_train[features], y_clf_train)', 'plot.feature_importances(rfc)']
-------------------
let's try reducing the data size or any other parameter that affects runtime
worst case, we can increase the cell timeout but that should be our last resort since it'll slow down the doc building process
I was able to get the cell you mentioned to run quicker by changing the n_features_to_select parameter from 'auto' to 0.1, as seen here: forward_select = SequentialFeatureSelector(rfc, direction='forward', n_features_to_select=0.1). However, in the following cell, where backward_select = SequentialFeatureSelector(rfc, direction='backward', n_features_to_select='auto') is called, the cell takes around two and a half minutes to execute regardless of how I change the n_features_to_select parameter. Any ideas on how to reduce this runtime?
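A rough way to see why lowering n_features_to_select speeds up forward selection but not backward selection is to count model fits. The sketch below assumes the default 5-fold cross-validation, no `tol` early stopping, and a hypothetical dataset of 20 features (the tutorial's actual feature count is not stated in this thread); `candidate_fits` is an illustrative helper, not part of scikit-learn.

```python
def candidate_fits(n_features: int, n_to_select: int, direction: str, cv: int = 5) -> int:
    """Estimate how many estimator fits sequential selection performs.

    Each iteration scores every remaining candidate feature with
    cross-validation, which costs `cv` fits per candidate.
    """
    if direction == "forward":
        # Forward selection adds one feature per iteration.
        n_iterations = n_to_select
    elif direction == "backward":
        # Backward selection removes one feature per iteration.
        n_iterations = n_features - n_to_select
    else:
        raise ValueError("direction must be 'forward' or 'backward'")
    # The candidate pool shrinks by one after every iteration.
    candidates = sum(n_features - i for i in range(n_iterations))
    return candidates * cv


n_features = 20  # hypothetical feature count

forward_small = candidate_fits(n_features, 2, "forward")     # keep 10% of features
backward_small = candidate_fits(n_features, 2, "backward")   # keep 10% of features
backward_half = candidate_fits(n_features, 10, "backward")   # keep half the features

print(forward_small, backward_small, backward_half)
```

Under these assumptions, selecting fewer features shortens the forward search but lengthens the backward one, since backward selection has to remove more features to reach the target.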
Reducing the number of rows will help with runtime (I'm guessing that's the rfc parameter). How large is it?
The rfc parameter is RandomForestClassifier(). I've tried lowering the n_estimators parameter in that model as well, which seems to make the forward selection run pretty quickly, but the backward selection still times out. It does run locally, but I think it may just be too slow for the CI... I'll keep looking into solutions.
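One possible combination of the knobs discussed so far, sketched on synthetic data: subsample the rows, use a small forest, lower the number of cross-validation folds, and keep most of the features so the backward search needs few iterations. The data here is a stand-in generated with `make_classification`, not the tutorial's `X_clf_train` / `y_clf_train`, and the specific sizes are illustrative guesses rather than tuned values.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SequentialFeatureSelector

# Synthetic stand-in for the tutorial data (assumption).
X, y = make_classification(
    n_samples=2000, n_features=10, n_informative=4, random_state=0
)

# Subsample rows so every cross-validated fit is cheaper.
rng = np.random.default_rng(0)
idx = rng.choice(len(X), size=300, replace=False)
X_small, y_small = X[idx], y[idx]

# Fewer trees per forest means less work per fit.
rfc = RandomForestClassifier(n_estimators=10, random_state=0)

# Keeping 8 of 10 features means backward selection runs only 2 iterations;
# cv=2 halves the fits per candidate compared with a 4-fold split.
backward_select = SequentialFeatureSelector(
    rfc, direction="backward", n_features_to_select=8, cv=2
)
backward_select.fit(X_small, y_small)

selected = backward_select.get_support()  # boolean mask over the 10 features
print(selected)
```

Whether settings like these still produce a meaningful feature ranking for the tutorial is a separate question; the sketch only targets build time.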
I think let's make it a non-runnable cell (create it as a markdown cell in Jupyter). just copy whatever output it produces
Describe your changes
Edits https://github.com/ploomber/sklearn-evaluation/pull/294 so that it passes the lint check and appears in the navbar
Original tutorial: @bbeat2782
Checklist before requesting a review
:books: Documentation preview :books:: https://sklearn-evaluation--304.org.readthedocs.build/en/304/