dannashao opened 7 months ago
Thanks for the issue @dannashao, this looks like a host memory leak, not entirely sure where it is happening yet but we're looking into it.
Also having this issue, but occurring 100% of the time. Running with n_jobs = 1, cuml SVC. Memory usage steadily increases with every new candidate, until it eats up 24 GB of VRAM plus all my RAM.
I was able to fix it by putting in a dummy transformer with the sole purpose of garbage collecting.
```python
import gc

from sklearn.base import BaseEstimator, TransformerMixin


class GarbageCollector(BaseEstimator, TransformerMixin):
    """
    cuml allocates models on heap memory that is not GCed on every
    grid-search iteration. Forcibly release the memory by calling
    gc.collect() after every fit and transform.
    Include it in the grid-search pipeline.
    """

    def fit(self, X, y=None):
        gc.collect()
        return self

    def transform(self, X):
        gc.collect()
        return X
```
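As a quick sanity check that such a pass-through step cannot affect results, here is a minimal stdlib-only sketch of the same idea (without the sklearn base classes, which only add `get_params`/`set_params` support):

```python
import gc


class PassThroughCollector:
    """Minimal stand-in: collect garbage on fit/transform, pass data through."""

    def fit(self, X, y=None):
        gc.collect()  # force a collection between pipeline steps
        return self

    def transform(self, X):
        gc.collect()
        return X


step = PassThroughCollector()
X = [[1.0, 2.0], [3.0, 4.0]]
assert step.fit(X) is step   # fit returns self, per the sklearn convention
assert step.transform(X) is X  # data is passed through unchanged
```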
```python
pipe_svm = Pipeline([("garCollect", GarbageCollector()),
                     ("svm", SVC(random_state=42, verbose=2))])

grid_search_svm = GridSearchCV(
    estimator=pipe_svm, param_grid=param_grid_svm, cv=5,
    scoring=cuml_accuracy_scorer, verbose=10, n_jobs=1,
)
```
**Describe the bug**

When using GridSearchCV with SVC, the same piece of code sometimes randomly returns `MemoryError: std::bad_alloc: out_of_memory`.

**Steps/Code to reproduce bug**

Run a grid search fitting 3 folds for each of 16 candidates, totalling 48 fits. The grid search sometimes (roughly 15% of runs) stops in the middle with `MemoryError: std::bad_alloc: out_of_memory`. The grid search can then complete without changing anything, simply by re-running the code. The error occurs across different parameter grids, numbers of CV folds, and datasets. Checking with `nvidia-smi`: when everything goes correctly, GPU memory usage drops back to its initial value (about 3000 MB) at some point mid-run; when it does not, usage increases continuously until the error occurs.

**Expected behavior**

The GPU memory is freed every time the code runs, not only sometimes.
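Since the report says a plain re-run usually succeeds, one stopgap until the leak is fixed is to retry the whole search after forcing a collection. A hedged sketch, where `fit_grid` is a hypothetical zero-argument callable wrapping the `grid_search.fit(X, y)` call; per the traceback above, the failure surfaces in Python as a `MemoryError`:

```python
import gc
import time


def fit_with_retry(fit_grid, retries=3, delay_s=5.0):
    """Retry a flaky fit that intermittently raises MemoryError."""
    for attempt in range(retries):
        try:
            return fit_grid()
        except MemoryError:
            if attempt == retries - 1:
                raise  # out of retries: propagate the failure
            gc.collect()         # drop any Python-held references
            time.sleep(delay_s)  # give the driver time to free GPU memory


# Usage with a stand-in that fails once, then succeeds:
calls = {"n": 0}

def flaky_fit():
    calls["n"] += 1
    if calls["n"] == 1:
        raise MemoryError("std::bad_alloc: out_of_memory")
    return "fitted"

assert fit_with_retry(flaky_fit, delay_s=0.0) == "fitted"
assert calls["n"] == 2  # one failure, one successful retry
```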
**Environment details:**