Describe the bug
This is a follow-up to https://github.com/rapidsai/cudf/issues/3700. With a small toy dataset of 3 headlines, 2 tickers, and 2 company names, the code below works fine. However, when the same code runs on 1.6 million+ headlines and 9,000+ tickers and company names, it runs for a while and then the kernel crashes. Memory usage never exceeded 10% of the 32 GB, and GPU usage stayed low until it spiked above 75% right before the kernel crashed.
Steps/Code to reproduce bug
import cudf
import numpy as np
import pandas as pd
import nvstrings, nvtext
dts = np.array([20170101, 20170102, 20170103], dtype='int32')
headlines = ['FB buys Instagram', 'Trump says something', 'Amazon makes Bezos richer', ...]
# headlines is a list of 1.6M strings
gdf = cudf.DataFrame()
gdf['headline_text'] = headlines
gdf['dt'] = dts
TICKERS = ['fb', 'amzn', ... ] # 9000+ elements long
CO_NAMES = ['facebook', 'amazon', ... ] # 9000+ elements long
headlines_nvs = nvstrings.to_device(list(gdf['headline_text'].astype('str').to_array()))
TICKERS_N_COS = TICKERS + CO_NAMES
tickers_n_cos_nvs = nvstrings.to_device(TICKERS_N_COS)
bool_matrix_labels = nvtext.contains_strings(headlines_nvs, tickers_n_cos_nvs)
labels_df = pd.DataFrame(bool_matrix_labels, columns=TICKERS_N_COS)
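For scale, here is a rough estimate of how large the result would be at the sizes described above. It is only a sketch: it assumes the result is a dense matrix with one byte per headline/target pair, which has not been checked against the nvtext implementation, and the helper name `dense_matrix_bytes` is hypothetical. The counts use the lower bounds from this report (1.6 million headlines, 9,000 tickers plus 9,000 company names).

```python
def dense_matrix_bytes(n_strings, n_targets, bytes_per_cell=1):
    """Bytes needed for a dense n_strings x n_targets matrix.

    bytes_per_cell=1 is an assumption, not a confirmed detail of nvtext.
    """
    return n_strings * n_targets * bytes_per_cell


n_headlines = 1_600_000        # lower bound from the report
n_targets = 9_000 + 9_000      # tickers + company names, lower bounds

cells = n_headlines * n_targets
size_bytes = dense_matrix_bytes(n_headlines, n_targets)

print(f"cells: {cells:,}")
print(f"size at 1 byte per cell: {size_bytes / 1e9:.1f} GB")
# True if a signed 32-bit index could not address every cell
print(f"exceeds int32 range: {cells > 2**31 - 1}")
```

Under these assumptions the result alone is close to the 32 GB mentioned above, and the cell count is larger than a signed 32-bit index can address. Either could be relevant to the crash, but neither is confirmed here.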
Expected behavior
The kernel should not crash.
Environment overview (please complete the following information)
Environment location: docker
Method of cuDF install: conda
Environment details
**git***
Not inside a git repository