Closed ggnicolau closed 1 year ago
I've installed rapids-21.12 (I had to remove 'nightly' from the code to find it), then I installed cudatoolkit 11.2 (I couldn't install it through conda or pip, so I had to install it through wget). Now I'm getting the following errors on the example notebook:
temporary_buffer::allocate: get_temporary_buffer failed
CudaAPIError: [719] Call to cuLinkCreate results in CUDA_ERROR_LAUNCH_FAILED
Then I tried a small sample (5000 rows) from another dataset, but when I call get_topic I get the following error:
TypeError: 'NoneType' object is not subscriptable
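With CUDA errors like these, it may be worth ruling out a mismatch between the CUDA toolkit that is actually active and the one the installed RAPIDS build expects. This is only one possible cause and is not confirmed for this case. Below is a minimal sketch that extracts the release number from the text printed by `nvcc --version`. The helper name `parse_nvcc_release` is hypothetical, and the sample string only illustrates the usual shape of that output; it is not taken from this thread.

```python
import re
from typing import Optional, Tuple


def parse_nvcc_release(version_text: str) -> Optional[Tuple[int, int]]:
    """Return (major, minor) from `nvcc --version` text, or None if absent.

    Hypothetical helper for illustration; it only looks for the
    "release X.Y" fragment and ignores everything else.
    """
    match = re.search(r"release (\d+)\.(\d+)", version_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


# Illustrative sample of the relevant output line (an assumption, not a log
# from this issue).
sample = "Cuda compilation tools, release 11.2, V11.2.152"
release = parse_nvcc_release(sample)
```

In practice the text would come from running `nvcc --version` (for example through `subprocess.run` with `capture_output=True, text=True`), and the result could be compared against the CUDA version listed for the RAPIDS release that was installed.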
@ggnicolau, thanks a lot for trying out the repo and for giving such a detailed description of your changes.
Let me look into this and update here. Thanks for your patience.
I am facing a similar issue with the cuBERTopic fit_transform method. I was able to successfully run the topic modelling with bertopic.
Error Log below:
topics_gpu, probs_gpu = gpu_topic.fit_transform(docs)
Traceback (most recent call last):
File "
We now recommend that users use upstream BERTopic directly, since it now supports RAPIDS. See below:
In the time since the blog post/code was released, the BERTopic library has added initial support for cuML. We recommend using cuML directly with BERTopic, which you can do by following the example below drawn from the BERTopic documentation.
from bertopic import BERTopic
from cuml.cluster import HDBSCAN
from cuml.manifold import UMAP
# Create instances of GPU-accelerated UMAP and HDBSCAN
umap_model = UMAP(n_components=5, n_neighbors=15, min_dist=0.0)
hdbscan_model = HDBSCAN(min_samples=10, gen_min_span_tree=True)
# Pass the above models to be used in BERTopic
topic_model = BERTopic(umap_model=umap_model, hdbscan_model=hdbscan_model)
topics, probs = topic_model.fit_transform(docs)
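If the same script also needs to run on machines without RAPIDS, one option is to choose the backend at runtime. The following is a rough sketch, not something taken from the BERTopic documentation: the helper names `is_installed` and `build_models` are hypothetical, and the CPU branch assumes the `umap-learn` and `hdbscan` packages are installed.

```python
import importlib.util


def is_installed(package: str) -> bool:
    """Return True when a top-level package can be found without importing it."""
    return importlib.util.find_spec(package) is not None


def build_models(use_gpu: bool):
    """Build UMAP and HDBSCAN models from cuML (GPU) or the CPU packages.

    Hypothetical helper: the imports are deferred so that this module can
    be loaded even when only one of the two backends is available.
    """
    if use_gpu:
        from cuml.cluster import HDBSCAN
        from cuml.manifold import UMAP
    else:
        from hdbscan import HDBSCAN  # assumes the `hdbscan` package
        from umap import UMAP  # assumes the `umap-learn` package

    umap_model = UMAP(n_components=5, n_neighbors=15, min_dist=0.0)
    hdbscan_model = HDBSCAN(min_samples=10, gen_min_span_tree=True)
    return umap_model, hdbscan_model


# Decide once, based on whether cuML can be found in this environment.
USE_GPU = is_installed("cuml")
```

The models returned by `build_models(USE_GPU)` can then be passed to `BERTopic(umap_model=..., hdbscan_model=...)` exactly as in the example above.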
Hi, I'm trying to use cuBERTopic.
I tried to install using the YAML file or the conda command provided by the repository. Neither worked, since they can't find version 21.12. So I decided to install version 22.06 instead, with some adaptations for a VM on Google Cloud Platform, using CUDA-11.0:
But then I got an NVCC PATH warning while importing cuBERTopic. So I changed the beginning of the cuBERTopic.py file to point to my current CUDA path:
Then, when I try to import the libraries as in the example notebook, I get this error:
AttributeError: 'NoneType' object has no attribute 'split'; --> 324 cmd = _nvcc.split()
But I'm able to import if I change the order of the imports:
Then everything works, and I can see that it's using the GPU while training on the notebook provided as an example. But at the end, I get the following error while using tf-idf:
ValueError: Duplicate column names are not allowed
The full log is:
I know I've made a bunch of critical changes here, one on top of another. But maybe you can help me make it work properly? :)
Wish you the best! Thank you for implementing BERTopic with RAPIDS!
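On the NVCC PATH warning and the `_nvcc.split()` error above: the traceback suggests that the nvcc path resolved to None, which may mean the compiler could not be located at import time. Below is a minimal sketch for checking where nvcc lives, using only the standard library. The helper name `find_nvcc` is hypothetical, and treating `CUDA_HOME` and `CUDA_PATH` as fallback locations is an assumption for illustration, not something cuBERTopic defines.

```python
import os
import shutil
from typing import Mapping, Optional


def find_nvcc(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return a path to an nvcc executable, or None if none is found.

    Hypothetical helper: searches PATH first, then <root>/bin/nvcc under
    the CUDA_HOME and CUDA_PATH variables (assumed conventions).
    """
    env = os.environ if env is None else env

    # 1) Look for nvcc on the PATH taken from the given environment.
    found = shutil.which("nvcc", path=env.get("PATH", ""))
    if found:
        return found

    # 2) Fall back to common CUDA root variables, if they are set.
    for var in ("CUDA_HOME", "CUDA_PATH"):
        root = env.get(var)
        if not root:
            continue
        candidate = os.path.join(root, "bin", "nvcc")
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate

    return None
```

If `find_nvcc()` returns None in the environment where the import fails, adding the CUDA `bin` directory to `PATH` before starting Python may be a less invasive fix than editing the library source.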