rapidsai / rapids-examples


Cuber Topic Installation Google Colab #56

Open research2023 opened 1 year ago

research2023 commented 1 year ago

!nvidia-smi

This gets the RAPIDS-Colab install files and checks your GPU. Run this and the next cell only.

Please read the output of this cell. If your Colab Instance is not RAPIDS compatible, it will warn you and give you remediation steps.

!pip install pynvml
!pip install bertopic
!git clone https://github.com/rapidsai/rapidsai-csp-utils.git
!python rapidsai-csp-utils/colab/env-check.py
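For anyone who wants to see which GPU the runtime exposes before running the compatibility check, here is a minimal sketch. It is not the logic of `env-check.py`; it only shells out to `nvidia-smi` with its CSV query flags. The helper names `parse_gpu_lines` and `list_gpus` are made up for illustration.

```python
import shutil
import subprocess


def parse_gpu_lines(output):
    """Turn 'name, memory' CSV lines from nvidia-smi into (name, memory) tuples."""
    gpus = []
    for line in output.strip().splitlines():
        if not line.strip():
            continue
        # Split on the first comma only: GPU name on the left, memory on the right.
        name, _, memory = line.partition(",")
        gpus.append((name.strip(), memory.strip()))
    return gpus


def list_gpus():
    """Return the GPUs nvidia-smi reports, or an empty list if it is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    return parse_gpu_lines(result.stdout) if result.returncode == 0 else []


if __name__ == "__main__":
    print(list_gpus() or "No NVIDIA GPU visible to this runtime")
```

An empty list usually means the Colab runtime type is not set to GPU.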

This will update the Colab environment and restart the kernel. Don't run the next cell until you see the session crash.

!bash rapidsai-csp-utils/colab/update_gcc.sh
import os
os._exit(00)

This will install CondaColab. This will restart your kernel one last time. Run this cell by itself and only run the next cell once you see the session crash.

import condacolab
condacolab.install()

You can now run the rest of the cells as normal.

import condacolab
condacolab.check()

Installing RAPIDS is now 'python rapidsai-csp-utils/colab/install_rapids.py '

The options are 'stable' and 'nightly'. Leaving it blank or adding any other words will default to stable.
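The defaulting rule described above can be sketched as follows. This is only an illustration of the stated behaviour, not the code in `install_rapids.py`; `resolve_release` and `VALID_RELEASES` are hypothetical names.

```python
VALID_RELEASES = ("stable", "nightly")


def resolve_release(args):
    """Pick the release from a list of command-line arguments.

    An empty list, or any first argument other than 'stable' or 'nightly',
    falls back to 'stable'.
    """
    requested = args[0] if args else ""
    return requested if requested in VALID_RELEASES else "stable"


if __name__ == "__main__":
    for example in ([], ["nightly"], ["something-else"]):
        print(example, "->", resolve_release(example))
```

Matching is exact here, so whether the real script accepts other spellings such as `Nightly` is left open.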

!python rapidsai-csp-utils/colab/install_rapids.py stable
import os
os.environ['NUMBAPRO_NVVM'] = '/usr/local/cuda/nvvm/lib64/libnvvm.so'
os.environ['NUMBAPRO_LIBDEVICE'] = '/usr/local/cuda/nvvm/libdevice/'
os.environ['CONDA_PREFIX'] = '/usr/local'
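Since the three environment variables point at fixed paths, it can help to confirm those paths exist on the Colab instance before importing any RAPIDS package. A minimal sketch follows; the values are the ones from the cell above, and `missing_paths` and `apply_env` are hypothetical helper names.

```python
import os

# Values copied from the install cell above.
RAPIDS_ENV = {
    "NUMBAPRO_NVVM": "/usr/local/cuda/nvvm/lib64/libnvvm.so",
    "NUMBAPRO_LIBDEVICE": "/usr/local/cuda/nvvm/libdevice/",
    "CONDA_PREFIX": "/usr/local",
}


def missing_paths(env):
    """Return the variable names whose target path does not exist on disk."""
    return [name for name, path in env.items() if not os.path.exists(path)]


def apply_env(env):
    """Export the variables, then report which ones point at missing paths."""
    os.environ.update(env)
    return missing_paths(env)


if __name__ == "__main__":
    missing = apply_env(RAPIDS_ENV)
    print("Missing paths for:", missing if missing else "none")
```

A non-empty result suggests the CUDA toolkit is installed somewhere other than `/usr/local/cuda` on that instance.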

research2023 commented 1 year ago

This is the code I am using on Google Colab to install the RAPIDS Python packages. I am interested in using cuBERTopic since, per your article, it is faster than regular BERTopic.

research2023 commented 1 year ago

In the time since this blog post was released, the BERTopic library has added initial support for cuML. We recommend using cuML directly with BERTopic, which you can do by following the example below drawn from the BERTopic documentation
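For reference, the cuML-backed BERTopic setup looks roughly like the sketch below. The imports and the UMAP arguments match the traceback quoted later in this thread. The HDBSCAN arguments, the wrapper function `build_gpu_topic_model`, and the variable `docs` are assumptions for illustration. The imports are deferred into the function so that nothing is loaded until it is called.

```python
def build_gpu_topic_model():
    """Build a BERTopic model backed by cuML's UMAP and HDBSCAN.

    Requires bertopic and cuml to be installed, plus a RAPIDS-compatible GPU.
    """
    from bertopic import BERTopic
    from cuml.cluster import HDBSCAN
    from cuml.manifold import UMAP

    # Create instances of GPU-accelerated UMAP and HDBSCAN
    umap_model = UMAP(n_components=5, n_neighbors=15, min_dist=0.0)
    # Assumed HDBSCAN settings; tune them for your data.
    hdbscan_model = HDBSCAN(
        min_samples=10, gen_min_span_tree=True, prediction_data=True
    )

    # Pass the GPU models to BERTopic in place of its CPU defaults
    return BERTopic(umap_model=umap_model, hdbscan_model=hdbscan_model)
```

With `docs` as a list of strings, usage would be `topics, probs = build_gpu_topic_model().fit_transform(docs)`.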

research2023 commented 1 year ago

Following your instructions in the post, I tried using the BERTopic implementation, but it gave me the following error:

TypeError Traceback (most recent call last) in
----> 1 from bertopic import BERTopic
      2 from cuml.cluster import HDBSCAN
      3 from cuml.manifold import UMAP
      4 # Create instances of GPU-accelerated UMAP and HDBSCAN
      5 umap_model = UMAP(n_components=5, n_neighbors=15, min_dist=0.0)

3 frames
/usr/local/lib/python3.7/dist-packages/hdbscan/hdbscan_.py in
    507 leaf_size=40,
    508 algorithm="best",
--> 509 memory=Memory(cachedir=None, verbose=0),
    510 approx_min_span_tree=True,
    511 gen_min_span_tree=False,

TypeError: __init__() got an unexpected keyword argument 'cachedir'

research2023 commented 1 year ago

This installation is based on the following Medium post: https://medium.com/rapids-ai/run-rapids-on-google-colab-for-free-1617ac6323a8. I have been able to run some of your examples, but not cuBERTopic. Could you tell me where I can find that one, or how to install it in Google Colab?

VibhuJawa commented 1 year ago

> Following your instructions in the post I tried using the Bertopic implementation but it was giving me the following error:
>
> TypeError Traceback (most recent call last) in
> ----> 1 from bertopic import BERTopic
>       2 from cuml.cluster import HDBSCAN
>       3 from cuml.manifold import UMAP
>       4 # Create instances of GPU-accelerated UMAP and HDBSCAN
>       5 umap_model = UMAP(n_components=5, n_neighbors=15, min_dist=0.0)
>
> 3 frames
> /usr/local/lib/python3.7/dist-packages/hdbscan/hdbscan_.py in
>     507 leaf_size=40,
>     508 algorithm="best",
> --> 509 memory=Memory(cachedir=None, verbose=0),
>     510 approx_min_span_tree=True,
>     511 gen_min_span_tree=False,
>
> TypeError: __init__() got an unexpected keyword argument 'cachedir'

This is because of https://github.com/scikit-learn-contrib/hdbscan/issues/562, which was introduced in the last release of HDBSCAN. You can fix this by pinning joblib==1.1.0. Please see the linked issue for more details.
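To check whether an environment is affected before importing BERTopic, a small version test like the sketch below may help. It assumes the `cachedir` breakage begins with joblib 1.2.0. That threshold is an assumption, since this thread only states that 1.1.0 works. The helper names are hypothetical.

```python
def version_tuple(version):
    """Parse the leading 'X.Y.Z' of a version string into a tuple of ints."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def joblib_needs_pin(version, first_broken=(1, 2, 0)):
    """True if this joblib version is at or past the assumed breaking release."""
    return version_tuple(version) >= first_broken


def installed_joblib_version():
    """Return the installed joblib version string, or None if not installed."""
    try:
        import joblib
    except ImportError:
        return None
    return joblib.__version__


if __name__ == "__main__":
    found = installed_joblib_version()
    if found is None:
        print("joblib is not installed")
    elif joblib_needs_pin(found):
        print("joblib", found, "may trigger the cachedir error; pin joblib==1.1.0")
    else:
        print("joblib", found, "should work with this hdbscan release")
```

If the check flags the environment, install `joblib==1.1.0` with pip and restart the runtime before importing `bertopic` again.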

> This installation is based on the following medium post: https://medium.com/rapids-ai/run-rapids-on-google-colab-for-free-1617ac6323a8. I have been able to run some of your examples except for cuBertopic. Could you tell me where I can find that one? or how to install it in google colab?

FYI, we now recommend that users use SageMaker Studio Lab instead of Colab because of some issues with Google Colab.
https://studiolab.sagemaker.aws/import/github/rapidsai-community/rapids-smsl/blob/main/rapids-smsl.ipynb