rapidsai / rapids-examples


cuBERTtopic error: cuDF failure at: [...] Could not open vocab/voc_hash.txt #52

Closed dimidloc closed 1 year ago

dimidloc commented 2 years ago

I installed RAPIDS with:

mamba create -n rapids-22.04 -c rapidsai -c nvidia -c conda-forge rapids=22.04 python=3.9 cudatoolkit=11.3 dask-sql --no-channel-priority

and then installed PyTorch with:

mamba install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch

I've tried to follow the example at: https://github.com/rapidsai/rapids-examples/blob/main/cuBERT_topic_modelling/berttopic_example.ipynb

After cloning the repository and running pip install -e . for cuBERTopic, I ran:

from cuBERTopic import gpu_BERTopic
gpu_topic = gpu_BERTopic()
topics_gpu, probs_gpu = gpu_topic.fit_transform(docs)

The last line fails with:

RuntimeError                              Traceback (most recent call last)
/tmp/ipykernel_1502/660932292.py in <module>
----> 1 topics_gpu, probs_gpu = gpu_topic.fit_transform(docs)

~/rapids-examples/cuBERT_topic_modelling/cuBERTopic.py in fit_transform(self, data)
    204
    205         # Extract embeddings
--> 206         embeddings = create_embeddings(
    207             documents.Document, self.embedding_model, self.vocab_file
    208         )

~/rapids-examples/cuBERT_topic_modelling/embedding_extraction.py in create_embeddings(sentences, embedding_model, vocab_file)
     71     """
     72
---> 73     cudf_tokenizer = SubwordTokenizer(vocab_file, do_lower_case=True)
     74     batch_size = 256
     75     pooling_output_ls = []

/opt/conda/envs/rapids-22.04/lib/python3.9/site-packages/cudf/core/subword_tokenizer.py in __init__(self, hash_file, do_lower_case)
     53
     54         self.do_lower_case = do_lower_case
---> 55         self.vocab_file = cpp_hashed_vocabulary(hash_file)
     56
     57     def __call__(

cudf/_lib/nvtext/subword_tokenize.pyx in cudf._lib.nvtext.subword_tokenize.Hashed_Vocabulary.__cinit__()

RuntimeError: cuDF failure at: /workspace/.conda-bld/work/cpp/src/text/subword/load_hash_file.cu:183: Could not open vocab/voc_hash.txt

abjt11 commented 2 years ago

Change path from '../vocab/voc_hash.txt' to '../cuBERT_topic_modelling/vocab/voc_hash.txt'
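
The path in the error message is relative, so whether it can be opened depends on the directory the notebook kernel was started from. A minimal sketch of a check that avoids this, assuming the vocab file sits at one of the two locations mentioned in this thread; find_vocab_file is a hypothetical helper written for illustration and is not part of cuBERTopic or cuDF:

```python
from pathlib import Path


def find_vocab_file(candidates, base_dir=None):
    """Return the first candidate that exists, as an absolute path.

    Hypothetical helper (not part of cuBERTopic or cuDF). Relative
    candidates are resolved against base_dir, which defaults to the
    current working directory.
    """
    base = Path(base_dir) if base_dir is not None else Path.cwd()
    tried = []
    for candidate in candidates:
        path = Path(candidate)
        if not path.is_absolute():
            path = base / path
        tried.append(str(path))
        if path.is_file():
            return path.resolve()
    raise FileNotFoundError(f"No vocab file found; tried: {tried}")


# Example call, using the two paths mentioned in this thread:
# vocab_path = find_vocab_file(
#     ["../vocab/voc_hash.txt", "../cuBERT_topic_modelling/vocab/voc_hash.txt"]
# )
# print(vocab_path)
```

The traceback shows fit_transform passing self.vocab_file on to create_embeddings, so pointing that attribute at the absolute path returned here before calling fit_transform may be enough. That is an assumption drawn from the traceback, not something confirmed in this thread.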

VibhuJawa commented 1 year ago

Closing as changing path seems to do the trick.