huggingface / text-clustering

Easily embed, cluster and semantically label text datasets

Unable to run the example in the README #9

Open Manel-Hik opened 5 months ago

Manel-Hik commented 5 months ago

Hi HF team, thanks a lot for this repo. I was exploring the code and trying to run the example provided in the README file, but it throws an error related to protobuf. Here is the full output:

```
2024-03-31 22:07:00.161046: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/amazon/efa/lib:/opt/amazon/openmpi/lib:/opt/aws-ofi-nccl/lib:/usr/local/cuda/lib:/usr/local/cuda/lib64:/usr/local/cuda:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda/targets/x86_64-linux/lib:/usr/local/lib:/usr/lib:/opt/amazon/efa/lib:/opt/amazon/openmpi/lib:/opt/aws-ofi-nccl/lib:/usr/local/cuda/lib:/usr/local/cuda/lib64:/usr/local/cuda:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda/targets/x86_64-linux/lib:/usr/local/lib:/usr/lib
2024-03-31 22:07:00.161086: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
Traceback (most recent call last):
  File "/home/ubuntu/for_nlp_tasks/R_and_D/text-clustering/test_it.py", line 1, in <module>
    from src.text_clustering import ClusterClassifier
  File "/home/ubuntu/for_nlp_tasks/R_and_D/text-clustering/src/text_clustering.py", line 17, in <module>
    from umap import UMAP
  File "/opt/conda/lib/python3.10/site-packages/umap/__init__.py", line 7, in <module>
    from .parametric_umap import ParametricUMAP
  File "/opt/conda/lib/python3.10/site-packages/umap/parametric_umap.py", line 14, in <module>
    import tensorflow as tf
  File "/opt/conda/lib/python3.10/site-packages/tensorflow/__init__.py", line 37, in <module>
    from tensorflow.python.tools import module_util as _module_util
  File "/opt/conda/lib/python3.10/site-packages/tensorflow/python/__init__.py", line 37, in <module>
    from tensorflow.python.eager import context
  File "/opt/conda/lib/python3.10/site-packages/tensorflow/python/eager/context.py", line 29, in <module>
    from tensorflow.core.framework import function_pb2
  File "/opt/conda/lib/python3.10/site-packages/tensorflow/core/framework/function_pb2.py", line 16, in <module>
    from tensorflow.core.framework import attr_value_pb2 as tensorflow_dot_core_dot_framework_dot_attr__value__pb2
  File "/opt/conda/lib/python3.10/site-packages/tensorflow/core/framework/attr_value_pb2.py", line 16, in <module>
    from tensorflow.core.framework import tensor_pb2 as tensorflow_dot_core_dot_framework_dot_tensor__pb2
  File "/opt/conda/lib/python3.10/site-packages/tensorflow/core/framework/tensor_pb2.py", line 16, in <module>
    from tensorflow.core.framework import resource_handle_pb2 as tensorflow_dot_core_dot_framework_dot_resource__handle__pb2
  File "/opt/conda/lib/python3.10/site-packages/tensorflow/core/framework/resource_handle_pb2.py", line 16, in <module>
    from tensorflow.core.framework import tensor_shape_pb2 as tensorflow_dot_core_dot_framework_dot_tensor__shape__pb2
  File "/opt/conda/lib/python3.10/site-packages/tensorflow/core/framework/tensor_shape_pb2.py", line 36, in <module>
    _descriptor.FieldDescriptor(
  File "/opt/conda/lib/python3.10/site-packages/google/protobuf/descriptor.py", line 553, in __new__
    _message.Message._CheckCalledFromGeneratedFile()
TypeError: Descriptors cannot be created directly.
If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0.
If you cannot immediately regenerate your protos, some other possible workarounds are:

  1. Downgrade the protobuf package to 3.20.x or lower.
  2. Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python (but this will use pure-Python parsing and will be much slower).

More information: https://developers.google.com/protocol-buffers/docs/news/2022-05-06#python-updates
```
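For reference, a minimal sketch of applying the second workaround suggested by the error message, i.e. setting `PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python` before anything imports TensorFlow. The import path is the one from the traceback above; everything else is only an illustration, not code from this repo:

```python
# Sketch of protobuf workaround 2: force the pure-Python protobuf implementation
# before tensorflow (pulled in via umap's ParametricUMAP) is imported.
# Alternative (workaround 1): downgrade protobuf in the shell, e.g. pip install "protobuf<=3.20.3".
import os

# Must be set before any tensorflow / _pb2 module is imported,
# otherwise the descriptor error above is raised anyway.
os.environ["PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION"] = "python"  # pure-Python parsing, slower

# Same import as in test_it.py from the traceback; continue with the README example from here.
from src.text_clustering import ClusterClassifier
```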

I understand that this project is noted as a work in progress, but I was really interested in trying it, since I'm working on something similar for Arabic text clustering and would like to get good results. Thanks a lot!

Manel-Hik commented 4 months ago

Is there any news on this?