nipunsadvilkar / pySBD

🐍💯pySBD (Python Sentence Boundary Disambiguation) is a rule-based sentence boundary detection that works out-of-the-box.
MIT License

pysbd_as_spacy_component.py -- fails to find pysbd module #119

Closed hiwaveSupport closed 1 year ago

hiwaveSupport commented 1 year ago

Describe the bug
pySBD and spaCy are both installed in my environment:

`# packages in environment at /home/vibhu/miniconda3/envs/huggfacegpu:
#
# Name  Version  Build  Channel
_libgcc_mutex  0.1  main
_openmp_mutex  5.1  1_gnu
absl-py  1.2.0  pypi_0  pypi
accelerate  0.16.0  pypi_0  pypi
aiohttp  3.8.3  pypi_0  pypi
aiosignal  1.3.1  pypi_0  pypi
asgiref  3.5.2  pypi_0  pypi
async-timeout  4.0.2  pypi_0  pypi
asynctest  0.13.0  pypi_0  pypi
attrs  22.2.0  pypi_0  pypi
blas  1.0  mkl
blis  0.7.9  pypi_0  pypi
brotlipy  0.7.0  py37h27cfd23_1003
ca-certificates  2023.01.10  h06a4308_0
catalogue  2.0.8  pypi_0  pypi
certifi  2022.12.7  py37h06a4308_0
cffi  1.15.1  py37h74dc2b5_0
charset-normalizer  2.0.4  pyhd3eb1b0_0
click  8.1.3  pypi_0  pypi
confection  0.0.4  pypi_0  pypi
cryptography  38.0.4  py37h9ce1e76_0
cuda  12.0.1  0  nvidia
cuda-cccl  12.0.140  0  nvidia
cuda-command-line-tools  12.0.1  0  nvidia
cuda-compiler  12.0.1  0  nvidia
cuda-cudart  12.0.146  0  nvidia
cuda-cudart-dev  12.0.146  0  nvidia
cuda-cudart-static  12.0.146  0  nvidia
cuda-cuobjdump  12.0.140  0  nvidia
cuda-cupti  12.0.146  0  nvidia
cuda-cupti-static  12.0.146  0  nvidia
cuda-cuxxfilt  12.0.140  0  nvidia
cuda-demo-suite  12.0.140  0  nvidia
cuda-documentation  12.0.140  0  nvidia
cuda-driver-dev  12.0.146  0  nvidia
cuda-gdb  12.0.140  0  nvidia
cuda-libraries  12.0.1  0  nvidia
cuda-libraries-dev  12.0.1  0  nvidia
cuda-libraries-static  12.0.1  0  nvidia
cuda-nsight  12.0.140  0  nvidia
cuda-nsight-compute  12.0.1  0  nvidia
cuda-nvcc  12.0.140  0  nvidia
cuda-nvdisasm  12.0.140  0  nvidia
cuda-nvml-dev  12.0.140  0  nvidia
cuda-nvprof  12.0.146  0  nvidia
cuda-nvprune  12.0.140  0  nvidia
cuda-nvrtc  12.0.140  0  nvidia
cuda-nvrtc-dev  12.0.140  0  nvidia
cuda-nvrtc-static  12.0.140  0  nvidia
cuda-nvtx  12.0.140  0  nvidia
cuda-nvvp  12.0.146  0  nvidia
cuda-opencl  12.0.140  0  nvidia
cuda-opencl-dev  12.0.140  0  nvidia
cuda-profiler-api  12.0.140  0  nvidia
cuda-runtime  12.0.1  0  nvidia
cuda-sanitizer-api  12.0.140  0  nvidia
cuda-toolkit  12.0.1  0  nvidia
cuda-tools  12.0.1  0  nvidia
cuda-visual-tools  12.0.1  0  nvidia
cymem  2.0.7  pypi_0  pypi
datasets  2.9.0  pypi_0  pypi
dill  0.3.6  pypi_0  pypi
django  3.2.15  pypi_0  pypi
filelock  3.9.0  py37h06a4308_0
flatbuffers  2.0.7  pypi_0  pypi
flit-core  3.6.0  pyhd3eb1b0_0
frozenlist  1.3.3  pypi_0  pypi
fsspec  2023.1.0  pypi_0  pypi
ftfy  6.1.1  pypi_0  pypi
future  0.18.2  py37_1
gds-tools  1.5.1.14  0  nvidia
huggingface_hub  0.10.1  py37h06a4308_0
idna  3.4  py37h06a4308_0
image  1.5.33  pypi_0  pypi
importlib-metadata  4.11.3  py37h06a4308_0
intel-openmp  2021.4.0  h06a4308_3561
jinja2  3.1.2  pypi_0  pypi
keras  2.10.0  pypi_0  pypi
langcodes  3.3.0  pypi_0  pypi
ld_impl_linux-64  2.38  h1181459_1
libclang  14.0.6  pypi_0  pypi
libcublas  12.0.2.224  0  nvidia
libcublas-dev  12.0.2.224  0  nvidia
libcublas-static  12.0.2.224  0  nvidia
libcufft  11.0.1.95  0  nvidia
libcufft-dev  11.0.1.95  0  nvidia
libcufft-static  11.0.1.95  0  nvidia
libcufile  1.5.1.14  0  nvidia
libcufile-dev  1.5.1.14  0  nvidia
libcufile-static  1.5.1.14  0  nvidia
libcurand  10.3.1.124  0  nvidia
libcurand-dev  10.3.1.124  0  nvidia
libcurand-static  10.3.1.124  0  nvidia
libcusolver  11.4.3.1  0  nvidia
libcusolver-dev  11.4.3.1  0  nvidia
libcusolver-static  11.4.3.1  0  nvidia
libcusparse  12.0.1.140  0  nvidia
libcusparse-dev  12.0.1.140  0  nvidia
libcusparse-static  12.0.1.140  0  nvidia
libffi  3.3  he6710b0_2
libgcc-ng  11.2.0  h1234567_1
libgomp  11.2.0  h1234567_1
libnpp  12.0.1.104  0  nvidia
libnpp-dev  12.0.1.104  0  nvidia
libnpp-static  12.0.1.104  0  nvidia
libnvjitlink  12.0.140  0  nvidia
libnvjitlink-dev  12.0.140  0  nvidia
libnvjpeg  12.0.1.102  0  nvidia
libnvjpeg-dev  12.0.1.102  0  nvidia
libnvjpeg-static  12.0.1.102  0  nvidia
libnvvm-samples  12.0.140  0  nvidia
libstdcxx-ng  11.2.0  h1234567_1
markupsafe  2.1.2  pypi_0  pypi
mkl  2021.4.0  h06a4308_640
mkl-service  2.4.0  py37h7f8727e_0
mkl_fft  1.3.1  py37hd3c417c_0
mkl_random  1.2.2  py37h51133e4_0
multidict  6.0.4  pypi_0  pypi
multiprocess  0.70.14  pypi_0  pypi
murmurhash  1.0.9  pypi_0  pypi
ncurses  6.3  h5eee18b_3
ninja  1.10.2  h06a4308_5
ninja-base  1.10.2  hd09550d_5
nsight-compute  2022.4.1.6  0  nvidia
numpy  1.21.6  pypi_0  pypi
numpy-base  1.21.5  py37ha15fc14_3
nvidia-ml-py3  7.352.0  pypi_0  pypi
openssl  1.1.1s  h7f8727e_0
packaging  21.3  pypi_0  pypi
pandas  1.3.5  pypi_0  pypi
pathy  0.10.1  pypi_0  pypi
pillow  9.2.0  pypi_0  pypi
pip  22.1.2  py37h06a4308_0
preshed  3.0.8  pypi_0  pypi
psutil  5.9.4  pypi_0  pypi
pyarrow  11.0.0  pypi_0  pypi
pycparser  2.21  pyhd3eb1b0_0
pydantic  1.10.4  pypi_0  pypi
pyopenssl  22.0.0  pyhd3eb1b0_0
pyparsing  3.0.9  pypi_0  pypi
pysbd  0.3.4  pypi_0  pypi
pysocks  1.7.1  py37_1
python  3.7.13  h12debd9_0
python-dateutil  2.8.2  pypi_0  pypi
pytorch  1.12.1  cpu_py37he8d8e81_0
pytz  2022.2.1  pypi_0  pypi
pyyaml  6.0  py37h5eee18b_1
readline  8.1.2  h7f8727e_1
regex  2022.9.13  pypi_0  pypi
requests  2.28.1  py37h06a4308_0
responses  0.18.0  pypi_0  pypi
sentencepiece  0.1.97  pypi_0  pypi
setuptools  63.4.1  py37h06a4308_0
six  1.16.0  pyhd3eb1b0_1
smart-open  6.3.0  pypi_0  pypi
spacy  3.5.0  pypi_0  pypi
spacy-legacy  3.0.12  pypi_0  pypi
spacy-loggers  1.0.4  pypi_0  pypi
sqlite  3.39.2  h5082296_0
sqlparse  0.4.3  pypi_0  pypi
srsly  2.4.5  pypi_0  pypi
stable-diffusion-tf  0.1  pypi_0  pypi
tensorboard  2.10.1  pypi_0  pypi
tensorflow  2.10.0  pypi_0  pypi
tensorflow-addons  0.18.0  pypi_0  pypi
tensorflow-estimator  2.10.0  pypi_0  pypi
tensorflow-io-gcs-filesystem  0.27.0  pypi_0  pypi
thinc  8.1.7  pypi_0  pypi
tk  8.6.12  h1ccaba5_0
tokenizers  0.11.4  py37h3dcd8bd_1
tqdm  4.64.1  py37h06a4308_0
transformers  4.24.0  py37h06a4308_0
typeguard  2.13.3  pypi_0  pypi
typer  0.7.0  pypi_0  pypi
typing-extensions  4.3.0  pypi_0  pypi
typing_extensions  4.4.0  py37h06a4308_0
urllib3  1.26.14  py37h06a4308_0
wasabi  1.1.1  pypi_0  pypi
wcwidth  0.2.5  pypi_0  pypi
wheel  0.37.1  pyhd3eb1b0_0
xxhash  3.2.0  pypi_0  pypi
xz  5.2.5  h7f8727e_1
yaml  0.2.5  h7b6447c_0
yarl  1.8.2  pypi_0  pypi
zipp  3.11.0  py37h06a4308_0
zlib  1.2.12  h5eee18b_3`

To Reproduce
Run the default example file. The code fails here:

text = "My name is Jonas E. Smith. Please turn to p. 55."
nlp = spacy.blank('en')
Traceback (most recent call last):
  File "", line 1, in
  File "/home/vibhu/miniconda3/envs/huggfacegpu/lib/python3.7/site-packages/spacy/__init__.py", line 82, in blank
    return LangClass.from_config(config, vocab=vocab, meta=meta)
  File "/home/vibhu/miniconda3/envs/huggfacegpu/lib/python3.7/site-packages/spacy/language.py", line 1773, in from_config
    nlp = lang_cls(vocab=vocab, create_tokenizer=create_tokenizer, meta=meta)
  File "/home/vibhu/miniconda3/envs/huggfacegpu/lib/python3.7/site-packages/spacy/language.py", line 162, in __init__
    util.registry._entry_point_factories.get_all()
  File "/home/vibhu/miniconda3/envs/huggfacegpu/lib/python3.7/site-packages/catalogue/__init__.py", line 119, in get_all
    result.update(self.get_entry_points())
  File "/home/vibhu/miniconda3/envs/huggfacegpu/lib/python3.7/site-packages/catalogue/__init__.py", line 134, in get_entry_points
    result[entry_point.name] = entry_point.load()
  File "/home/vibhu/miniconda3/envs/huggfacegpu/lib/python3.7/site-packages/catalogue/_importlib_metadata/__init__.py", line 94, in load
    module = import_module(match.group('module'))
  File "/home/vibhu/miniconda3/envs/huggfacegpu/lib/python3.7/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "", line 1006, in _gcd_import
  File "", line 983, in _find_and_load
  File "", line 962, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'pysbd.utils'; 'pysbd' is not a package
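The traceback passes through catalogue's `get_entry_points`, which suggests that spaCy imports every installed plugin entry point when a pipeline is created, so a broken `pysbd` import can surface even in `spacy.blank('en')`. The following is a minimal sketch for listing such entry points with the standard library. It assumes Python 3.8 or later (the environment above is 3.7, where the `importlib_metadata` backport would be needed) and assumes the entry point group is named `spacy_factories`; the helper `entry_points_in_group` is hypothetical and not part of spaCy or pySBD.

```python
from importlib.metadata import entry_points


def entry_points_in_group(group):
    """Return the installed entry points registered under `group` as a list."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+ exposes a selectable collection.
        return list(eps.select(group=group))
    # Python 3.8 and 3.9 return a plain dict keyed by group name.
    return list(eps.get(group, []))


# "spacy_factories" is assumed to be the group spaCy scans for components.
for ep in entry_points_in_group("spacy_factories"):
    # ep.value is the "module:attribute" target that gets imported on load.
    print(ep.name, "->", ep.value)
```

If an entry such as `pysbd` is listed, its target module is imported at pipeline creation, which would explain why the failure appears before any pySBD code is called directly.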

hiwaveSupport commented 1 year ago

Another example in the same environment breaks as well:

seg = pysbd.Segmenter(language="en", clean=False)
print(seg.segment(text))

It fails with the following error:

2023-02-11 11:16:54.798982: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-02-11 11:16:54.897833: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
2023-02-11 11:16:54.901353: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH:
2023-02-11 11:16:54.901365: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2023-02-11 11:16:54.918363: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2023-02-11 11:16:55.231398: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH:
2023-02-11 11:16:55.231428: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH:
2023-02-11 11:16:55.231431: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
2023-02-11 11:16:55.704366: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:980] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-02-11 11:16:55.704491: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH:
2023-02-11 11:16:55.704510: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcublas.so.11'; dlerror: libcublas.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH:
2023-02-11 11:16:55.704525: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcublasLt.so.11'; dlerror: libcublasLt.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH:
2023-02-11 11:16:55.704551: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcufft.so.10'; dlerror: libcufft.so.10: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH:
2023-02-11 11:16:55.704566: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcurand.so.10'; dlerror: libcurand.so.10: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH:
2023-02-11 11:16:55.704581: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcusolver.so.11'; dlerror: libcusolver.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH:
2023-02-11 11:16:55.704596: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcusparse.so.11'; dlerror: libcusparse.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH:
2023-02-11 11:16:55.704612: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudnn.so.8'; dlerror: libcudnn.so.8: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH:
2023-02-11 11:16:55.704617: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1934] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
Traceback (most recent call last):
  File "/home/vibhu/src/talkAItive/sentence_splitting/pysbd.py", line 20, in <module>
    seg = pysbd.Segmenter(language="en", clean=False)
AttributeError: module 'pysbd' has no attribute 'Segmenter'
[Finished in 1.8s with exit code 1]

hiwaveSupport commented 1 year ago

This was an environment-related issue. I created a fresh env and it worked.
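For anyone hitting the same errors: the second traceback shows the failing script is itself named `pysbd.py`, and both messages ("'pysbd' is not a package", "module 'pysbd' has no attribute 'Segmenter'") are consistent with a local file shadowing the installed package. That is only a guess, since the thread reports nothing beyond a fresh environment working. A minimal sketch of how one might check where an import resolves, using only the standard library; the helper `module_origin` and the stand-in name `demo_pkg` are hypothetical:

```python
import importlib
import importlib.util
import sys
import tempfile
from pathlib import Path


def module_origin(name):
    """Return (origin_path, is_package) for a top-level module, or None if not found."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    # Packages expose submodule_search_locations; a lone .py file does not.
    return spec.origin, spec.submodule_search_locations is not None


# Demonstration with a stand-in name, so no real library is required.
with tempfile.TemporaryDirectory() as tmp:
    # A single file plays the role of a script named like the library.
    Path(tmp, "demo_pkg.py").write_text("text = 'placeholder'\n")
    sys.path.insert(0, tmp)
    try:
        importlib.invalidate_caches()
        shadow_origin, shadow_is_package = module_origin("demo_pkg")
    finally:
        sys.path.remove(tmp)

# A plain file resolves as a non-package, so "demo_pkg.utils" could not be imported.
print(shadow_origin, shadow_is_package)
```

Calling `module_origin("pysbd")` from the directory of the failing script would show whether the import resolves to `site-packages` or to a local `pysbd.py`; in the latter case, renaming the script would be the first thing to try.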