koaning / bulk

A Simple Bulk Labelling Tool
MIT License

segmentation fault #59

Closed by micklynch 1 year ago

micklynch commented 1 year ago

Awesome project, I've been meaning to check it out for a while.

I ran into this error when running python prep-data.py and wondered if anyone else encountered this issue.

I reduced the dataset to ~200 sentences in case it was a memory issue.

[1]    32890 segmentation fault  python prep-data.py
/usr/local/Cellar/python@3.11/3.11.3/Frameworks/Python.framework/Versions/3.11/lib/python3.11/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '

I'm using Poetry for package mgmt.

[tool.poetry.dependencies]
python = "^3.11"
embetter = "^0.3.8"
pandas = "2.0.0"
umap-learn = "^0.5.3"

Any tips or advice greatly appreciated.

micklynch commented 1 year ago

I think I've narrowed the issue down to UMAP.

Also, in case it matters, I'm on an Intel Mac. Thanks!

micklynch commented 1 year ago

I believe I found the solution by searching through UMAP's issue log. For those interested:

Setting the environment variable NUMBA_DISABLE_JIT to 1 prevents the segfault.
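
For anyone who would rather apply this from inside the script than from the shell, here is a minimal sketch. It assumes that Numba reads NUMBA_DISABLE_JIT when it is first imported, so the variable has to be set before umap (and therefore numba) is loaded. The helper name disable_numba_jit is hypothetical and not part of bulk, UMAP, or Numba.

```python
import os
import sys


def disable_numba_jit() -> bool:
    """Set NUMBA_DISABLE_JIT=1 for this process.

    Returns False if numba was already imported, in which case the
    setting may be applied too late to have any effect (assumption:
    numba picks the variable up at import time).
    """
    already_imported = "numba" in sys.modules
    os.environ["NUMBA_DISABLE_JIT"] = "1"
    return not already_imported


# Call this at the very top of the script, before any UMAP import.
applied_in_time = disable_numba_jit()

# import umap  # hypothetical placement: import only after the variable is set
```

Setting it on the command line for a single run, for example NUMBA_DISABLE_JIT=1 python prep-data.py, should have the same effect without touching the script.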

koaning commented 1 year ago

Ah yeah, sorry for the late reply, UMAP + numba issues really do pop up now and again.

Qiuzhuang commented 1 week ago

> I believe I found the solution by searching through UMAP's issue log. For those interested:
>
> Setting the environment variable NUMBA_DISABLE_JIT to 1 prevents the segfault.

However, disabling the Numba JIT makes UMAP run very slowly.