kunaldahiya / pyxclib

Tools for multi-label classification problems.
MIT License

Methods retain_topk, rank do not work #34

Closed shikharmn closed 8 months ago

shikharmn commented 1 year ago

NumPy removed the long-deprecated alias np.int in version 1.24, so the rank method and every method that calls it (such as retain_topk) is now broken. This line uses np.int, which is the root cause of this issue.

A basic fix seems to be to replace it with the builtin int, which NumPy recommends, along with a few other related changes.
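A minimal sketch of the substitution on the Python side, assuming the failing code allocates an integer buffer with dtype=np.int. The helper name make_rank_buffer is hypothetical and is not part of xclib; the actual fix lives in the Cython source.

```python
import numpy as np


def make_rank_buffer(n):
    """Allocate an integer buffer of length n (hypothetical helper, not xclib API)."""
    # Before: np.zeros(n, dtype=np.int)  -> AttributeError on NumPy >= 1.24
    # np.int was only an alias for the builtin int, so this is equivalent.
    return np.zeros(n, dtype=int)


buf = make_rank_buffer(4)
print(buf.dtype)  # platform default integer type
```

If a fixed width matters, an explicit type such as np.int64 or np.int32 can be used in place of int.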

How should I proceed? I am willing to take this up.

Error:

Traceback (most recent call last):
  File "path/projects/OrganicBERT/run_eval.py", line 61, in <module>
    meta_preds = retain_topk(sp.load_npz(f"{DUMP_DIR}/../preds_mat.npz"), k=1)
  File "path/miniconda3/envs/ogb/lib/python3.9/site-packages/xclib/utils/sparse.py", line 137, in retain_topk
    ranks = rank(X)
  File "path/miniconda3/envs/ogb/lib/python3.9/site-packages/xclib/utils/sparse.py", line 36, in rank
    ranks = _rank(X.data, X.indices, X.indptr)
  File "xclib/utils/_sparse.pyx", line 197, in xclib.utils._sparse._rank
  File "path/miniconda3/envs/ogb/lib/python3.9/site-packages/numpy/__init__.py", line 284, in __getattr__
    raise AttributeError("module {!r} has no attribute "
AttributeError: module 'numpy' has no attribute 'int'

pip freeze:


colorama==0.4.6
coloredlogs==15.0.1
Cython==3.0.0
fbgemm-gpu==0.4.1
grpcio==1.57.0
hnswlib==0.7.0
huggingface-hub==0.16.4
humanfriendly==10.0
hydra-core==1.3.2
knack==0.10.1
lightning-utilities==0.9.0
numba==0.57.1
numpy==1.24.0
nvidia-ml-py==12.535.77
nvitop==1.2.0
oauthlib==3.2.2
omegaconf==2.3.0
onnxruntime==1.14.0
onnxruntime-gpu==1.14.0
pandas==2.0.3
pathspec==0.11.2
pathtools==0.1.2
Pillow==10.0.0
pkginfo==1.9.6
sacremoses==0.0.53
safetensors==0.3.2
scikit-learn==1.3.0
scipy==1.11.1
sentence-transformers==2.2.2
sentencepiece==0.1.99
sentry-sdk==1.29.2
setproctitle==1.3.2
tokenizers==0.12.1
torch==2.0.1
torchaudio==2.0.2
torchmetrics==1.0.3
xclib @ git+https://github.com/kunaldahiya/pyxclib.git@ae5410f10080742758cdd533f768e3fe5b4f4de3

kunaldahiya commented 1 year ago

Thanks, Shikhar. The code is in Cython, so you'll need to change the *_t types (on the left-hand side) as well. I think it would be good to determine the datatype of indices and indptr and use that.

Please raise a PR with the fixes
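The suggestion above, taking the integer type from indices and indptr rather than hard-coding one, could look roughly like this in pure NumPy. This is only an illustrative sketch: rank_csr is a hypothetical stand-in, not the library's _rank, and it assumes that rank 1 means the largest value in each row.

```python
import numpy as np


def rank_csr(data, indices, indptr):
    """Rank the stored values within each CSR row (1 = largest value).

    Hypothetical reference sketch; the output dtype follows indices.dtype
    instead of a hard-coded integer type.
    """
    ranks = np.zeros(data.shape[0], dtype=indices.dtype)
    for row in range(len(indptr) - 1):
        start, end = indptr[row], indptr[row + 1]
        # Stable descending sort of the values stored in this row.
        order = np.argsort(-data[start:end], kind="stable")
        ranks[start + order] = np.arange(1, end - start + 1, dtype=indices.dtype)
    return ranks


# Two rows: [0.1, 0.9, 0.5] and [0.3, 0.7], stored in CSR form.
data = np.array([0.1, 0.9, 0.5, 0.3, 0.7])
indices = np.array([0, 2, 3, 1, 2], dtype=np.int32)
indptr = np.array([0, 3, 5], dtype=np.int32)

ranks = rank_csr(data, indices, indptr)
print(ranks.tolist(), ranks.dtype)  # [3, 1, 2, 2, 1] int32
```

In Cython the same idea would additionally require the typed declarations to match the chosen dtype, which is the *_t change mentioned above.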

kunaldahiya commented 11 months ago

@shikharmn any update?

shikharmn commented 9 months ago

Hi @kunaldahiya, I was occupied with submissions, but I have now pushed PR #37 to fix this issue. The methods now work with use_cython=True as well. The PR text includes a small script I used to verify correctness. Let me know if any changes are needed.

Tangentially, and perhaps as a separate issue, we could add tests to this library, which would make it easier to maintain and contribute to.

anshumitts commented 8 months ago

Approved the PR