ufal / crac2022-corpipe

ÚFAL CorPipe: CRAC 2022 Winning System for Multilingual Coreference Resolution
Mozilla Public License 2.0

TypeError: unhashable type: "numpy.ndarray" on generated Dataset #1

Closed rodrigallardo closed 8 months ago

rodrigallardo commented 8 months ago

Hi!

First, thank you for making the code for your awesome model open-source.

I'm trying to reproduce your results by rerunning the corpipe.py script on the Spanish data (es_ancora). I have not made any changes to the code, yet I hit an error that prevents the training script from running.

The error occurs when running the line:

pipeline = pipeline.apply(tf.data.experimental.assert_cardinality(sum(1 for _ in pipeline)))

This line is in the pipeline() method of the Dataset class. The error appears to arise when evaluating sum(1 for _ in pipeline). The traceback is the following:

*** tensorflow.python.framework.errors_impl.InvalidArgumentError: TypeError: unhashable type: 'numpy.ndarray'
Traceback (most recent call last):

  File "/home/toti/crac2022-corpipe/.venv/lib/python3.9/site-packages/tensorflow/python/ops/script_ops.py", line 269, in __call__
    return func(device, token, args)

  File "/home/toti/crac2022-corpipe/.venv/lib/python3.9/site-packages/tensorflow/python/ops/script_ops.py", line 147, in __call__
    outputs = self._call(device, args)

  File "/home/toti/crac2022-corpipe/.venv/lib/python3.9/site-packages/tensorflow/python/ops/script_ops.py", line 154, in _call
    ret = self._func(*args)

  File "/home/toti/crac2022-corpipe/.venv/lib/python3.9/site-packages/tensorflow/python/autograph/impl/api.py", line 642, in wrapper
    return func(*args, **kwargs)

  File "/home/toti/crac2022-corpipe/.venv/lib/python3.9/site-packages/tensorflow/python/data/ops/structured_function.py", line 220, in py_function_wrapper
    ret = self._func(*nested_args)

  File "/home/toti/crac2022-corpipe/.venv/lib/python3.9/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 1053, in generator_next_fn
    flat_values = script_ops.numpy_function(generator_py_func,

  File "/home/toti/crac2022-corpipe/.venv/lib/python3.9/site-packages/tensorflow/python/util/traceback_utils.py", line 153, in error_handler
    raise e.with_traceback(filtered_tb) from None

  File "/home/toti/crac2022-corpipe/.venv/lib/python3.9/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 822, in get_iterator
    return self._iterators[iterator_id]

TypeError: unhashable type: 'numpy.ndarray'

         [[{{node EagerPyFunc}}]] [Op:IteratorGetNext]
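
The last frame of the traceback fails at the dictionary lookup `self._iterators[iterator_id]`, which suggests that the key arrived as an array rather than a plain integer. Below is a minimal sketch of that failure mode in plain Python. `ArrayLike`, `iterators`, and `lookup` are hypothetical stand-ins (not TensorFlow or NumPy code); `ArrayLike` only imitates the fact that `numpy.ndarray` is unhashable.

```python
class ArrayLike:
    """Hypothetical stand-in for an array type that, like numpy.ndarray, is unhashable."""

    __hash__ = None  # setting __hash__ to None marks instances as unhashable

    def __init__(self, value):
        self.value = value

    def item(self):
        # Imitates extracting a plain Python scalar from a zero-dimensional array.
        return self.value


# Hypothetical registry keyed by integer iterator ids.
iterators = {0: "iterator-0"}


def lookup(iterator_id):
    """Return the registered iterator, or the TypeError message if the key is unhashable."""
    try:
        return iterators[iterator_id]
    except TypeError as error:
        return f"TypeError: {error}"


print(lookup(0))                    # plain int key: the lookup succeeds
print(lookup(ArrayLike(0)))         # array-like key: the dict cannot hash it
print(lookup(ArrayLike(0).item()))  # scalar extracted first: the lookup succeeds
```

The third call only illustrates why a scalar key works; it is not a proposed patch for TensorFlow.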

Could anyone provide any help for fixing this error?

Any suggestion is more than welcome.

Thank you! Rodrigo

foxik commented 8 months ago

Hi Rodrigo,

First, note that we also have a newer model at https://github.com/ufal/crac2023-corpipe (our entry in the following year's competition, with improved performance), for which we also released a trained checkpoint.

If you would like to make this older model work, please start by sending me the pip list of your environment. To reproduce the problem, I will need the same package versions that you use.

Cheers!

rodrigallardo commented 8 months ago

Hi @foxik ! Thank you very much for the quick response.

> First, note that we also have a newer model at https://github.com/ufal/crac2023-corpipe (our entry in the following year's competition, with improved performance), for which we also released a trained checkpoint.

I'm trying to run the 2022 version because I believe I lack the resources to train the 2023 version. However, I'm also going to give it a try :)

> If you would like to make this older model work, please start by sending me the pip list of your environment. To reproduce the problem, I will need the same package versions that you use.

Sure. Here it is:

Package                      Version
---------------------------- -------------------
absl-py                      2.1.0
asttokens                    2.4.1
astunparse                   1.6.3
cachetools                   5.3.3
certifi                      2024.2.2
charset-normalizer           3.3.2
click                        8.1.7
colorama                     0.4.6
decorator                    5.1.1
exceptiongroup               1.2.0
executing                    2.0.1
filelock                     3.13.1
flatbuffers                  24.3.6
fsspec                       2024.2.0
gast                         0.5.4
google-auth                  2.28.1
google-auth-oauthlib         0.4.6
google-pasta                 0.2.0
grpcio                       1.62.0
h5py                         3.10.0
huggingface-hub              0.21.4
idna                         3.6
importlib-metadata           7.0.1
iniconfig                    2.0.0
ipdb                         0.13.13
ipython                      8.18.1
jedi                         0.19.1
joblib                       1.3.2
keras                        2.8.0
Keras-Preprocessing          1.1.2
libclang                     16.0.6
Markdown                     3.5.2
MarkupSafe                   2.1.5
matplotlib-inline            0.1.6
numpy                        1.26.4
oauthlib                     3.2.2
opt-einsum                   3.3.0
packaging                    23.2
parso                        0.8.3
pexpect                      4.9.0
pip                          21.1.1
pluggy                       1.4.0
prompt-toolkit               3.0.43
protobuf                     3.20.3
ptyprocess                   0.7.0
pure-eval                    0.2.2
pyasn1                       0.5.1
pyasn1-modules               0.3.0
pygments                     2.17.2
pytest                       8.0.2
PyYAML                       6.0.1
regex                        2023.12.25
requests                     2.31.0
requests-oauthlib            1.3.1
rsa                          4.9
sacremoses                   0.1.1
scipy                        1.12.0
sentencepiece                0.2.0
setuptools                   56.0.0
six                          1.16.0
stack-data                   0.6.3
tensorboard                  2.8.0
tensorboard-data-server      0.6.1
tensorboard-plugin-wit       1.8.1
tensorflow                   2.8.0
tensorflow-addons            0.16.1
tensorflow-io-gcs-filesystem 0.36.0
termcolor                    2.4.0
tf-estimator-nightly         2.8.0.dev2021122109
tokenizers                   0.12.1
tomli                        2.0.1
tqdm                         4.66.2
traitlets                    5.14.1
transformers                 4.18.0
typeguard                    4.1.5
typing-extensions            4.10.0
udapi                        0.3.0
urllib3                      2.2.1
wcwidth                      0.2.13
werkzeug                     3.0.1
wheel                        0.42.0
wrapt                        1.16.0
zipp                         3.17.0

Many thanks!

foxik commented 8 months ago

Unfortunately, I cannot replicate your problem. What I did:

I also tried the resampling variant, i.e.

venv/bin/python corpipe.py data/es_ancora/es_ancora --resample 6000 1 --epochs=20 --lazy_adam --learning_rate_decay --crf --batch_size=8 --bert=google/rembert --learning_rate=1e-5 --segment=512 --right=50 --exp=es_ancora_test

and that also worked.

Could you try performing the above steps and report any problems encountered? Cheers!

rodrigallardo commented 8 months ago

With that setup, the error no longer happens! The only difference I can see is that I previously used Python 3.9.5 instead of 3.9.7, which is the version you used. After changing the Python version, it worked.
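
Since `pip list` does not show the interpreter version, which was the only difference I could find here, it may help to report both together. A minimal sketch using only the standard library (`environment_snapshot` is a hypothetical helper, not part of CorPipe):

```python
import platform
from importlib import metadata


def environment_snapshot(packages):
    """Return the Python version plus the installed version of each named package."""
    snapshot = {"python": platform.python_version()}
    for name in packages:
        try:
            snapshot[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Report missing packages instead of failing the whole snapshot.
            snapshot[name] = "not installed"
    return snapshot


# Package names taken from the pip list earlier in this thread.
for name, version in environment_snapshot(
    ["tensorflow", "numpy", "transformers", "keras"]
).items():
    print(f"{name}: {version}")
```

Run with the interpreter of the virtual environment in question, this prints one `name: version` line per entry, starting with the Python version.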

Thank you so much for the help! And again, congratulations on the fabulous work!

foxik commented 8 months ago

Glad that it is working now 👍