rapidsai / crossfit

Metric calculation library
Apache License 2.0

crossfit is no longer compatible with `nvcr.io/nvidia/pytorch:23.11-py3` #37

Open edknv opened 9 months ago

edknv commented 9 months ago

For example, `beir_report.py` works with `nvcr.io/nvidia/pytorch:23.09-py3` and `nvcr.io/nvidia/pytorch:23.10-py3`, but it fails with `nvcr.io/nvidia/pytorch:23.11-py3`.

The error message with `nvcr.io/nvidia/pytorch:23.11-py3` is:

2023-12-08 01:25:45,495 - distributed.protocol.pickle - ERROR - Failed to serialize <ToPickle: HighLevelGraph with 4 layers.
<dask.highlevelgraph.HighLevelGraph object at 0x7f2c902a19f0>
 0. read-parquet-a4e9d892750bb1499f10a2b9f98520c2
 1. repartition-2-70abc3715803fad94c0fca8e779ab753
 2. to-parquet-0190cffb61c5a5875f54ece4c9c1999c
 3. store-to-parquet-0190cffb61c5a5875f54ece4c9c1999c
>.
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/distributed/protocol/pickle.py", line 63, in dumps
    result = pickle.dumps(x, **dump_kwargs)
AttributeError: Can't pickle local object 'to_parquet.<locals>.<lambda>'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/distributed/protocol/pickle.py", line 68, in dumps
    pickler.dump(x)
AttributeError: Can't pickle local object 'to_parquet.<locals>.<lambda>'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/distributed/protocol/pickle.py", line 81, in dumps
    result = cloudpickle.dumps(x, **dump_kwargs)
  File "/usr/local/lib/python3.10/dist-packages/cloudpickle/cloudpickle.py", line 1479, in dumps
    cp.dump(obj)
  File "/usr/local/lib/python3.10/dist-packages/cloudpickle/cloudpickle.py", line 1245, in dump
    return super().dump(obj)
_pickle.PicklingError: Can't pickle <cyfunction ParquetFragmentScanOptions._reconstruct at 0x7f2d3022e9b0>: it's not the same object as pyarrow._dataset_parquet.ParquetFragmentScanOptions._reconstruct
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/distributed/protocol/pickle.py", line 63, in dumps
    result = pickle.dumps(x, **dump_kwargs)
AttributeError: Can't pickle local object 'to_parquet.<locals>.<lambda>'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/distributed/protocol/pickle.py", line 68, in dumps
    pickler.dump(x)
AttributeError: Can't pickle local object 'to_parquet.<locals>.<lambda>'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/distributed/protocol/serialize.py", line 352, in serialize
    header, frames = dumps(x, context=context) if wants_context else dumps(x)
  File "/usr/local/lib/python3.10/dist-packages/distributed/protocol/serialize.py", line 75, in pickle_dumps
    frames[0] = pickle.dumps(
  File "/usr/local/lib/python3.10/dist-packages/distributed/protocol/pickle.py", line 81, in dumps
    result = cloudpickle.dumps(x, **dump_kwargs)
  File "/usr/local/lib/python3.10/dist-packages/cloudpickle/cloudpickle.py", line 1479, in dumps
    cp.dump(obj)
  File "/usr/local/lib/python3.10/dist-packages/cloudpickle/cloudpickle.py", line 1245, in dump
    return super().dump(obj)
_pickle.PicklingError: Can't pickle <cyfunction ParquetFragmentScanOptions._reconstruct at 0x7f2d3022e9b0>: it's not the same object as pyarrow._dataset_parquet.ParquetFragmentScanOptions._reconstruct

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/workspace/beir_report.py", line 48, in <module>
    main()
  File "/workspace/beir_report.py", line 35, in main
    report = cf.beir_report(
  File "/workspace/crossfit/report/beir/report.py", line 185, in beir_report
    embeddings: EmbeddingDatataset = embed(
  File "/workspace/crossfit/report/beir/embed.py", line 78, in embed
    embeddings.to_parquet(os.path.join(emb_dir, dtype))
  File "/usr/local/lib/python3.10/dist-packages/nvtx/nvtx.py", line 101, in inner
    result = func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/dask_cudf/core.py", line 252, in to_parquet
    return to_parquet(self, path, *args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/dask/dataframe/io/parquet/core.py", line 1062, in to_parquet
    out = out.compute(**compute_kwargs)
  File "/usr/local/lib/python3.10/dist-packages/dask/base.py", line 342, in compute
    (result,) = compute(self, traverse=False, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/dask/base.py", line 628, in compute
    results = schedule(dsk, keys, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/distributed/protocol/serialize.py", line 374, in serialize
    raise TypeError(msg, str(x)[:10000]) from exc
TypeError: ('Could not serialize object of type HighLevelGraph', '<ToPickle: HighLevelGraph with 4 layers.\n<dask.highlevelgraph.HighLevelGraph object at 0x7f2c902a19f0>\n 0. read-parquet-a4e9d892750bb1499f10a2b9f98520c2\n 1. repartition-2-70abc3715803fad94c0fca8e779ab753\n 2. to-parquet-0190cffb61c5a5875f54ece4c9c1999c\n 3. store-to-parquet-0190cffb61c5a5875f54ece4c9c1999c\n>')
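
Reading the traceback, the two `AttributeError`s for `to_parquet.<locals>.<lambda>` come from the plain `pickle` attempts, after which `distributed` falls back to `cloudpickle`. The failure that ends the chain is the `PicklingError` saying that `ParquetFragmentScanOptions._reconstruct` is "not the same object" as the one found at `pyarrow._dataset_parquet.ParquetFragmentScanOptions._reconstruct`. In general, `pickle` raises this when it looks an object up by module and name and gets back a different object than the one being pickled. The sketch below reproduces that class of error with the standard library only. It does not establish why it happens in this container; the module name `demo_mod` and the function names are hypothetical and exist only for the demonstration.

```python
import pickle
import sys
import types


def reproduce_identity_error():
    """Trigger pickle's "not the same object" check and return the error text."""
    # Hypothetical module, registered only for this demonstration.
    module = types.ModuleType("demo_mod")
    sys.modules["demo_mod"] = module

    def reconstruct():
        return "original"

    # Make the function claim to live at demo_mod.reconstruct ...
    reconstruct.__module__ = "demo_mod"
    reconstruct.__qualname__ = "reconstruct"
    # ... but bind a different object to that name in the module.
    module.reconstruct = lambda: "replacement"

    try:
        # pickle stores functions by reference (module + qualified name) and
        # verifies that the lookup returns the very same object.
        pickle.dumps(reconstruct)
    except pickle.PicklingError as exc:
        return str(exc)
    finally:
        del sys.modules["demo_mod"]
    return None


if __name__ == "__main__":
    print(reproduce_identity_error())
```

If the same situation applies here, it would mean that the `_reconstruct` object attached to the graph and the one importable from `pyarrow._dataset_parquet` are two distinct objects at serialization time, but that is an assumption that still needs to be checked.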

edknv commented 8 months ago

Getting the same error with `nvcr.io/nvidia/pytorch:23.12-py3`.
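
To narrow down what changed between the working and failing images, it may help to record the versions of the packages that appear in the traceback in each container. A minimal sketch, assuming the distribution names listed below; these are guesses based on the module paths in the traceback, and some packages (for example `dask-cudf`) may be installed under a different distribution name, in which case they are reported as not installed.

```python
from importlib import metadata

# Distribution names are assumptions derived from the modules in the traceback.
PACKAGES = ["dask", "distributed", "cloudpickle", "pyarrow", "cudf", "dask-cudf"]


def collect_versions(packages=PACKAGES):
    """Return {distribution name: version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in collect_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

Running this in the `23.10-py3` and `23.11-py3` containers and diffing the output would show which of these packages changed version between the two images.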