uahmed93 opened this issue 6 days ago.
This script fails because it cannot serialize the CTranslate2 (ct2) model when loading it on a worker. I ran it after initializing an NDC GPU Dask cluster on Slurm, but it still fails.
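Dask ships each task to the workers by pickling it (falling back to `cloudpickle`), so any object a task captures, including a loaded model, must be picklable. Objects that wrap native handles, such as `ctranslate2.Translator`, generally are not. The stdlib-only sketch below reproduces the same kind of failure. `FakeTranslator` is a hypothetical stand-in, and its `threading.Lock` attribute plays the role of the unpicklable native handle.

```python
import pickle
import threading


class FakeTranslator:
    """Hypothetical stand-in for ctranslate2.Translator.

    The lock plays the role of the native handle a real translator wraps:
    like that handle, it cannot be pickled.
    """

    def __init__(self, model_path):
        self.model_path = model_path
        self._handle = threading.Lock()


def pickling_error(obj):
    """Return None if obj pickles cleanly, otherwise the error message."""
    try:
        pickle.dumps(obj)
    except (TypeError, pickle.PicklingError) as exc:
        return str(exc)
    return None


model_path = "/path/to/ct2_model"  # hypothetical path
print(pickling_error(model_path))                  # None: a plain string pickles fine
print(pickling_error(FakeTranslator(model_path)))  # a "cannot pickle ..." message
```

The plain path string serializes without trouble, while the object holding the handle does not. That asymmetry is what the workaround for this error relies on.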
@uahmed93, can you post the error you saw here, please?
Here:
```
Deployed LocalCUDACluster(22f3d4fd, 'tcp://127.0.0.1:37693', workers=1, threads=1, memory=1.79 TiB)...
2024-09-11 10:56:33,005 - distributed.protocol.pickle - ERROR - Failed to serialize <ToPickle: HighLevelGraph with 4 layers.
<dask.highlevelgraph.HighLevelGraph object at 0x15522cf81900>
 0. read-parquet-2f062a30cf4676cd0b9a4ab5cf06fe85
 1. repartition-2-aa356a5cb4dedecfbdd625f941566718
 2. to-parquet-03d4d012cafd9ed3bdb4ee7374761cf1
 3. store-to-parquet-03d4d012cafd9ed3bdb4ee7374761cf1
>.
Traceback (most recent call last):
  File "/lustre/fsw/portfolios/llmservice/users/uahmed/test_env/lib/python3.10/site-packages/distributed/protocol/pickle.py", line 63, in dumps
    result = pickle.dumps(x, **dump_kwargs)
AttributeError: Can't pickle local object 'to_parquet.<locals>.<lambda>'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/lustre/fsw/portfolios/llmservice/users/uahmed/test_env/lib/python3.10/site-packages/distributed/protocol/pickle.py", line 68, in dumps
    pickler.dump(x)
AttributeError: Can't pickle local object 'to_parquet.<locals>.<lambda>'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/lustre/fsw/portfolios/llmservice/users/uahmed/test_env/lib/python3.10/site-packages/distributed/protocol/pickle.py", line 81, in dumps
    result = cloudpickle.dumps(x, **dump_kwargs)
  File "/lustre/fsw/portfolios/llmservice/users/uahmed/test_env/lib/python3.10/site-packages/cloudpickle/cloudpickle.py", line 1479, in dumps
    cp.dump(obj)
  File "/lustre/fsw/portfolios/llmservice/users/uahmed/test_env/lib/python3.10/site-packages/cloudpickle/cloudpickle.py", line 1245, in dump
    return super().dump(obj)
TypeError: cannot pickle 'ctranslate2._ext.Translator' object
Traceback (most recent call last):
  File "/lustre/fsw/portfolios/llmservice/users/uahmed/test_env/lib/python3.10/site-packages/distributed/protocol/pickle.py", line 63, in dumps
    result = pickle.dumps(x, **dump_kwargs)
AttributeError: Can't pickle local object 'to_parquet.<locals>.<lambda>'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/lustre/fsw/portfolios/llmservice/users/uahmed/test_env/lib/python3.10/site-packages/distributed/protocol/pickle.py", line 68, in dumps
    pickler.dump(x)
AttributeError: Can't pickle local object 'to_parquet.<locals>.<lambda>'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/lustre/fsw/portfolios/llmservice/users/uahmed/test_env/lib/python3.10/site-packages/distributed/protocol/serialize.py", line 353, in serialize
    header, frames = dumps(x, context=context) if wants_context else dumps(x)
  File "/lustre/fsw/portfolios/llmservice/users/uahmed/test_env/lib/python3.10/site-packages/distributed/protocol/serialize.py", line 76, in pickle_dumps
    frames[0] = pickle.dumps(
  File "/lustre/fsw/portfolios/llmservice/users/uahmed/test_env/lib/python3.10/site-packages/distributed/protocol/pickle.py", line 81, in dumps
    result = cloudpickle.dumps(x, **dump_kwargs)
  File "/lustre/fsw/portfolios/llmservice/users/uahmed/test_env/lib/python3.10/site-packages/cloudpickle/cloudpickle.py", line 1479, in dumps
    cp.dump(obj)
  File "/lustre/fsw/portfolios/llmservice/users/uahmed/test_env/lib/python3.10/site-packages/cloudpickle/cloudpickle.py", line 1245, in dump
    return super().dump(obj)
TypeError: cannot pickle 'ctranslate2._ext.Translator' object

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/lustre/fsw/portfolios/llmservice/users/uahmed/ctransl/ctransl_cf.py", line 145, in <module>
    main()
  File "/lustre/fsw/portfolios/llmservice/users/uahmed/ctransl/ctransl_cf.py", line 141, in main
    outputs.to_parquet(args.output_parquet_path)
  File "/lustre/fsw/portfolios/llmservice/users/uahmed/test_env/lib/python3.10/site-packages/nvtx/nvtx.py", line 116, in inner
    result = func(*args, **kwargs)
  File "/lustre/fsw/portfolios/llmservice/users/uahmed/test_env/lib/python3.10/site-packages/dask_cudf/core.py", line 264, in to_parquet
    return to_parquet(self, path, *args, **kwargs)
  File "/lustre/fsw/portfolios/llmservice/users/uahmed/test_env/lib/python3.10/site-packages/dask/dataframe/io/parquet/core.py", line 1047, in to_parquet
    out = out.compute(**compute_kwargs)
  File "/lustre/fsw/portfolios/llmservice/users/uahmed/test_env/lib/python3.10/site-packages/dask/base.py", line 379, in compute
    (result,) = compute(self, traverse=False, **kwargs)
  File "/lustre/fsw/portfolios/llmservice/users/uahmed/test_env/lib/python3.10/site-packages/dask/base.py", line 665, in compute
    results = schedule(dsk, keys, **kwargs)
  File "/lustre/fsw/portfolios/llmservice/users/uahmed/test_env/lib/python3.10/site-packages/distributed/protocol/serialize.py", line 379, in serialize
    raise TypeError(msg, str_x) from exc
TypeError: ('Could not serialize object of type HighLevelGraph', '<ToPickle: HighLevelGraph with 4 layers.\n<dask.highlevelgraph.HighLevelGraph object at 0x15522cf81900>\n 0. read-parquet-2f062a30cf4676cd0b9a4ab5cf06fe85\n 1. repartition-2-aa356a5cb4dedecfbdd625f941566718\n 2. to-parquet-03d4d012cafd9ed3bdb4ee7374761cf1\n 3. store-to-parquet-03d4d012cafd9ed3bdb4ee7374761cf1\n>')
```
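The traceback reads as a chain. Stdlib `pickle` first fails on a local lambda inside `to_parquet`. `distributed` then retries with `cloudpickle`, which can handle the lambda but not the `ctranslate2._ext.Translator` object embedded in the task graph. One common workaround is to keep the translator out of the graph entirely: send only the model path, then build and cache the translator inside each worker process.

The sketch below shows this pattern under stated assumptions:

* `FakeTranslator`, `get_translator`, and `translate_partition` are hypothetical names.
* The reversing "translation" is a placeholder.
* In the real script, the loader would presumably construct `ctranslate2.Translator(model_path, device="cuda")`.
* The partition function would be applied with something like `map_partitions`. That wiring is not taken from the issue.

```python
import functools
import pickle
import threading

MODEL_PATH = "/path/to/ct2_model"  # hypothetical path


class FakeTranslator:
    """Hypothetical stand-in for ctranslate2.Translator (holds an unpicklable handle)."""

    def __init__(self, model_path):
        self.model_path = model_path
        self._handle = threading.Lock()  # never pickled in this pattern

    def translate_batch(self, batch):
        # Placeholder "translation": reverse each token list.
        return [list(reversed(tokens)) for tokens in batch]


@functools.lru_cache(maxsize=None)
def get_translator(model_path):
    # Runs once per model_path in each worker process; later calls
    # reuse the cached object instead of reloading the model.
    return FakeTranslator(model_path)


def translate_partition(batch, model_path):
    # Only the path string travels with the task; the model is built on the worker.
    translator = get_translator(model_path)
    return translator.translate_batch(batch)


# What travels with the task: a path string and plain token lists, both picklable.
task_args = (MODEL_PATH, [["a", "b", "c"]])
pickle.dumps(task_args)

print(translate_partition([["a", "b", "c"]], MODEL_PATH))  # [['c', 'b', 'a']]
```

`functools.lru_cache` yields one translator per process and model path. With several threads per worker, two threads could still race on the first load. A lock around the loader would avoid that, as would a `distributed` worker plugin.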
Also CC: @sarahyurick for suggestions.
Added a CTranslate2 example that works on string tokens instead of integer tokens.
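CTranslate2's `Translator.translate_batch` takes its input as a batch of string-token lists (`List[List[str]]`). Integer ids coming out of a tokenizer therefore have to be mapped back to token strings first. With Hugging Face tokenizers, `convert_ids_to_tokens` does this. The stdlib-only sketch below shows the conversion. It assumes a toy, hypothetical vocabulary in place of the model's real tokenizer, and `ids_to_tokens` is a hypothetical helper.

```python
def ids_to_tokens(batch_ids, id_to_token):
    """Map each sequence of integer ids to the string tokens CTranslate2 consumes."""
    return [[id_to_token[i] for i in ids] for ids in batch_ids]


# Hypothetical toy vocabulary; a real one comes from the model's tokenizer.
VOCAB = {0: "</s>", 1: "Hello", 2: "world", 3: "eng_Latn"}

batch_ids = [[3, 1, 2, 0]]
source = ids_to_tokens(batch_ids, VOCAB)
print(source)  # [['eng_Latn', 'Hello', 'world', '</s>']]
# `source` is now in the shape translator.translate_batch(source) expects.
```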
To run the example: