Open sergsb opened 9 months ago
Describe the bug Model parallelization does not work with Ray when using a custom model from Hugging Face.
To Reproduce I want to train a neural network using Ludwig with a molecular encoder from Hugging Face. My config is:
```yaml
model_type: ecd
input_features:
  - name: Smiles
    type: text
    preprocessing:
      tokenizer: molecules
    encoder:
      type: auto_transformer
      pretrained_model_name_or_path: ibm/MoLFormer-XL-both-10pct
      trainable: false
output_features:
  - name: Measured
    type: number
    decoder:
      num_fc_layers: 1
      output_size: 64
trainer:
  epochs: 20
  optimizer:
    type: adam
    beta1: 0.9  # Corrected 'beat1' to 'beta1'
  learning_rate: 0.001
```
It works perfectly with the local backend; however, when I try to run multi-GPU training with Ray, it fails:
```
ModuleNotFoundError: No module named 'transformers_modules'
(dask:('map-3ad119a87f1f9eca9ea3cfc5d1963787', 0) pid=790622) Traceback (most recent call last):
  File "/home/sergeys/miniconda3/lib/python3.11/site-packages/ray/_private/serialization.py", line 404, in deserialize_objects
    obj = self._deserialize_object(data, metadata, object_ref)
  File "/home/sergeys/miniconda3/lib/python3.11/site-packages/ray/_private/serialization.py", line 270, in _deserialize_object
    return self._deserialize_msgpack_data(data, metadata_fields)
  File "/home/sergeys/miniconda3/lib/python3.11/site-packages/ray/_private/serialization.py", line 225, in _deserialize_msgpack_data
    python_objects = self._deserialize_pickle5_data(pickle5_data)
  File "/home/sergeys/miniconda3/lib/python3.11/site-packages/ray/_private/serialization.py", line 215, in _deserialize_pickle5_data
    obj = pickle.loads(in_band)
ModuleNotFoundError: No module named 'transformers_modules'
```
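For context, a likely mechanism behind this error (my reading, not confirmed by the traceback alone): `transformers_modules` is a package that the `transformers` library generates dynamically when a model with custom code (i.e. one needing `trust_remote_code=True`, as MoLFormer does) is loaded. Pickle serializes such objects by reference to their defining module, so a Ray worker that deserializes the model without having generated `transformers_modules` itself will fail exactly this way. The sketch below reproduces the general failure mode with a stand-in module name (`fake_transformers_modules` is hypothetical, chosen for illustration):

```python
import pickle
import sys
import types

# Dynamically create a module containing a class, mimicking how transformers
# builds the 'transformers_modules' package at load time.
mod = types.ModuleType("fake_transformers_modules")
exec("class Encoder:\n    pass", mod.__dict__)
sys.modules["fake_transformers_modules"] = mod

# Pickling works: the dumping process can resolve the module by name.
blob = pickle.dumps(mod.Encoder())

# Simulate a fresh Ray worker that never generated the dynamic module.
del sys.modules["fake_transformers_modules"]

try:
    pickle.loads(blob)
except ModuleNotFoundError as e:
    # Same shape of error as in the traceback above.
    print(e)
```

If this is indeed the cause, ensuring each worker loads the pretrained model (thereby regenerating the dynamic module) before deserialization would be the direction to investigate.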
Hi @sergsb - Is this still an issue you're running into? Are you able to share the full stack trace?