ludwig-ai / ludwig

Low-code framework for building custom LLMs, neural networks, and other AI models
http://ludwig.ai
Apache License 2.0

Ray parallelization does not work #3915

Open sergsb opened 9 months ago

sergsb commented 9 months ago

Describe the bug: Model parallelization with Ray does not work when using a custom model from Hugging Face.

To Reproduce: I want to train a neural network using Ludwig with a molecular encoder from Hugging Face. My config is:

model_type: ecd
input_features:
  - name: Smiles
    type: text
    preprocessing:
      tokenizer: molecules
    encoder:
      type: auto_transformer
      pretrained_model_name_or_path: ibm/MoLFormer-XL-both-10pct
      trainable: false
output_features: 
  - name: Measured
    type: number
    decoder:
      num_fc_layers: 1
      output_size: 64
trainer:
  epochs: 20
  optimizer:
    type: adam
    betas: [0.9, 0.999]
  learning_rate: 0.001

It works perfectly with the local backend. However, when I try to run multi-GPU training with Ray, it fails with the following error:


ModuleNotFoundError: No module named 'transformers_modules'
(dask:('map-3ad119a87f1f9eca9ea3cfc5d1963787', 0) pid=790622) No module named 'transformers_modules'
(dask:('map-3ad119a87f1f9eca9ea3cfc5d1963787', 0) pid=790622) Traceback (most recent call last):
(dask:('map-3ad119a87f1f9eca9ea3cfc5d1963787', 0) pid=790622)   File "/home/sergeys/miniconda3/lib/python3.11/site-packages/ray/_private/serialization.py", line 404, in deserialize_objects
(dask:('map-3ad119a87f1f9eca9ea3cfc5d1963787', 0) pid=790622)     obj = self._deserialize_object(data, metadata, object_ref)
(dask:('map-3ad119a87f1f9eca9ea3cfc5d1963787', 0) pid=790622)           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(dask:('map-3ad119a87f1f9eca9ea3cfc5d1963787', 0) pid=790622)   File "/home/sergeys/miniconda3/lib/python3.11/site-packages/ray/_private/serialization.py", line 270, in _deserialize_object
(dask:('map-3ad119a87f1f9eca9ea3cfc5d1963787', 0) pid=790622)     return self._deserialize_msgpack_data(data, metadata_fields)
(dask:('map-3ad119a87f1f9eca9ea3cfc5d1963787', 0) pid=790622)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(dask:('map-3ad119a87f1f9eca9ea3cfc5d1963787', 0) pid=790622)   File "/home/sergeys/miniconda3/lib/python3.11/site-packages/ray/_private/serialization.py", line 225, in _deserialize_msgpack_data
(dask:('map-3ad119a87f1f9eca9ea3cfc5d1963787', 0) pid=790622)     python_objects = self._deserialize_pickle5_data(pickle5_data)
(dask:('map-3ad119a87f1f9eca9ea3cfc5d1963787', 0) pid=790622)                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(dask:('map-3ad119a87f1f9eca9ea3cfc5d1963787', 0) pid=790622)   File "/home/sergeys/miniconda3/lib/python3.11/site-packages/ray/_private/serialization.py", line 215, in _deserialize_pickle5_data
(dask:('map-3ad119a87f1f9eca9ea3cfc5d1963787', 0) pid=790622)     obj = pickle.loads(in_band)
(dask:('map-3ad119a87f1f9eca9ea3cfc5d1963787', 0) pid=790622)           ^^^^^^^^^^^^^^^^^^^^^
(dask:('map-3ad119a87f1f9eca9ea3cfc5d1963787', 0) pid=790622) ModuleNotFoundError: No module named 'transformers_modules'
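The trace shows the failure inside `pickle.loads` on a Dask-on-Ray worker. One plausible explanation, which this thread does not confirm, is that `ibm/MoLFormer-XL-both-10pct` ships custom modeling code. Hugging Face `transformers` presumably loads that code (via `trust_remote_code`) into a dynamically generated `transformers_modules` package inside its local modules cache, and it makes that package importable by appending the cache directory to `sys.path` in the driver process only. Objects whose classes live there can be pickled on the driver but cannot be unpickled in a worker that lacks the same path entry. The stdlib-only sketch below reproduces that mechanism. It uses a throwaway stand-in package and a fresh interpreter in the role of the worker. The package contents, `CustomTokenizer`, and `vocab_size` are hypothetical.

```python
"""Stdlib-only sketch of the unpickling failure mechanism (hypothetical stand-in package)."""
import importlib
import pickle
import subprocess
import sys
import tempfile
import textwrap
from pathlib import Path
from typing import Optional, Tuple

# Child interpreter ("worker"): optionally extend sys.path, then unpickle stdin.
CHILD_SCRIPT = textwrap.dedent(
    """
    import pickle, sys
    if len(sys.argv) > 1:
        sys.path.append(sys.argv[1])
    obj = pickle.loads(sys.stdin.buffer.read())
    print(type(obj).__module__, obj.vocab_size)
    """
)


def write_stand_in_package(root: Path) -> None:
    """Create a throwaway package named like Hugging Face's dynamic-module package."""
    pkg = root / "transformers_modules"
    pkg.mkdir()
    (pkg / "__init__.py").write_text("")
    (pkg / "model_code.py").write_text(
        "class CustomTokenizer:\n"
        "    def __init__(self, vocab_size):\n"
        "        self.vocab_size = vocab_size\n"
    )


def unpickle_in_fresh_process(payload: bytes, extra_path: Optional[str]) -> subprocess.CompletedProcess:
    """Unpickle `payload` in a new interpreter, optionally adding `extra_path` to its sys.path."""
    args = [sys.executable, "-c", CHILD_SCRIPT]
    if extra_path is not None:
        args.append(extra_path)  # becomes sys.argv[1] in the child
    return subprocess.run(args, input=payload, capture_output=True)


def demo() -> Tuple[subprocess.CompletedProcess, subprocess.CompletedProcess]:
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        write_stand_in_package(root)
        # The "driver" can import the package only because its directory was
        # appended to sys.path at runtime; it is not installed in site-packages.
        sys.path.append(str(root))
        importlib.invalidate_caches()
        module = importlib.import_module("transformers_modules.model_code")
        payload = pickle.dumps(module.CustomTokenizer(vocab_size=100))

        # Worker without the extra path entry: raises ModuleNotFoundError.
        broken = unpickle_in_fresh_process(payload, None)
        # Worker with the same directory on its path: unpickles successfully.
        fixed = unpickle_in_fresh_process(payload, str(root))
        sys.path.remove(str(root))
    return broken, fixed


broken, fixed = demo()
print(broken.stderr.decode().strip().splitlines()[-1])
print(fixed.stdout.decode().strip())
```

If this is the cause, the analogous fix on Ray would be to make the same directory importable on every worker. Ray's `runtime_env` could do this: `env_vars` can set `PYTHONPATH` to the Hugging Face modules cache on a single node, or `py_modules` can ship the `transformers_modules` directory to other nodes. This is an untested suggestion. It is also an assumption that Ludwig's Ray backend reuses a `runtime_env` passed to an earlier `ray.init` call.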
arnavgarg1 commented 9 months ago

Hi @sergsb, is this still an issue you're running into? Are you able to share the full stack trace?