speechbrain / speechbrain

A PyTorch-based Speech Toolkit
http://speechbrain.github.io
Apache License 2.0

[Bug]: AttributeError: Can't pickle local object 'dataio_prep.<locals>.audio_pipeline' #2113

Closed hieuminh65 closed 7 months ago

hieuminh65 commented 1 year ago

Describe the bug

Hi, I am working with the IEMOCAP recipe for emotion recognition. When I started training, I got this error.

Expected behaviour

The training process completes and produces the new model weights without error.

To Reproduce

import sys

import torch
import speechbrain as sb
from hyperpyyaml import load_hyperpyyaml

# EmoIdBrain and dataio_prep are defined earlier in the recipe script
# (omitted here for brevity).

if __name__ == "__main__":

    # Reading command line arguments.
    hparams_file, run_opts, overrides = sb.parse_arguments(sys.argv[1:])
    # run_opts['device'] = 'cpu'

    # Initialize ddp (useful only for multi-GPU DDP training).
    if run_opts["device"] != "cpu":
        sb.utils.distributed.ddp_init_group(run_opts)

    # Load hyperparameters file with command-line overrides.
    with open(hparams_file) as fin:
        hparams = load_hyperpyyaml(fin, overrides)

    # Create experiment directory.
    sb.create_experiment_directory(
        experiment_directory=hparams["output_folder"],
        hyperparams_to_save=hparams_file,
        overrides=overrides,
    )

    from ravdess_prepare import prepare_data  # noqa E402

    # Data preparation, to be run on only one process.
    if not hparams["skip_prep"]:
        sb.utils.distributed.run_on_main(
            prepare_data,
            kwargs={
                "data_original": hparams["data_folder"],
                "save_json_train": hparams["train_annotation"],
                "save_json_valid": hparams["valid_annotation"],
                "save_json_test": hparams["test_annotation"],
                "split_ratio": hparams["split_ratio"],
                "seed": hparams["seed"],
            },
        )

    # Create dataset objects "train", "valid", and "test".
    datasets = dataio_prep(hparams)

    device = torch.device("cpu")
    # hparams["wav2vec2"] = hparams["wav2vec2"].to(device=run_opts["device"])
    hparams["wav2vec2"] = hparams["wav2vec2"].to(device=device)

    # Freeze the feature extractor part when unfreezing the rest of wav2vec2.
    if not hparams["freeze_wav2vec2"] and hparams["freeze_wav2vec2_conv"]:
        hparams["wav2vec2"].model.feature_extractor._freeze_parameters()

    # Initialize the Brain object to prepare for mask training.
    emo_id_brain = EmoIdBrain(
        modules=hparams["modules"],
        opt_class=hparams["opt_class"],
        hparams=hparams,
        run_opts=run_opts,
        checkpointer=hparams["checkpointer"],
    )

    # The `fit()` method iterates the training loop, calling the methods
    # necessary to update the parameters of the model. Since all objects
    # with changing state are managed by the Checkpointer, training can be
    # stopped at any point, and will be resumed on next call.
    emo_id_brain.fit(
        epoch_counter=emo_id_brain.hparams.epoch_counter,
        train_set=datasets["train"],
        valid_set=datasets["valid"],
        train_loader_kwargs=hparams["dataloader_options"],
        valid_loader_kwargs=hparams["dataloader_options"],
    )

    # Load the best checkpoint for evaluation.
    test_stats = emo_id_brain.evaluate(
        test_set=datasets["test"],
        min_key="error_rate",
        test_loader_kwargs=hparams["dataloader_options"],
    )
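
For context, dataio_prep is called above but not shown; in the standard SpeechBrain recipe layout it defines the audio pipeline as a nested function, and that nested function is the "local object" named in the error below. A rough sketch under that assumption:

def dataio_prep(hparams):
    # The pipeline is a local (nested) function: under the "spawn" start
    # method, DataLoader worker processes cannot pickle it by reference.
    @sb.utils.data_pipeline.takes("wav")
    @sb.utils.data_pipeline.provides("sig")
    def audio_pipeline(wav):
        sig = sb.dataio.dataio.read_audio(wav)
        return sig

    datasets = {}
    for split in ["train", "valid", "test"]:
        datasets[split] = sb.dataio.dataset.DynamicItemDataset.from_json(
            json_path=hparams[f"{split}_annotation"],
            dynamic_items=[audio_pipeline],  # label pipeline omitted here
            output_keys=["id", "sig"],
        )
    return datasets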

Versions

No response

Relevant log output

/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/transformers/configuration_utils.py:380: UserWarning: Passing `gradient_checkpointing` to a config initialization is deprecated and will be removed in v5 Transformers. Using `model.gradient_checkpointing_enable()` instead, or if you are using the `Trainer` API, pass `gradient_checkpointing=True` in your `TrainingArguments`.
  warnings.warn(
Some weights of the model checkpoint at facebook/wav2vec2-base were not used when initializing Wav2Vec2Model: ['quantizer.weight_proj.bias', 'project_q.bias', 'project_hid.weight', 'project_q.weight', 'quantizer.weight_proj.weight', 'project_hid.bias', 'quantizer.codevectors']
- This IS expected if you are initializing Wav2Vec2Model from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing Wav2Vec2Model from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
speechbrain.lobes.models.huggingface_wav2vec - wav2vec 2.0 feature extractor is frozen.
speechbrain.core - Beginning experiment!
speechbrain.core - Experiment folder: results/train_with_wav2vec2/1993
ravdess_prepare - Preparation completed in previous run, skipping.
speechbrain.dataio.encoder - Load called, but CategoricalEncoder is not empty. Loaded data will overwrite everything. This is normal if there is e.g. an unk label defined at init.
speechbrain.core - Info: ckpt_interval_minutes arg from hparam file is used
speechbrain.core - 90.2M trainable parameters in EmoIdBrain
speechbrain.utils.checkpoints - Would load a checkpoint here, but none found yet.
speechbrain.utils.epoch_loop - Going into epoch 1
  0%|                                                                                                    | 0/135 [00:00<?, ?it/s]
speechbrain.core - Exception:
Traceback (most recent call last):
  File "/Users/hieunguyenminh/CODE ALL/HuggingFace/SpeechBrain/speechbrain/recipes/IEMOCAP/emotion_recognition/train_with_wav2vec2.py", line 288, in <module>
    emo_id_brain.fit(
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/speechbrain/core.py", line 1264, in fit
    self._fit_train(train_set=train_set, epoch=epoch, enable=enable)
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/speechbrain/core.py", line 1111, in _fit_train
    for batch in t:
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/tqdm/std.py", line 1178, in __iter__
    for obj in iterable:
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/speechbrain/dataio/dataloader.py", line 286, in __iter__
    iterator = super().__iter__()
               ^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/torch/utils/data/dataloader.py", line 441, in __iter__
    return self._get_iterator()
           ^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/torch/utils/data/dataloader.py", line 388, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/torch/utils/data/dataloader.py", line 1042, in __init__
    w.start()
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
                  ^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/multiprocessing/context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/multiprocessing/context.py", line 288, in _Popen
    return Popen(process_obj)
           ^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'dataio_prep.<locals>.audio_pipeline'

Additional context

I changed the device to cpu because I don't have a CUDA system, but I wonder if that is the reason.
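
On macOS, Python 3.8+ defaults to the "spawn" multiprocessing start method, which pickles the dataset (including its dynamic items) for every DataLoader worker; a function defined inside dataio_prep has no importable name, hence the error, so the cpu device is not the cause. Besides the workarounds discussed below, a minimal sketch of one root-cause fix is to hoist the pipeline to module scope:

# A module-level function can be pickled by reference, so spawn-based
# DataLoader workers can serialize the dataset that holds it.
@sb.utils.data_pipeline.takes("wav")
@sb.utils.data_pipeline.provides("sig")
def audio_pipeline(wav):
    return sb.dataio.dataio.read_audio(wav)

# ...and inside dataio_prep, reference the module-level function:
# dynamic_items=[audio_pipeline]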

backspacetg commented 1 year ago

This seems like a multiprocessing problem. If you are using a Windows PC, you can try setting num_workers to 0, since the YAML file says that 0 works for Windows.
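
If the YAML in use does not expose that key, the same effect can be applied from the training script; a minimal sketch, assuming dataloader_options holds the DataLoader kwargs as in the code above:

# After load_hyperpyyaml(...): with num_workers=0 the DataLoader iterates
# in the main process, so audio_pipeline never needs to be pickled.
hparams["dataloader_options"]["num_workers"] = 0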

hieuminh65 commented 1 year ago

This seems like a multiprocessing problem. If you are using a Windows PC, you can try setting num_workers to 0, since the YAML file says that 0 works for Windows.

Hey, I use macOS. I tried that, but it does not work.

lucadellalib commented 1 year ago

Can you try adding these lines at the beginning of your training script:

import multiprocessing
multiprocessing.set_start_method("fork")
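
One placement note: set_start_method raises a RuntimeError if the start method was already set, so a defensive variant at the very top of the training script (a sketch, not code from the recipe) is:

import multiprocessing

if __name__ == "__main__":
    # "fork" lets worker processes inherit the parent's memory, so local
    # functions such as dataio_prep.<locals>.audio_pipeline are never
    # pickled. This must run before any DataLoader workers are created.
    try:
        multiprocessing.set_start_method("fork")
    except RuntimeError:
        pass  # start method was already set elsewhere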
Adel-Moumen commented 9 months ago

Hello @hieuminh65, any news on this issue, please?

matheus-rzende commented 8 months ago

Hello @hieuminh65, any news on this issue, please?

Hey, I had the same problem (macOS) and the following worked:

import multiprocessing
multiprocessing.set_start_method("fork")

Thanks @lucadellalib

zixiaosunbro commented 5 months ago

I use macOS (M1). I added:

import multiprocessing
multiprocessing.set_start_method("fork")

and it shows something like this:

[W ParallelNative.cpp:230] Warning: Cannot set number of intraop threads after parallel work has started or after set_num_threads call when using native parallel backend (function set_num_threads)

How can I fix it?

asumagic commented 5 months ago

I use macOS (M1). I added:

import multiprocessing
multiprocessing.set_start_method("fork")

and it shows something like this:

[W ParallelNative.cpp:230] Warning: Cannot set number of intraop threads after parallel work has started or after set_num_threads call when using native parallel backend (function set_num_threads)

How can I fix it?

Not sure, but this doesn't look like a SB-specific issue.

I found this, which seems related; for them, it doesn't even take the set_start_method call to trigger the issue: https://stackoverflow.com/questions/64772335/pytorch-w-parallelnative-cpp206

You could try disabling workers altogether in the way mentioned there, though I'm not sure how well that behaves with SB (on top of slowing down training somewhat).