huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

might be a waste of resources #31745

Open SDaoer opened 2 months ago

SDaoer commented 2 months ago
        while self._has_unfinished_sequences(this_peer_finished, synced_gpus, device=input_ids.device):
            # prepare model inputs
            model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)

            # forward pass to get next token
            outputs = self(
                **model_inputs,
                return_dict=True,
                output_attentions=output_attentions,
                output_hidden_states=output_hidden_states,
            )

            if synced_gpus and this_peer_finished:
                continue  # don't waste resources running the code we don't need

            ...

Why is this condition checked after the outputs are generated? Isn't this a waste of resources? Could this check be moved to the beginning of the while loop?

if synced_gpus and this_peer_finished:
    continue  # don't waste resources running the code we don't need

The code comes from transformers/generation/utils.py: GenerationMixin._sample
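For context, here is a toy simulation (plain Python threads, not the transformers code; a `threading.Barrier` stands in for the sharded forward pass, which under DeepSpeed ZeRO-3 is a collective op every rank must enter) of what hoisting the check above the forward pass would do. The function names and lengths are made up for illustration:

```python
import threading

def run_reordered(lengths, max_length):
    """Simulate the *proposed* reordering: a finished peer skips the
    collective op (Barrier ~ sharded forward pass) entirely. The peer
    that is still generating then waits on the barrier alone and the
    barrier breaks -- in a real ZeRO-3 run it would hang instead."""
    barrier = threading.Barrier(len(lengths))
    broken = []  # ranks that got stuck waiting for an absent peer

    def peer(rank, own_length):
        finished = False
        for step in range(1, max_length + 1):
            if finished:
                continue  # check moved to the top: skips the collective op
            try:
                barrier.wait(timeout=0.5)  # the "forward pass"
            except threading.BrokenBarrierError:
                broken.append(rank)
                return
            if step >= own_length:
                finished = True

    threads = [threading.Thread(target=peer, args=(r, n))
               for r, n in enumerate(lengths)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return broken
```

With `run_reordered([2, 5], 5)`, rank 0 finishes after 2 steps and stops joining the barrier, so rank 1 is left waiting and never completes its generation.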

amyeroberts commented 2 months ago

cc @gante @zucchini-nlp

github-actions[bot] commented 1 month ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

gante commented 1 month ago

Hi @SDaoer 👋 Thank you for opening this issue 🤗

TL;DR: without it, the code will hang in specific settings.

The answer is documented in the code: https://github.com/huggingface/transformers/blob/083e13b7c47f674b11c74d1b7c7ee7cd1241b406/src/transformers/generation/utils.py#L2198

You can complement this comment with the meaning of synced_gpus:

synced_gpus (`bool`, *optional*):
    Whether to continue running the while loop until max_length. Unless overridden this flag will be set to
    `True` under DeepSpeed ZeRO Stage 3 multiple GPUs environment to avoid hanging if one GPU finished
    generating before other GPUs. Otherwise it'll be set to `False`.

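To make the docstring concrete, here is a minimal stand-in (an assumed illustration, not the transformers code; a `threading.Barrier` plays the role of the collective forward pass): each peer keeps entering the collective op even after its own sequence is finished, which is exactly what the check-after-forward ordering achieves:

```python
import threading

def run_peers(lengths, max_length):
    """Simulate synced_gpus=True: every peer joins the collective op
    (Barrier ~ sharded forward pass) on every step until max_length,
    even after its own sequence has finished, so no peer blocks."""
    barrier = threading.Barrier(len(lengths))
    steps_run = [0] * len(lengths)  # forward passes executed per rank

    def peer(rank, own_length):
        step = 0
        finished = False
        while step < max_length:
            # The "forward pass": a collective op all ranks must enter,
            # otherwise the unfinished ranks would block forever.
            barrier.wait(timeout=5)
            steps_run[rank] += 1
            step += 1
            if finished:
                continue  # skip the per-token work, but keep syncing
            if step >= own_length:
                finished = True

    threads = [threading.Thread(target=peer, args=(r, n))
               for r, n in enumerate(lengths)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return steps_run
```

With `run_peers([2, 5], 5)`, the rank that finished after 2 tokens still runs all 5 "forward passes", trading a few wasted passes for a guarantee that no rank deadlocks.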
github-actions[bot] commented 1 week ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.