huggingface / transformers


Multiprocessing support #32864

Closed keyboardAnt closed 2 weeks ago

keyboardAnt commented 2 months ago

Running a model forward pass inside a separate process seems to get stuck. I tried setting TOKENIZERS_PARALLELISM to both true and false, but unfortunately neither helped 🥲

System Info

transformers-cli env:

- `transformers` version: 4.44.0
- Platform: Linux-6.10.0-linuxkit-aarch64-with-glibc2.35
- Python version: 3.10.14
- Huggingface_hub version: 0.24.5
- Safetensors version: 0.4.4
- Accelerate version: 0.31.0
- Accelerate config:    not found
- PyTorch version (GPU?): 2.4.0 (False)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using distributed or parallel set-up in script?: yes

Who can help?

@ArthurZucker @gante

Reproduction

Minimal example:

from transformers import AutoModelForCausalLM, AutoTokenizer
from multiprocess import Process, Queue

import os
os.environ["TOKENIZERS_PARALLELISM"] = "false"

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tok_ids = tokenizer.encode("Multiprocessing with Hugging Face could be an ", return_tensors="pt")

def fwd(model, tok_ids, queue):
    print("Starting process")
    print(f"{os.environ['TOKENIZERS_PARALLELISM']=}")
    print(f"{type(model)=}")
    print(f"{tok_ids=}")
    try:
        outs = model(tok_ids)
    except Exception as e:
        print(f"Error: {e}")
        return  # outs would be undefined below if the forward pass raised
    print(f"{outs=}")
    queue.put(outs)

queue = Queue()
pr = Process(target=fwd, args=(model, tok_ids, queue))
pr.start()
pr.join()
outs = queue.get()
print(outs)

This prints the following and then hangs:

Starting process
os.environ['TOKENIZERS_PARALLELISM']='false'
type(model)=<class 'transformers.models.gpt2.modeling_gpt2.GPT2LMHeadModel'>
tok_ids=tensor([[15205,   541,   305,   919,   278,   351, 12905,  2667, 15399,   714,
           307,   281,   220]])

Expected behavior

The forward pass should complete and return outputs instead of hanging.

keyboardAnt commented 2 months ago

I was able to reproduce the issue even without tokenizing:

import torch

from transformers import AutoModelForCausalLM
from multiprocess import Process, Queue

import os
os.environ["TOKENIZERS_PARALLELISM"] = "false"

model = AutoModelForCausalLM.from_pretrained("gpt2")
tok_ids = torch.tensor([[15205,   541,   305,   919,   278,   351, 12905,  2667, 15399,   714, 307,   281,   220]])

def fwd(model, tok_ids, queue):
    print("Starting process")
    print(f"{os.environ['TOKENIZERS_PARALLELISM']=}")
    print(f"{type(model)=}")
    print(f"{tok_ids=}")
    try:
        outs = model(tok_ids)
    except Exception as e:
        print(f"Error: {e}")
        return  # outs would be undefined below if the forward pass raised
    print(f"{outs=}")
    queue.put(outs)

queue = Queue()
pr = Process(target=fwd, args=(model, tok_ids, queue))
pr.start()
pr.join()
outs = queue.get()
print(outs)

gante commented 2 months ago

Hi @keyboardAnt 👋 Thank you for opening this issue 🤗

This is a torch-level issue, nothing we can do :) See https://pytorch.org/docs/master/notes/multiprocessing.html

(P.S.: in case you haven't considered it, have a look at input batching. If you don't know what batching is, check our course)
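
For illustration, a minimal sketch of the batching idea mentioned above, assuming the goal is simply to run several prompts through gpt2 in a single forward pass; the example prompts and the padding setup are illustrative choices:

import torch

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # gpt2 ships without a pad token
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompts = [
    "Multiprocessing with Hugging Face could be an ",
    "Batching several inputs together is often ",
]

# One padded batch, one forward pass; no extra processes needed.
batch = tokenizer(prompts, return_tensors="pt", padding=True)
with torch.no_grad():
    outs = model(**batch)
print(outs.logits.shape)  # (num_prompts, padded_sequence_length, vocab_size)

Keeping everything in a single padded batch sidesteps the multiprocessing question entirely, provided the inputs can be processed together.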

keyboardAnt commented 2 months ago

Thanks for the prompt reply @gante!

Are there any official examples of using transformers with torch.multiprocessing? I'm working on something for which batching isn't beneficial, and simply substituting multiprocessing with torch.multiprocessing didn't resolve the issue:

import torch

from transformers import AutoModelForCausalLM
from torch.multiprocessing import Process, Queue

import os
os.environ["TOKENIZERS_PARALLELISM"] = "false"

model = AutoModelForCausalLM.from_pretrained("gpt2")
tok_ids = torch.tensor([[15205,   541,   305,   919,   278,   351, 12905,  2667, 15399,   714, 307,   281,   220]])

def fwd(model, tok_ids, queue):
    print("Starting process")
    print(f"{os.environ['TOKENIZERS_PARALLELISM']=}")
    print(f"{type(model)=}")
    print(f"{tok_ids=}")
    try:
        outs = model(tok_ids)
    except Exception as e:
        print(f"Error: {e}")
        return  # outs would be undefined below if the forward pass raised
    print(f"{outs=}")
    queue.put(outs)

queue = Queue()
pr = Process(target=fwd, args=(model, tok_ids, queue))
pr.start()
pr.join()
outs = queue.get()
print(outs)
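
For comparison, here is a minimal sketch, not an official example, of a pattern that follows the linked PyTorch multiprocessing notes and avoids a documented Queue/join pitfall: the model is loaded inside the child process rather than pickled from the parent, the spawn start method is set explicitly, and the queue is drained before join() (joining a process that still has items buffered on a Queue is a known deadlock pattern). The gpt2 checkpoint and returning only the logits are illustrative choices:

import torch
import torch.multiprocessing as mp

from transformers import AutoModelForCausalLM


def fwd(tok_ids, queue):
    # Load the model in the child process rather than pickling it from the parent.
    model = AutoModelForCausalLM.from_pretrained("gpt2")
    with torch.no_grad():
        outs = model(tok_ids)
    # Return a plain tensor instead of the full ModelOutput object.
    queue.put(outs.logits)


if __name__ == "__main__":
    # CUDA requires the spawn or forkserver start method; spawn is also a safe default here.
    mp.set_start_method("spawn", force=True)

    tok_ids = torch.tensor([[15205, 541, 305, 919, 278, 351, 12905, 2667, 15399, 714, 307, 281, 220]])

    queue = mp.Queue()
    pr = mp.Process(target=fwd, args=(tok_ids, queue))
    pr.start()
    # Drain the queue before join(): joining a process that still has items
    # buffered on a Queue is a documented deadlock pattern.
    logits = queue.get()
    pr.join()
    print(logits.shape)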

gante commented 2 months ago

Are there any official examples of using transformers with torch.multiprocessing?

Not that I know of :(

keyboardAnt commented 1 month ago

Hi @gante,

Thank you for your suggestions and guidance. To clarify, our project requires the ability to preempt (terminate) a forward pass of the transformers models during execution to free up GPU resources when needed. We considered using multiprocessing because running the model in a separate process seemed to offer a straightforward way to kill the process if necessary, thus terminating the model's execution and freeing the GPU.

However, as observed above, the model's forward pass tends to get stuck when run in a separate process. If there are alternative approaches to achieving this preemptive functionality without relying on multiprocessing, we would be very keen to explore them. Could you please provide any insights or guidance on how we might implement such functionality within the Transformers library or with PyTorch?
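
To make the idea concrete, here is a minimal sketch of the preemption approach described above, assuming the forward pass runs in a dedicated child process that the parent can terminate to reclaim its resources; the 10-second timeout, the gpt2 checkpoint, and returning only the logits are illustrative assumptions rather than a recommended transformers API:

import torch
import torch.multiprocessing as mp
from queue import Empty

from transformers import AutoModelForCausalLM


def fwd(tok_ids, queue):
    # The child process owns the model, so terminating the child frees its resources.
    model = AutoModelForCausalLM.from_pretrained("gpt2")
    with torch.no_grad():
        outs = model(tok_ids)
    queue.put(outs.logits)


if __name__ == "__main__":
    mp.set_start_method("spawn", force=True)
    tok_ids = torch.tensor([[15205, 541, 305, 919, 278, 351, 12905, 2667, 15399, 714, 307, 281, 220]])

    queue = mp.Queue()
    pr = mp.Process(target=fwd, args=(tok_ids, queue))
    pr.start()
    try:
        logits = queue.get(timeout=10)  # wait up to 10 seconds for a result
        print(logits.shape)
    except Empty:
        # Preempt: forcibly stop the forward pass and release the child's memory.
        pr.terminate()
    pr.join()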

Your expertise and advice would be greatly appreciated as we navigate this challenge.

Thank you!

github-actions[bot] commented 3 weeks ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

gante commented 1 week ago

@keyboardAnt I have no good alternative suggestions, unfortunately :(