Open rsamf opened 3 weeks ago
Just making sure, have you tried this code with a repo that doesn't have a '.' in its name?
Hi @rsamf
I couldn't reproduce the error on Colab using your reproducer code.
Can you confirm that your transformers version is >=4.39.1?
If yes, try running the command transformers-cli env and paste your environment info here.
> Just making sure, have you tried this code with a repo that doesn't have a '.' in its name?
Yes I have. Sorry for not mentioning that.
> Hi @rsamf
> I couldn't reproduce the error on Colab using your reproducer code.
> Can you confirm that your transformers version is >=4.39.1?
> If yes, try running the command transformers-cli env and paste your environment info here.
transformers version: 4.44.2
You should be able to see it in my post under System Info
I can reproduce the error when I run that script! However, I'm not sure if our models are intended to be pickle-safe - diagnosing issues that involve both the import machinery and multiprocessing will likely be very annoying, so we probably can't prioritize this one! I'd accept a PR if anyone can figure it out, though.
Thanks @Rocketknight1 and @not-lain for the quick responses.
Just to add a little more context, I am trying to maximize the utilization of my GPU by parallelizing just the preprocess step in pipelines. Some pipelines, such as RMBG-1.4, send their resulting tensors to CUDA, which requires me to use the "spawn" start method. Note that my snippet doesn't use CUDA because that part is not the issue. However, in my experience most pipelines don't use CUDA in the preprocess step, which allows me to use "fork", which is less error-prone and doesn't cause the ModuleNotFoundError. Also, with "spawn", I have to annoyingly delete the CUDA tensors from the producer process and return cloned CPU ones.
I understand that my problem is likely rare compared to the rest of the community, and even I am starting to think of stopping my investigation of the issue because of other unforeseen issues with the "spawn" start method. However, I will leave a more relatable snippet below, just in case:
Does not work:
from transformers import pipeline
from datasets import load_dataset
import torch.multiprocessing as mp
from torch.utils.data.dataloader import DataLoader
from torch.utils.data import Dataset


class Preprocess(Dataset):
    def __init__(self, dataset, preprocess_fn):
        self.dataset = dataset
        self.preprocess_fn = preprocess_fn

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, idx):
        print("Script should fail before reaching this point")
        return self.preprocess_fn(self.dataset[idx]["image"])


if __name__ == "__main__":
    mp.set_start_method("spawn", force=True)
    pipe = pipeline(model="briaai/RMBG-1.4", device="cpu", trust_remote_code=True)
    dataset = load_dataset("microsoft/cats_vs_dogs", split="train")
    dataset = Preprocess(dataset, pipe.preprocess)
    dataloader = DataLoader(dataset, 4, num_workers=2)
    for batch in dataloader:
        break
This one works:
from transformers import pipeline
from datasets import load_dataset
import torch.multiprocessing as mp
from torch.utils.data.dataloader import DataLoader
from torch.utils.data import Dataset


class Preprocess(Dataset):
    def __init__(self, dataset, preprocess_fn):
        self.dataset = dataset
        self.preprocess_fn = preprocess_fn

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, idx):
        print("Works!")
        return self.preprocess_fn(self.dataset[idx]["image"])


if __name__ == "__main__":
    mp.set_start_method("spawn", force=True)
    pipe = pipeline(model="microsoft/resnet-50", device="cpu")  # changed to an officially supported architecture
    dataset = load_dataset("microsoft/cats_vs_dogs", split="train")
    dataset = Preprocess(dataset, pipe.preprocess)
    dataloader = DataLoader(dataset, 4, num_workers=2)
    for batch in dataloader:
        break
System Info
transformers version: 4.44.2
Who can help?
@Rocketknight1 @not-lain
Reproduction
I am trying to use a model with a custom architecture that has a '.' in its name: "briaai/RMBG-1.4". Historically, there were issues with this, and they were resolved (see #29251). Now, I'm doing something more niche that requires me to pickle the model and send it to another process that is started with the "spawn" method via set_start_method("spawn"). See the following minimal reproducible snippet:
The same error from #29251 shows up:
However, when the start method is "fork", it works fine.
It appears that processes started with the "spawn" method still can't find custom Hugging Face modules with a '.' in their name, while processes started with "fork" are unaffected.
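One plausible reading of the fork/spawn difference, not confirmed anywhere in this thread, is that pickle stores a class only as a module path plus a class name, so the receiving process has to import that module by name. A forked child inherits the parent's sys.modules, while a spawned child starts from a fresh interpreter. The sketch below simulates both situations in a single process using only the standard library. The module path demo_dynamic_modules.some_org.some_model and the Preprocessor class are hypothetical stand-ins, and nothing here reproduces how transformers actually loads remote code.

```python
import pickle
import sys
import types

# Hypothetical module path, standing in for a dynamically loaded remote-code module.
MODULE_NAME = "demo_dynamic_modules.some_org.some_model"


def register_in_memory_module(name):
    """Register `name` and its parent packages in sys.modules, with no file on disk."""
    parts = name.split(".")
    for i in range(1, len(parts) + 1):
        prefix = ".".join(parts[:i])
        sys.modules.setdefault(prefix, types.ModuleType(prefix))
    return sys.modules[name]


def forget_module(name):
    """Drop `name` and its parents from sys.modules, as in a freshly started interpreter."""
    parts = name.split(".")
    for i in range(1, len(parts) + 1):
        sys.modules.pop(".".join(parts[:i]), None)


module = register_in_memory_module(MODULE_NAME)

# Define a class that claims to live in the in-memory module.
Preprocessor = type("Preprocessor", (), {"__module__": MODULE_NAME})
module.Preprocessor = Preprocessor

# The payload records the module path and class name, not the class body.
payload = pickle.dumps(Preprocessor())

# Same interpreter state (comparable to "fork"): the module is still registered.
restored = pickle.loads(payload)
same_process_ok = isinstance(restored, Preprocessor)

# Fresh interpreter state (comparable to "spawn"): the module must be importable by name.
forget_module(MODULE_NAME)
try:
    pickle.loads(payload)
    fresh_process_error = None
except ModuleNotFoundError as exc:
    fresh_process_error = exc

print(same_process_ok, type(fresh_process_error).__name__)
```

A name such as RMBG-1.4 adds a second wrinkle: the import system splits module paths on '.', so transformers_modules.briaai.RMBG-1.4 would be looked up as a package RMBG-1 with a submodule 4 unless the loader handles that case explicitly.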
I understand that I could avoid pickling the model and sending it to other processes by keeping it top-level, but it would be convenient for my project if I could.
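As a possible workaround for the pickling requirement, the preprocess callable could be built lazily inside each worker, so that only a factory and its arguments cross the process boundary. The following is a minimal sketch under that assumption. LazyPreprocess is a hypothetical helper, not part of transformers or torch, and operator.itemgetter stands in for a real factory that would construct the pipeline and return its preprocess method. It has not been tested against the failing script in this issue.

```python
import operator


class LazyPreprocess:
    """Callable that serializes only a factory and its arguments, never the built object."""

    def __init__(self, factory, *args, **kwargs):
        self._factory = factory  # must itself be picklable, e.g. a module-level function
        self._args = args
        self._kwargs = kwargs
        self._fn = None  # built on first call, once per process

    def __getstate__(self):
        # pickle calls this; drop the built object so it is never sent to a worker.
        state = self.__dict__.copy()
        state["_fn"] = None
        return state

    def __call__(self, item):
        if self._fn is None:
            self._fn = self._factory(*self._args, **self._kwargs)
        return self._fn(item)


# Stand-in factory: itemgetter("image") plays the role of a preprocess function.
lazy = LazyPreprocess(operator.itemgetter, "image")
result = lazy({"image": "pixels"})  # first call builds the callable in this process
state = lazy.__getstate__()  # what pickle would serialize for a worker

print(result, state["_fn"])
```

With this shape, each worker would pay the cost of building its own pipeline on first use, which may be significant for large models.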
Expected behavior
For the module transformers_modules.briaai.RMBG-1.4 to be found.