tunib-ai / parallelformers

Parallelformers: An Efficient Model Parallelization Toolkit for Deployment
https://tunib-ai.github.io/parallelformers
Apache License 2.0
776 stars, 61 forks

Error using google/UL2 model #29

Status: Closed (dnhkng closed this issue 2 years ago)

dnhkng commented 2 years ago

The model: google/ul2

The Hardware: 2x RTX Titan, AMD Ryzen 9 5900X 12-Core Processor, 64 GB RAM

The Environment: Python 3.9.13, PyTorch 1.12.0+cu102, NVIDIA-SMI 495.29.05, Driver Version: 495.29.05, CUDA Version: 11.5

Code used:

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
from parallelformers import parallelize
import torch

tokenizer = AutoTokenizer.from_pretrained("google/ul2")
model = AutoModelForSeq2SeqLM.from_pretrained("google/ul2")

parallelize(model, num_gpus=2, fp16=True, verbose='detail')

input_string = "[S2S] Mr. Dursley was the director of a firm called Grunnings, which made drills. He was a big, solid man with a bald head. Mrs. Dursley was thin and blonde and more than the usual amount of neck, which came in very useful as she spent so much of her time craning over garden fences, spying on the neighbours. The Dursleys had a small son called Dudley and in their opinion there was no finer boy anywhere <extra_id_0>"

inputs = tokenizer(input_string, return_tensors="pt")

outputs = model.generate(**inputs, max_length=200)

print(tokenizer.decode(outputs[0]))

Error Message:

$ python test.py 
/home/******/miniconda3/envs/ul2/lib/python3.9/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 16 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
Bus error (core dumped)

Is this something I can fix? I would love to use this large model, as it's near SOTA on everything :)

hyunwoongko commented 2 years ago

@dnhkng Sorry for the delay. Did you solve this problem?

dnhkng commented 2 years ago

I gave up, as I found that the model was trained on TPU (brainfloat16), which is not supported on the RTX Titan.

Inferencing would just lead to garbage output, due to precision differences.
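
The range difference behind this concern can be illustrated with a small standard-library sketch. It assumes only the published bit layouts: bfloat16 keeps float32's 8 exponent bits with 7 mantissa bits, while float16 has 5 exponent bits and 10 mantissa bits, so its largest finite value is 65504. The helper names below are hypothetical and are not part of transformers or parallelformers.

```python
import struct


def float_to_bfloat16_bits(x: float) -> int:
    """Round a Python float to bfloat16 and return its 16-bit pattern."""
    (bits32,) = struct.unpack("<I", struct.pack("<f", x))
    # bfloat16 is the upper half of a float32; round to nearest even
    # on the 16 bits that are dropped.
    rounding_bias = 0x7FFF + ((bits32 >> 16) & 1)
    return ((bits32 + rounding_bias) >> 16) & 0xFFFF


def bfloat16_bits_to_float(bits16: int) -> float:
    """Expand a bfloat16 bit pattern back to a Python float."""
    (value,) = struct.unpack("<f", struct.pack("<I", bits16 << 16))
    return value


def fits_in_float16(x: float) -> bool:
    """Return True if x can be packed as an IEEE 754 half-precision float."""
    try:
        struct.pack("<e", x)
        return True
    except (OverflowError, struct.error):
        return False


value = 1e5  # larger than float16's maximum finite value of 65504
as_bf16 = bfloat16_bits_to_float(float_to_bfloat16_bits(value))
print("bfloat16 keeps it (coarsely):", as_bf16)
print("fits in float16:", fits_in_float16(value))
```

A value such as 1e5 survives in bfloat16 with reduced precision but overflows float16, which is one way a checkpoint trained in bfloat16 might misbehave when cast to fp16. Whether that is what happens with google/ul2 is not confirmed in this thread.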

hyunwoongko commented 2 years ago

That sounds bad. I am sorry for not being able to help.

tahercoolguy commented 2 years ago

I'm getting the same error: Bus error (core dumped)

/usr/local/lib/python3.7/multiprocessing/semaphore_tracker.py:144: UserWarning: semaphore_tracker: There appear to be 16 leaked semaphores to clean up at shutdown
  len(cache))

deepankar27 commented 1 year ago

Hello @tahercoolguy, were you able to figure out the issue? I am also getting the same exception while running it in Docker.
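
One thing that may be worth checking in a Docker setup is the size of shared memory. This is an assumption, not a diagnosis confirmed in this thread: multiprocess PyTorch workloads exchange tensors through `/dev/shm`, and Docker containers get only 64 MB there by default. The sketch below uses the standard library only, and the function name is hypothetical.

```python
import os
import shutil
from typing import Optional


def shared_memory_bytes(path: str = "/dev/shm") -> Optional[int]:
    """Return the total size of the shared-memory mount, or None if absent."""
    if not os.path.isdir(path):
        return None
    return shutil.disk_usage(path).total


size = shared_memory_bytes()
if size is None:
    print("/dev/shm not found on this system")
else:
    print(f"/dev/shm total: {size / 2**20:.0f} MiB")
```

If the reported size is small, the `--shm-size` or `--ipc=host` options of `docker run` raise the limit.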

deepankar27 commented 1 year ago

@hyunwoongko Any pointers on how to fix this issue?