The output is so bad - total garbage what I am doing wrong? It is also super slow and requires huge amount of RAM

FurkanGozukara commented 1 year ago

Here is my entire code:

from transformers import AutoTokenizer, OPTForCausalLM

tokenizer = AutoTokenizer.from_pretrained("facebook/galactica-1.3b")
model = OPTForCausalLM.from_pretrained("facebook/galactica-1.3b", device_map="auto")

input_text = "The benefits of deadlifting\n\n"
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")

outputs = model.generate(input_ids,new_doc=False,top_p=0.7, max_length=1000)
print(tokenizer.decode(outputs[0]))

And the output is total repetition and garbage. I am trying to generate an article based on the topic sentence I provide

Also, even 28 GB of VRAM is not enough for the 6.7b model. I am testing CPU runtime on IPU, and it has been more than 2 hours with just the 6.7b model.

The output is below:

The benefits of deadlifting

The benefits of deadlifting are numerous. It is a simple, inexpensive, and effective method of reducing the risk of injury to the shoulder and elbow. It is also a simple and effective method of reducing the risk of injury to the hand.

Shoulder

The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. 
The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. 
The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper extremity. The shoulder is the most common site of injury to the upper

FurkanGozukara commented 1 year ago

Here is another example. Why is it all repetition?
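
A note on why loops like this can appear: the first call does not pass do_sample=True, and as far as I know generate() then decodes greedily by default, so top_p has no effect. The toy sketch below is not the model's real behaviour. TOY_SCORES is a made-up next-token table, and banned_next_tokens and greedy_decode are hypothetical names. It only illustrates how always taking the highest-scoring token can lock into a cycle, and how forbidding repeated n-grams (the idea behind no_repeat_ngram_size) breaks it.

```python
# Made-up next-token scores (assumption, for illustration only).
TOY_SCORES = {
    "the": {"shoulder": 0.9, "elbow": 0.1},
    "shoulder": {"is": 0.8, "hurts": 0.2},
    "is": {"the": 0.7, "common": 0.3},
    "elbow": {"is": 0.6, "hurts": 0.4},
    "hurts": {"the": 0.6, "is": 0.4},
    "common": {"the": 0.6, "is": 0.4},
}


def banned_next_tokens(tokens, n):
    """Return tokens that would complete an n-gram already present in `tokens`."""
    if n <= 0 or len(tokens) < n:
        return set()
    # The last n-1 tokens form the prefix of the n-gram about to be completed.
    prefix = tokens[len(tokens) - (n - 1):]
    banned = set()
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n - 1] == prefix:
            banned.add(tokens[i + n - 1])
    return banned


def greedy_decode(start, steps, no_repeat_ngram_size=0):
    """Always pick the best-scoring allowed token (greedy decoding)."""
    tokens = [start]
    for _ in range(steps):
        scores = TOY_SCORES[tokens[-1]]
        banned = banned_next_tokens(tokens, no_repeat_ngram_size)
        candidates = {t: s for t, s in scores.items() if t not in banned}
        if not candidates:
            break
        tokens.append(max(candidates, key=candidates.get))
    return tokens


print(" ".join(greedy_decode("the", 6)))
print(" ".join(greedy_decode("the", 6, no_repeat_ngram_size=2)))
```

The first call cycles through the same three words; the second is forced onto a different path once a bigram would repeat.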

RJT1990 commented 1 year ago

Thanks for reporting, will look into this and get back to you tomorrow

AbstractQbit commented 1 year ago

In half-precision mode, 6.7b fits on a 3090. As for the output quality, you need to tweak the generation parameters a little; this blogpost explains quite a bit.
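
As a rough back-of-the-envelope check of the memory claim, the sketch below estimates the size of the weights alone. It assumes a nominal 6.7e9 parameters (taken from the model name, not measured) and ignores activations, the attention cache, and framework overhead, so real usage will be higher. weight_memory_gb is a hypothetical helper.

```python
def weight_memory_gb(num_params, bytes_per_param):
    """Approximate memory for the weights alone, in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9


# Assumption: nominal parameter count implied by the name "6.7b".
PARAMS_6_7B = 6.7e9

fp32_gb = weight_memory_gb(PARAMS_6_7B, 4)  # float32: 4 bytes per parameter
fp16_gb = weight_memory_gb(PARAMS_6_7B, 2)  # float16: 2 bytes per parameter

print(f"float32 weights: about {fp32_gb:.1f} GB")
print(f"float16 weights: about {fp16_gb:.1f} GB")
```

Under these assumptions the float32 weights alone already approach 28 GB, while the float16 weights leave headroom on a 24 GB card.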

Here's a snippet of how I use it:

import torch, gc
from transformers import AutoTokenizer, OPTForCausalLM

tokenizer = AutoTokenizer.from_pretrained("facebook/galactica-6.7b")
tokenizer.pad_token_id = 1
tokenizer.padding_side = 'left'
tokenizer.model_max_length = 2020

model = OPTForCausalLM.from_pretrained("facebook/galactica-6.7b", device_map="auto", torch_dtype=torch.float16)

input_text = """# Scientific article.
title: Contrastive analysis of models used for DRAM simulation.

# Introduction
"""
input_ids = tokenizer(input_text, return_tensors="pt", padding='max_length').input_ids.to("cuda")

outputs = model.generate(input_ids,
                        max_new_tokens=1000,
                        do_sample=True,
                        temperature=0.7,
                        top_k=25,
                        top_p=0.9,
                        no_repeat_ngram_size=10,
                        early_stopping=True)

print(tokenizer.decode(outputs[0]).lstrip('<pad>'))

gc.collect()
torch.cuda.empty_cache()

Passing padding='max_length' to the tokenizer corresponds to new_doc=True in galai; set padding=False to disable it.
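
One caveat about the lstrip('<pad>') call in the snippet above: str.lstrip treats its argument as a set of characters, not as a prefix, so it can also eat leading letters of the generated text if they happen to be in '<pad>'. It works for the prompt shown because that prompt starts with '#'. A minimal sketch of a prefix-based alternative follows; strip_left_padding is a hypothetical helper, not part of transformers or galai.

```python
def strip_left_padding(text, pad="<pad>"):
    """Remove whole leading `pad` markers, leaving the rest of the text intact."""
    while text.startswith(pad):
        text = text[len(pad):]
    return text


decoded = "<pad><pad>data structures"

# str.lstrip strips any leading characters found in the set {'<', 'p', 'a', 'd', '>'}.
print(repr(decoded.lstrip("<pad>")))
# The prefix-based helper removes only complete "<pad>" markers.
print(repr(strip_left_padding(decoded)))
```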

FurkanGozukara commented 1 year ago

@AbstractQbit ty very much for the answer

May I ask something regarding this format:

# Scientific article.
title: Contrastive analysis of models used for DRAM simulation.

# Introduction

So does the text generator understand the # character as a special character and do something with it?

What do these two parameters do: pad_token_id and padding_side?

AbstractQbit commented 1 year ago

Authors say in the paper that the model was trained on text in markdown format, so giving markdown-ish prompts to the model should probably work best, I guess.

Padding params I took from here https://github.com/paperswithcode/galai/blob/f6d9b0a5b35a0eda53597a5ea7d51963bfc05de1/galai/model.py#L85-L86

FurkanGozukara commented 1 year ago

@AbstractQbit ty so much for the answers

About these hyperparameters: have you tested them, or how did you come up with those values?

do_sample=True,
temperature=0.7,
top_k=25,
top_p=0.9,
no_repeat_ngram_size=10,
early_stopping=True

AbstractQbit commented 1 year ago

@FurkanGozukara Those are just what I've ended up with after playing around with the model for a bit. There was no real methodology for picking those. They just produced somewhat sensible output, so I've shared them here as a starting point for you. There are no one-size-fits-all parameters, you'll have to experiment yourself to tailor them to your needs.

As to what they do, please refer to the article I've linked above. I'm not an NLP expert, so I can't explain them any better than HF people.
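
For a rough intuition of what temperature, top_k and top_p do to the next-token distribution, here is a small pure-Python sketch. It is an approximation written for illustration, not the transformers implementation: filtered_distribution is a hypothetical name, the logits are made up, and details such as tie handling may differ from the library.

```python
import math


def filtered_distribution(logits, temperature=1.0, top_k=0, top_p=1.0):
    """Apply temperature, then top-k, then top-p, and return renormalized probabilities."""
    # Temperature: dividing logits by a value below 1 sharpens the distribution.
    scaled = {tok: value / temperature for tok, value in logits.items()}

    # Softmax (shifted by the maximum for numerical stability).
    peak = max(scaled.values())
    exps = {tok: math.exp(value - peak) for tok, value in scaled.items()}
    total = sum(exps.values())
    ranked = sorted(((tok, e / total) for tok, e in exps.items()),
                    key=lambda item: item[1], reverse=True)

    # Top-k: keep only the k most likely tokens (0 means no limit).
    if top_k > 0:
        ranked = ranked[:top_k]
    total = sum(p for _, p in ranked)
    ranked = [(tok, p / total) for tok, p in ranked]

    # Top-p: keep the smallest leading set whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for tok, p in ranked:
        kept.append((tok, p))
        cumulative += p
        if cumulative >= top_p:
            break

    total = sum(p for _, p in kept)
    return {tok: p / total for tok, p in kept}


toy_logits = {"a": 2.0, "b": 1.0, "c": 0.0, "d": -1.0}  # made-up values
print(filtered_distribution(toy_logits))
print(filtered_distribution(toy_logits, temperature=0.7, top_k=3, top_p=0.9))
```

Sampling then draws the next token from the surviving entries, which is why these settings trade off variety against coherence.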

gboeer commented 1 year ago

@FurkanGozukara I played around with your prompt, and this is what the model came up with.

Title: The benefits of deadlifting

Abstract: The purpose of this study was to determine the effect of deadlifting on the cardiovascular system. 
The study consisted of a group of 13 men and 11 women who were randomly assigned to an experimental group (n = 24) or a control group (n = 24,). 
Subjects in the experimental group performed deadlifting exercises 2 days per week for 6 weeks, while subjects in the control group did not participate in any exercise program. 
The 6-week program consisted of a 3-week progressive phase and a 3-week maintenance phase. 
At the end of each phase, a graded exercise test (GXT) was performed on a treadmill to determine peak oxygen consumption (VO2peak), ventilatory threshold (VT), and heart rate (HR) at the VT. 
At the end of the 6-week program, VO2peak increased by 15.4% (P < 0.05) in the experimental group compared with a 0.9% increase (P > 0.05) in the control group. 
The experimental group demonstrated a 13.9% increase (P < 0.01) in HR at the VT compared with a 2.4% increase (P > 0...</s>

I followed the parameters from @AbstractQbit. Here is the complete code. I could run it on one Titan RTX (24GB VRAM) but only when using half precision.

from transformers import AutoTokenizer, OPTForCausalLM
import torch

tokenizer = AutoTokenizer.from_pretrained("facebook/galactica-6.7b")
tokenizer.pad_token_id = 1
tokenizer.padding_side = 'left'
tokenizer.model_max_length = 4020

model = OPTForCausalLM.from_pretrained("facebook/galactica-6.7b", device_map="auto", torch_dtype=torch.float16)

#input_text = "The Transformer architecture [START_REF]"
input_text = "Title: The benefits of deadlifting\n\n"

input_ids = tokenizer(input_text, padding='max_length', return_tensors="pt").input_ids.to("cuda")

outputs = model.generate(input_ids, max_new_tokens=1000,
                         do_sample=True,
                         temperature=0.7,
                         top_k=25,
                         top_p=0.9,
                         no_repeat_ngram_size=10,
                         early_stopping=True)

print(tokenizer.decode(outputs[0]).lstrip('<pad>'))

iaposto commented 1 year ago

@AbstractQbit your answer has helped me a lot, many thanks!

Do you (or anyone else) know how to use the new_doc (padding) parameter to continue generating a document after the first prompt? Do I have to use the output of the prompt as the input to the next one, or is it better to use a larger max_new_tokens value?

phineas-pta commented 1 year ago

I'm getting a CUDA error with @Legor's and @AbstractQbit's code.

Detailed error output when setting CUDA_LAUNCH_BLOCKING=1:

/home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [57,0,0], thread: [0,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [57,0,0], thread: [1,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [57,0,0], thread: [2,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [57,0,0], thread: [3,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [57,0,0], thread: [4,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [57,0,0], thread: [5,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [57,0,0], thread: [6,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [57,0,0], thread: [7,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [57,0,0], thread: [8,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [57,0,0], thread: [9,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [57,0,0], thread: [10,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [57,0,0], thread: [11,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [57,0,0], thread: [12,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [57,0,0], thread: [13,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [57,0,0], thread: [14,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [57,0,0], thread: [15,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [57,0,0], thread: [16,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [57,0,0], thread: [17,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [57,0,0], thread: [18,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [57,0,0], thread: [19,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [57,0,0], thread: [20,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [57,0,0], thread: [21,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [57,0,0], thread: [22,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [57,0,0], thread: [23,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [57,0,0], thread: [24,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [57,0,0], thread: [25,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [57,0,0], thread: [26,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [57,0,0], thread: [27,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [57,0,0], thread: [28,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [57,0,0], thread: [29,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [57,0,0], thread: [30,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [57,0,0], thread: [31,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [32,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [33,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [34,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [35,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [36,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [37,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [38,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [39,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [40,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [41,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [42,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [43,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [44,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [45,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [46,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [47,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [48,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [49,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [50,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [51,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [52,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [53,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [54,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [55,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [56,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [57,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [58,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [59,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [60,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [61,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [62,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/home/conda/feedstock_root/build_artifacts/pytorch-recipe_1660087551192/work/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [63,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In [10], line 1
----> 1 outputs_ids = model.generate(
      2     input_ids, max_new_tokens=1000,
      3     do_sample=True, temperature=0.7,
      4     top_k=25, top_p=0.9,
      5     no_repeat_ngram_size=10,
      6     early_stopping=True
      7 )

File ███/myconda/condaGA/lib/python3.10/site-packages/torch/autograd/grad_mode.py:27, in _DecoratorContextManager.__call__.<locals>.decorate_context(*args, **kwargs)
     24 @functools.wraps(func)
     25 def decorate_context(*args, **kwargs):
     26     with self.clone():
---> 27         return func(*args, **kwargs)

File ███/myconda/condaGA/lib/python3.10/site-packages/transformers/generation_utils.py:1543, in GenerationMixin.generate(self, inputs, max_length, min_length, do_sample, early_stopping, num_beams, temperature, penalty_alpha, top_k, top_p, typical_p, repetition_penalty, bad_words_ids, force_words_ids, bos_token_id, pad_token_id, eos_token_id, length_penalty, no_repeat_ngram_size, encoder_no_repeat_ngram_size, num_return_sequences, max_time, max_new_tokens, decoder_start_token_id, use_cache, num_beam_groups, diversity_penalty, prefix_allowed_tokens_fn, logits_processor, renormalize_logits, stopping_criteria, constraints, output_attentions, output_hidden_states, output_scores, return_dict_in_generate, forced_bos_token_id, forced_eos_token_id, remove_invalid_values, synced_gpus, exponential_decay_length_penalty, suppress_tokens, begin_suppress_tokens, forced_decoder_ids, **model_kwargs)
   1535     input_ids, model_kwargs = self._expand_inputs_for_generation(
   1536         input_ids,
   1537         expand_size=num_return_sequences,
   1538         is_encoder_decoder=self.config.is_encoder_decoder,
   1539         **model_kwargs,
   1540     )
   1542     # 12. run sample
-> 1543     return self.sample(
   1544         input_ids,
   1545         logits_processor=logits_processor,
   1546         logits_warper=logits_warper,
   1547         stopping_criteria=stopping_criteria,
   1548         pad_token_id=pad_token_id,
   1549         eos_token_id=eos_token_id,
   1550         output_scores=output_scores,
   1551         return_dict_in_generate=return_dict_in_generate,
   1552         synced_gpus=synced_gpus,
   1553         **model_kwargs,
   1554     )
   1556 elif is_beam_gen_mode:
   1557     if num_return_sequences > num_beams:

File ███/myconda/condaGA/lib/python3.10/site-packages/transformers/generation_utils.py:2482, in GenerationMixin.sample(self, input_ids, logits_processor, stopping_criteria, logits_warper, max_length, pad_token_id, eos_token_id, output_attentions, output_hidden_states, output_scores, return_dict_in_generate, synced_gpus, **model_kwargs)
   2479 model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
   2481 # forward pass to get next token
-> 2482 outputs = self(
   2483     **model_inputs,
   2484     return_dict=True,
   2485     output_attentions=output_attentions,
   2486     output_hidden_states=output_hidden_states,
   2487 )
   2489 if synced_gpus and this_peer_finished:
   2490     continue  # don't waste resources running the code we don't need

File ███/myconda/condaGA/lib/python3.10/site-packages/torch/nn/modules/module.py:1130, in Module._call_impl(self, *input, **kwargs)
   1126 # If we don't have any hooks, we want to skip the rest of the logic in
   1127 # this function, and just call forward.
   1128 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1129         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1130     return forward_call(*input, **kwargs)
   1131 # Do not call functions when jit is used
   1132 full_backward_hooks, non_full_backward_hooks = [], []

File ███/myconda/condaGA/lib/python3.10/site-packages/accelerate/hooks.py:156, in add_hook_to_module.<locals>.new_forward(*args, **kwargs)
    154         output = old_forward(*args, **kwargs)
    155 else:
--> 156     output = old_forward(*args, **kwargs)
    157 return module._hf_hook.post_forward(module, output)

File ███/myconda/condaGA/lib/python3.10/site-packages/transformers/models/opt/modeling_opt.py:929, in OPTForCausalLM.forward(self, input_ids, attention_mask, head_mask, past_key_values, inputs_embeds, labels, use_cache, output_attentions, output_hidden_states, return_dict)
    926 return_dict = return_dict if return_dict is not None else self.config.use_return_dict
    928 # decoder outputs consists of (dec_features, layer_state, dec_hidden, dec_attn)
--> 929 outputs = self.model.decoder(
    930     input_ids=input_ids,
    931     attention_mask=attention_mask,
    932     head_mask=head_mask,
    933     past_key_values=past_key_values,
    934     inputs_embeds=inputs_embeds,
    935     use_cache=use_cache,
    936     output_attentions=output_attentions,
    937     output_hidden_states=output_hidden_states,
    938     return_dict=return_dict,
    939 )
    941 logits = self.lm_head(outputs[0]).contiguous()
    943 loss = None

File ███/myconda/condaGA/lib/python3.10/site-packages/torch/nn/modules/module.py:1130, in Module._call_impl(self, *input, **kwargs)
   1126 # If we don't have any hooks, we want to skip the rest of the logic in
   1127 # this function, and just call forward.
   1128 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1129         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1130     return forward_call(*input, **kwargs)
   1131 # Do not call functions when jit is used
   1132 full_backward_hooks, non_full_backward_hooks = [], []

File ███/myconda/condaGA/lib/python3.10/site-packages/accelerate/hooks.py:156, in add_hook_to_module.<locals>.new_forward(*args, **kwargs)
    154         output = old_forward(*args, **kwargs)
    155 else:
--> 156     output = old_forward(*args, **kwargs)
    157 return module._hf_hook.post_forward(module, output)

File ███/myconda/condaGA/lib/python3.10/site-packages/transformers/models/opt/modeling_opt.py:628, in OPTDecoder.forward(self, input_ids, attention_mask, head_mask, past_key_values, inputs_embeds, use_cache, output_attentions, output_hidden_states, return_dict)
    625 past_key_values_length = past_key_values[0][0].shape[2] if past_key_values is not None else 0
    627 if inputs_embeds is None:
--> 628     inputs_embeds = self.embed_tokens(input_ids)
    630 # embed positions
    631 if attention_mask is None:

File ███/myconda/condaGA/lib/python3.10/site-packages/torch/nn/modules/module.py:1130, in Module._call_impl(self, *input, **kwargs)
   1126 # If we don't have any hooks, we want to skip the rest of the logic in
   1127 # this function, and just call forward.
   1128 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1129         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1130     return forward_call(*input, **kwargs)
   1131 # Do not call functions when jit is used
   1132 full_backward_hooks, non_full_backward_hooks = [], []

File ███/myconda/condaGA/lib/python3.10/site-packages/accelerate/hooks.py:156, in add_hook_to_module.<locals>.new_forward(*args, **kwargs)
    154         output = old_forward(*args, **kwargs)
    155 else:
--> 156     output = old_forward(*args, **kwargs)
    157 return module._hf_hook.post_forward(module, output)

File ███/myconda/condaGA/lib/python3.10/site-packages/torch/nn/modules/sparse.py:158, in Embedding.forward(self, input)
    157 def forward(self, input: Tensor) -> Tensor:
--> 158     return F.embedding(
    159         input, self.weight, self.padding_idx, self.max_norm,
    160         self.norm_type, self.scale_grad_by_freq, self.sparse)

File ███/myconda/condaGA/lib/python3.10/site-packages/torch/nn/functional.py:2199, in embedding(input, weight, padding_idx, max_norm, norm_type, scale_grad_by_freq, sparse)
   2193     # Note [embedding_renorm set_grad_enabled]
   2194     # XXX: equivalent to
   2195     # with torch.no_grad():
   2196     #   torch.embedding_renorm_
   2197     # remove once script supports set_grad_enabled
   2198     _no_grad_embedding_renorm_(weight, input, max_norm, norm_type)
-> 2199 return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)

RuntimeError: CUDA error: device-side assert triggered
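
The assertion srcIndex < srcSelectDimSize comes from an index lookup (here inside the embedding call) and generally means some index was outside the table being indexed. The log does not say which input caused it. A minimal sketch of a check that could be run on plain Python lists of ids before calling generate is shown below; find_out_of_range is a hypothetical helper, and the table size (50000 here) is a made-up value that would in practice come from the model's embedding table.

```python
def find_out_of_range(ids, table_size):
    """Return (position, id) pairs whose id is not a valid row index in [0, table_size)."""
    return [(pos, idx) for pos, idx in enumerate(ids)
            if not 0 <= idx < table_size]


# Made-up ids and table size, for illustration only.
example_ids = [0, 5, 49999, 50000, -1]
bad = find_out_of_range(example_ids, table_size=50000)
print(bad)
```

An empty result would suggest the token ids are fine and that the problem lies elsewhere.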

FurkanGozukara commented 1 year ago

@Legor ty so much. I wonder why they released a model without any proper example like yours.

FurkanGozukara commented 1 year ago

@AbstractQbit your answer has helped me a lot, many thanks!

Do you (or anyone else) know how to use the new_doc (padding) parameter to continue generating a document after the first prompt? Do I have to use the output of the prompt as the input to the next one, or is it better to use a larger max_new_tokens value?

Unfortunately, Hugging Face doesn't support new_doc; I don't know why. I'm also wondering about your other questions.

paperswithcode / galai

The output is so bad - total garbage what I am doing wrong? It is also super slow and requires huge amount of RAM #32
