Hello @kj3moraes,
Thank you for raising this. Could you please share a reproducible example (i.e. with a prompt and file you could share publicly) and turn sampling off for the generation?
@kj3moraes just had a look and I believe I have found the issue. The default maximum length for the text generation pipeline is 56 tokens (this includes the input prompt). Your input prompt most likely exceeds the 56-token default of the pipeline, and you have two solutions:
* Truncate your input prompt so that it fits within the default maximum length.
* Change the `max_length` value of the [`TextGenerationConfig`](https://github.com/guillaume-be/rust-bert/blob/9f2cd17e914dee9570e981c63a4021beb33250c2/src/pipelines/text_generation.rs#L59) (you are currently using the default constructor). This is probably what you want to do.

Please let me know how this works. I will add a check to improve error handling when users provide inputs longer than the maximum length, thank you for raising this.
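For reference, a minimal sketch of the second option, assuming a recent `rust-bert` release where `max_length` is an `Option<i64>` field (the `GPT2` model type and the value of 512 are illustrative, not taken from this thread):

```rust
use rust_bert::pipelines::common::ModelType;
use rust_bert::pipelines::text_generation::{TextGenerationConfig, TextGenerationModel};

fn main() -> anyhow::Result<()> {
    // Override the 56-token default so long prompts fit; 512 is arbitrary.
    // On older releases `max_length` is a plain `i64` rather than an Option.
    let config = TextGenerationConfig {
        model_type: ModelType::GPT2,
        max_length: Some(512),
        do_sample: false, // sampling off, as requested above
        ..Default::default()
    };
    let model = TextGenerationModel::new(config)?;

    // On newer releases `generate` returns a `Result` and needs a `?`.
    let output = model.generate(&["Task: Extract summary and keywords from code ..."], None);
    println!("{}", output[0]);
    Ok(())
}
```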
> Thank you for raising this. Could you please share a reproducible example (i.e. with a prompt and file you could share publicly) and turn sampling off for the generation?
This is the prompt that I use:

```
Task: Extract summary and keywords from code

You are given a file containing code in a programming language. Your task is to read the code from the file and generate a JSON output with two keys - 'summary' and 'keywords'.

1. 'summary': A string describing what the code is doing. This summary should capture the main purpose or functionality of the code in a concise manner.
2. 'keywords': A list of strings that includes relevant keywords related to the programming language used, the task being performed, or any significant terms present in the code.

The following is an example:

INPUT:
from transformers import pipeline

image_to_text_model = pipeline("image-to-text", model="nlpconnect/vit-gpt2-image-captioning")

def convert_image_to_text(image_path: str) -> str:
    return image_to_text_model(image_path)

OUTPUT:
{
    "summary": "Function to caption the image at the specified path and returns it as a string",
    "keywords": ["python", "huggingface", "transformers", "image", "caption", "captioning"]
}

INPUT:
```
An example file would be:

```rust
use std::path::PathBuf;

pub fn path_to_string(path: &PathBuf) -> String {
    path.display().to_string()
}

pub fn path_to_filename_string(path: &PathBuf) -> Option<String> {
    Some(path.file_name()?.to_str()?.to_string())
}
```
> * Change the `max_length` value of the [`TextGenerationConfig`](https://github.com/guillaume-be/rust-bert/blob/9f2cd17e914dee9570e981c63a4021beb33250c2/src/pipelines/text_generation.rs#L59) (you are currently using the default constructor). This is probably what you want to do.
This worked, thanks a lot. Returning a `Result` would be better for sure.
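Until the library change lands, a caller-side stopgap is to catch the panic and surface it as an error; a minimal sketch, assuming the failure manifests as a Rust panic (the `try_generate` wrapper is hypothetical):

```rust
use std::panic::{catch_unwind, AssertUnwindSafe};

use rust_bert::pipelines::text_generation::TextGenerationModel;

/// Hypothetical wrapper that converts a panic inside `generate` into an `Err`.
fn try_generate(model: &TextGenerationModel, prompt: &str) -> Result<Vec<String>, String> {
    catch_unwind(AssertUnwindSafe(|| model.generate(&[prompt], None)))
        .map_err(|_| "generation panicked; the prompt may exceed `max_length`".to_string())
}
```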
I have been facing an error when prompting TextGeneration models. The whole trace is below.
This occurs in this line in my code.
I append the file contents onto a prompt that is no more than ~200 tokens.
Do you know why this might be happening?
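For context, a hypothetical reconstruction of the failing call (the file path, helper name, and prompt assembly below are illustrative; the actual line was not shared in the issue):

```rust
use std::fs;

use rust_bert::pipelines::text_generation::TextGenerationModel;

/// Hypothetical helper mirroring the description above: append the file
/// contents onto the instruction prompt and generate from the result.
fn summarize_file(model: &TextGenerationModel, path: &str, prompt: &str) -> anyhow::Result<String> {
    // The shared prompt already ends with `INPUT:`, so the file body follows it directly.
    let code = fs::read_to_string(path)?;
    let full_prompt = format!("{prompt}\n{code}");

    // With the default config this panics once the tokenized prompt
    // exceeds the 56-token `max_length` default.
    let output = model.generate(&[full_prompt.as_str()], None);
    Ok(output.into_iter().next().unwrap_or_default())
}
```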