Closed Luchao2012 closed 3 months ago
Hey @Luchao2012, great to see you tackling new challenges with LlamaIndex! 👋 It's like we're on a tech adventure together. I'm diving into your issue right now and will be back with insights shortly. Stay tuned! 🕵️♂️
Maybe confirm your llm works
print(llm.complete("hello world"))
If it works, try hitting ctrl-c when it's "stuck" and see where it is in the code from the traceback
print(llm.complete("hello world")) shows: Hello! How can I assist you today?
All the code works until the last line, where it gets "stuck": `eval_questions = data_generator.generate_questions_from_nodes()`
To address the issue where `eval_questions = data_generator.generate_questions_from_nodes()` hangs without progress, consider the following steps:

1. **Check for asyncio loop conflicts:** If your application already has an event loop running, use `agenerate_questions_from_nodes()` within an async function instead of `generate_questions_from_nodes()`.
2. **Verify Azure service configuration:** Ensure that your `AzureOpenAI` and `AzureOpenAIEmbedding` configurations are correct, including credentials and endpoints.
3. **Inspect API rate limits:** You might be hitting Azure's rate limits. Check the Azure dashboard for any rate-limit warnings.
4. **Debug the async jobs runner:** Investigate the `run_jobs` function for issues in task dispatch or completion, especially anything related to the progress bar or async task management.
5. **Validate input nodes:** Make sure the data passed to `generate_questions_from_nodes()` is correctly formatted and valid.
6. **Handle exceptions in async tasks:** Add exception handling around async tasks to catch and log errors that might otherwise never surface.
7. **Check model compatibility:** Confirm that the models `gpt-3.5-turbo` and `text-embedding-ada-002` are being used according to their input and output requirements.
8. **Update libraries:** Ensure all dependencies, including the Azure SDKs, are up to date.
9. **Enable detailed logging:** Increase logging levels to gain more insight into the operation of the function and the Azure services.
10. **Simplify and test manually:** Run a simplified manual test against the Azure AI services to isolate the issue.

Regarding internal error handling, `generate_questions_from_nodes()` primarily relies on the async function it wraps; any exceptions raised in the async tasks must be caught by the caller or through global exception handling in your application.

These steps should help you troubleshoot the hang. If the problem persists, consider reaching out to Azure support with detailed logs and a description of your troubleshooting steps.
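A minimal sketch of the logging and exception-handling steps above. This is illustrative, not LlamaIndex API: `run_with_logging` is a hypothetical helper, and `"llama_index"` is assumed to be the package's root logger name.

```python
import logging
import sys

# Sketch of the "Enable detailed logging" step: raise the log level so the
# generator's HTTP calls and retries show up on stdout.
logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)
logging.getLogger("llama_index").setLevel(logging.DEBUG)


def run_with_logging(generator):
    # Hypothetical wrapper for the "Handle exceptions in async tasks" step:
    # surface failures from the blocking call instead of letting them
    # disappear silently inside the event loop.
    try:
        return generator.generate_questions_from_nodes()
    except Exception:
        logging.exception("question generation failed")
        raise
```

With logging at `DEBUG`, a hang usually narrows to either "no requests are being sent" (an event-loop problem) or "requests are sent but never answered" (an endpoint or rate-limit problem).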
I changed it to `RagDatasetGenerator`:

```python
from llama_index.core.llama_dataset.generator import RagDatasetGenerator

dataset_generator = RagDatasetGenerator.from_documents(
    docs, num_questions_per_chunk=1, llm=llm, show_progress=True
)
rag_dataset = dataset_generator.generate_dataset_from_nodes()
```
There is progress now. Why does `DatasetGenerator` not make progress when all the other code is the same?
@Luchao2012 and if you hit ctrl+c (or kill/interrupt the program), what's the traceback?
It works fine for me locally, so just trying to figure out where the issue is
If you are running in a notebook (or some other async context), maybe try the async version?

```python
eval_questions = await data_generator.agenerate_questions_from_nodes()
```
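The notebook case matters because a synchronous wrapper has to drive an event loop itself. A minimal sketch in plain `asyncio` (not LlamaIndex code) of why that pattern breaks when a loop is already running:

```python
import asyncio


async def generate():
    # Stand-in for an async question-generation coroutine.
    return ["What is in the report?"]


def generate_sync():
    # Mirrors what a synchronous wrapper does: start an event loop to drive
    # the coroutine. In a plain script this works; inside Jupyter, a loop is
    # already running, so asyncio.run() raises RuntimeError ("cannot be
    # called from a running event loop") or the call appears to hang. That
    # is why awaiting the async variant is the right move in a notebook.
    return asyncio.run(generate())


print(generate_sync())
```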
Bug Description
`data_generator.generate_questions_from_nodes()` does not generate questions; the progress bar does not move.
Version
0.10.27
Steps to Reproduce
```python
llm = AzureOpenAI(
    model="gpt-35-turbo",
    deployment_name="test-gpt-35-turbo",
    api_key=api_key,
    azure_endpoint=azure_endpoint,
    api_version=api_version,
)
```

You need to deploy your own embedding model as well as your own chat completion model.

```python
embed_model = AzureOpenAIEmbedding(
    model="text-embedding-ada-002",
    deployment_name="test-text-embedding-ada-002",
    api_key=api_key,
    azure_endpoint=azure_endpoint,
    api_version=api_version,
)

from llama_index.core import Settings

Settings.llm = llm
Settings.embed_model = embed_model

reader = SimpleDirectoryReader(
    input_dir="../../reports_reservoir_sub",
    recursive=True,
)
docs = reader.load_data(num_workers=4)  # Load the documents from the directory

from llama_index.core.evaluation import DatasetGenerator

data_generator = DatasetGenerator.from_documents(
    docs, num_questions_per_chunk=1, llm=llm, show_progress=True
)
eval_questions = data_generator.generate_questions_from_nodes()
```
The code stops at the end and does not progress.
Relevant Logs/Tracebacks
No response