run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

How to make the answers complete in a Chinese question answering system #3831

Closed pythonmanGo closed 1 year ago

pythonmanGo commented 1 year ago

In a Chinese question answering system, how can I make sure the answers are complete? A few questions about the use of llama_index:

1: If I load the md files every time (there are many md files brought into the project as documents) instead of caching the index locally with index.storage_context.persist("dir"), I have to reload many md files for each query, which consumes a large number of tokens. This is obviously not efficient.

2: If I serialize the md files to a local cache first (as JSON), then each query only uses the tokens needed to ask the question, which is far more economical in terms of token usage. However, I need to build the index by loading the cache:

```python
storage_context = StorageContext.from_defaults(persist_dir=datadir)

# build index
#index = GPTVectorStoreIndex.from_documents(documents)
index = load_index_from_storage(storage_context)
```

The problem with the second method is that, once the data has been fed in, subsequent users can ask questions but do not get complete answers. Maybe this is because max_tokens is limited to its default size, and loading the index from the local cache does not seem to reset max_tokens. I need to know how to set max_tokens when answering user questions, or maybe the cause is that I did not set the correct split character? How can I fix this so the system runs correctly?
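A minimal sketch of what I think might be needed (assuming llama_index ~0.6.x; the persist_dir value is a placeholder): rebuild the ServiceContext with the same max_tokens and pass it when loading the persisted index, so the limit is not left at the library default.

```python
# Sketch only: assumes llama_index ~0.6.x; adjust import paths for other versions.
from langchain.llms import OpenAI
from llama_index import (
    LLMPredictor,
    PromptHelper,
    ServiceContext,
    StorageContext,
    load_index_from_storage,
)

# same LLM settings used when the index was first built
llm_predictor = LLMPredictor(
    llm=OpenAI(temperature=0, model_name="text-davinci-003", max_tokens=512)
)
prompt_helper = PromptHelper(max_input_size=4096, num_output=512, max_chunk_overlap=20)
service_context = ServiceContext.from_defaults(
    llm_predictor=llm_predictor, prompt_helper=prompt_helper
)

storage_context = StorageContext.from_defaults(persist_dir="datadir")  # placeholder dir
# passing service_context here re-applies the 512-token limit after loading from the cache
index = load_index_from_storage(storage_context, service_context=service_context)
```

If load_index_from_storage accepts service_context in the installed version, the loaded index should then use the 512-token limit when synthesizing answers instead of the default.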

Code:

```python
# Imports assume llama_index ~0.6.x and langchain; adjust paths for newer versions.
from langchain.llms import OpenAI
from llama_index import (
    GPTVectorStoreIndex,
    LLMPredictor,
    PromptHelper,
    ResponseSynthesizer,
    ServiceContext,
    StorageContext,
    load_index_from_storage,
)
from llama_index.indices.postprocessor import SimilarityPostprocessor
from llama_index.query_engine import RetrieverQueryEngine
from llama_index.retrievers import VectorIndexRetriever


def LoadMixSearchGPT(dirdata, documents):
    # `documents` is assumed to be the list of Document objects built from the md files
    # define LLM
    llm_predictor = LLMPredictor(
        llm=OpenAI(temperature=0, model_name="text-davinci-003", max_tokens=512)
    )

    # define prompt helper
    max_input_size = 4096   # maximum input size
    num_output = 512        # number of output tokens
    max_chunk_overlap = 20  # maximum chunk overlap
    prompt_helper = PromptHelper(max_input_size, num_output, max_chunk_overlap)

    service_context = ServiceContext.from_defaults(
        llm_predictor=llm_predictor, prompt_helper=prompt_helper
    )

    # build the index and persist it to the local cache
    index = GPTVectorStoreIndex.from_documents(
        documents, service_context=service_context
    )
    index.storage_context.persist(persist_dir=dirdata)


def MixSerchGPT(prompt, datadir):
    prompt = prompt.strip()
    response = None

    try:
        storage_context = StorageContext.from_defaults(persist_dir=datadir)

        # build index from the persisted cache
        # note: no service_context is passed here, so the LLM falls back to its
        # defaults (including max_tokens) instead of the 512 set at build time
        index = load_index_from_storage(storage_context)

        # configure retriever
        retriever = VectorIndexRetriever(
            index=index,
            similarity_top_k=1,
        )

        # configure response synthesizer
        response_synthesizer = ResponseSynthesizer.from_args(
            node_postprocessors=[
                SimilarityPostprocessor(similarity_cutoff=0.7)
            ]
        )
        # retriever = index.as_retriever(retriever_mode='embedding')

        # assemble query engine
        query_engine = RetrieverQueryEngine(
            retriever=retriever,
            response_synthesizer=response_synthesizer,
        )

        # query
        response = query_engine.query(prompt)

    except Exception as e:
        print("error:", e)

    return response
```
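Regarding the split-character question: a sketch of what I am considering (again assuming llama_index ~0.6.x; the separator choices are my own guesses, not library defaults), using a node parser that splits on Chinese punctuation instead of spaces:

```python
# Sketch only: splits chunks on Chinese sentence punctuation rather than spaces.
from llama_index import ServiceContext
from llama_index.langchain_helpers.text_splitter import TokenTextSplitter
from llama_index.node_parser import SimpleNodeParser

text_splitter = TokenTextSplitter(
    separator="。",                  # Chinese full stop as the primary split character
    chunk_size=512,
    chunk_overlap=20,
    backup_separators=["\n", "，"],  # fall back to newlines and the Chinese comma
)
node_parser = SimpleNodeParser(text_splitter=text_splitter)

# combine with the llm_predictor / prompt_helper defined earlier
service_context = ServiceContext.from_defaults(node_parser=node_parser)
```

This service_context could then be passed to GPTVectorStoreIndex.from_documents when building the index, so the persisted nodes end on sentence boundaries rather than mid-sentence.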
madguyjoe commented 1 year ago

Same issue here. The Chinese answers from query_engine are much shorter than the English ones, and usually truncated. This also happened in an earlier version of llama_index, but was fixed back then (by loading from the saved index file before querying). Not sure why it has re-emerged...

dosubot[bot] commented 1 year ago

Hi, @pythonmanGo! I'm Dosu, and I'm here to help the LlamaIndex team manage their backlog. I wanted to let you know that we are marking this issue as stale.

From what I understand, you are experiencing issues with loading and caching the index in the Chinese question answering system. You mentioned concerns about ensuring the completeness of the answers provided by the system, and you are unsure about setting the max_tokens parameter and the split character. Another user, @madguyjoe, has also encountered a similar issue with truncated Chinese answers in a previous version of the system.

At this point, the issue remains unresolved. Before we proceed, we would like to confirm if this issue is still relevant to the latest version of the LlamaIndex repository. If it is, please let us know by commenting on this issue. If not, feel free to close the issue yourself, or it will be automatically closed in 7 days.

Thank you for your understanding and cooperation. We look forward to hearing from you soon.