kubeagi / arcadia

A diverse, simple, and secure all-in-one LLMOps platform
http://www.kubeagi.com/
Apache License 2.0

AI Agent - Knowledge Assistant - Create an AI agent that can generate abstract for PDF/Word/web links, and accumulate knowledge #553

Closed nkwangleiGIT closed 8 months ago

nkwangleiGIT commented 9 months ago

Allow users to upload PDF/Word files or paste web links, then generate a summary. The prompt might be:

1) English: Provide a summary based on the given content. Output the summary directly without any introductory text. Here is the content:

{{.question}}

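2) Chinese: 需要你根据一段内容写一段摘要,直接输出摘要,不需要提示性文字,内容如下: (In English: "Write a summary of the following content. Output the summary directly, without any introductory text. The content is:")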

{{.question}}
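
The `{{.question}}` placeholder in both prompts matches Go's `text/template` syntax, so filling it in could look like the minimal sketch below. The names `summaryPromptEN` and `RenderPrompt` are hypothetical and only illustrate the substitution; they are not taken from arcadia.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// summaryPromptEN is the English prompt quoted in this comment.
const summaryPromptEN = "Provide a summary based on the given content. " +
	"Output the summary directly without any introductory text. " +
	"Here is the content:\n\n{{.question}}"

// RenderPrompt fills the {{.question}} placeholder with the document text.
// Hypothetical helper, shown only to illustrate the substitution.
func RenderPrompt(tmpl, content string) (string, error) {
	t, err := template.New("summary").Parse(tmpl)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	// text/template resolves {{.question}} as a key lookup when given a map.
	if err := t.Execute(&sb, map[string]string{"question": content}); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	out, err := RenderPrompt(summaryPromptEN, "Employees must clock in by 9:00.")
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

The Chinese prompt can be rendered the same way by passing it as `tmpl`.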

nkwangleiGIT commented 9 months ago

Analyze the article sentence by sentence, then predict for each sentence whether it should be included in the summary. Reference: https://arxiv.org/pdf/1903.10318v2.pdf
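
A rough sketch of this sentence-level (extractive) idea: split the text into sentences, score each one, and keep the top k in their original order. In the referenced paper a fine-tuned BERT model does the scoring; the `FrequencyScorer` below is only a word-frequency stand-in, and all names here are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// SentenceScorer returns a score for sentence i; higher means "more likely
// to belong in the summary". A learned model would plug in here.
type SentenceScorer func(sentences []string, i int) float64

// SplitSentences is a naive splitter on common English and Chinese
// sentence-ending punctuation (it will also split on decimal points).
func SplitSentences(text string) []string {
	var out []string
	var cur strings.Builder
	flush := func() {
		if s := strings.TrimSpace(cur.String()); s != "" {
			out = append(out, s)
		}
		cur.Reset()
	}
	for _, r := range text {
		cur.WriteRune(r)
		if strings.ContainsRune(".!?。!?", r) {
			flush()
		}
	}
	flush()
	return out
}

// FrequencyScorer is a stand-in for a model: it sums, over the words of
// sentence i, how often each word occurs in the whole document.
func FrequencyScorer(sentences []string, i int) float64 {
	freq := map[string]int{}
	for _, s := range sentences {
		for _, w := range strings.Fields(strings.ToLower(s)) {
			freq[w]++
		}
	}
	score := 0.0
	for _, w := range strings.Fields(strings.ToLower(sentences[i])) {
		score += float64(freq[w])
	}
	return score
}

// ExtractiveSummary keeps the k highest-scoring sentences in document order.
func ExtractiveSummary(text string, k int, score SentenceScorer) []string {
	sentences := SplitSentences(text)
	scores := make([]float64, len(sentences))
	order := make([]int, len(sentences))
	for i := range sentences {
		scores[i] = score(sentences, i)
		order[i] = i
	}
	// Rank sentence indices by descending score.
	sort.SliceStable(order, func(a, b int) bool {
		return scores[order[a]] > scores[order[b]]
	})
	if k > len(order) {
		k = len(order)
	}
	if k < 0 {
		k = 0
	}
	picked := order[:k]
	sort.Ints(picked) // restore original document order
	summary := make([]string, 0, k)
	for _, i := range picked {
		summary = append(summary, sentences[i])
	}
	return summary
}

func main() {
	text := "Attendance is recorded daily. Lunch is at noon. " +
		"Attendance records are reviewed by department heads."
	for _, s := range ExtractiveSummary(text, 2, FrequencyScorer) {
		fmt.Println(s)
	}
}
```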

bjwswang commented 9 months ago

Based on langchain's MapReduceDocuments chain, we can get a concise summary of the PDF. See https://github.com/tmc/langchaingo/blob/main/chains/summarization.go#L54

  1. Use a text splitter to split the large document into chunks, say 32 chunked documents
  2. Map: ask the LLM to summarize each of the 32 chunked documents -> 32 mapped documents
  3. Reduce (StuffDocuments): join all 32 mapped documents together as the input -> output one document
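
The three steps above can be sketched in plain Go as follows. This is not the langchaingo API: `Summarizer`, `SplitText`, and `MapReduceSummary` are hypothetical names, the splitter is a fixed-size rune splitter, and the LLM is passed in as a function so that any model client could be used.

```go
package main

import (
	"fmt"
	"strings"
)

// Summarizer stands in for one LLM call that summarizes its input
// (hypothetical; a real model client would be wrapped to fit this shape).
type Summarizer func(text string) (string, error)

// SplitText cuts text into chunks of at most chunkSize runes (step 1).
func SplitText(text string, chunkSize int) []string {
	if chunkSize <= 0 {
		return []string{text}
	}
	runes := []rune(text)
	var chunks []string
	for start := 0; start < len(runes); start += chunkSize {
		end := start + chunkSize
		if end > len(runes) {
			end = len(runes)
		}
		chunks = append(chunks, string(runes[start:end]))
	}
	return chunks
}

// MapReduceSummary summarizes each chunk (map, step 2), then joins the
// partial summaries and summarizes them once more (reduce, step 3).
func MapReduceSummary(text string, chunkSize int, llm Summarizer) (string, error) {
	chunks := SplitText(text, chunkSize)
	mapped := make([]string, 0, len(chunks))
	for i, c := range chunks {
		s, err := llm(c)
		if err != nil {
			return "", fmt.Errorf("map step, chunk %d: %w", i, err)
		}
		mapped = append(mapped, s)
	}
	// "Stuff" all partial summaries into a single input for the final call.
	return llm(strings.Join(mapped, "\n"))
}

func main() {
	// Fake LLM for the demo: "summarizes" by keeping the first 12 runes.
	fake := func(text string) (string, error) {
		r := []rune(text)
		if len(r) > 12 {
			r = r[:12]
		}
		return string(r), nil
	}
	out, err := MapReduceSummary(strings.Repeat("attendance policy text. ", 8), 48, fake)
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

With n chunks this makes n + 1 model calls. If the joined partial summaries could still exceed the model's context window, the reduce step would need to be applied recursively, which this sketch does not do.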

Take the 考勤 (attendance policy) document as an example:

Summary with QWen:

The company's attendance policy includes rules for work hours, attendance methods, consequences for violating attendance rules, and requirements for requesting approval for外出 and travel. The policy also outlines consequences for failing to comply with the rules, such as being considered absent without leave (AWOL).

Thoughts

bjwswang commented 9 months ago

Performance by zhipuai:

  1. glm-3-turbo

    This is a company memo outlining the guidelines for attendance management, including the purpose, scope, and procedures for recording and reviewing employee attendance. The policy aims to improve work discipline and efficiency, as well as providing a clear basis for the company's attendance management. The scope of this policy applies to all full-time and part-time employees, as well as interns. Employees are expected to strictly adhere to work rules and attendance regulations, and department heads are responsible for reviewing and approving employee attendance records. The Human Resource department is responsible for recording, summarizing, and monitoring the execution of this policy.

    The policy outlines the consequences for being considered a scab, the procedure for requesting time off for personal reasons, and the standard for attendance. It also specifies the procedure for calculating and granting leave, and the consequences for failing to meet attendance targets. The document outlines the responsibilities of the human resources department for enforcing these rules and providing explanations and clarification.

    In summary, this policy outlines the guidelines for attendance management, including the purpose, scope, and procedures for recording and reviewing employee attendance. It aims to improve work discipline and efficiency, and provides a clear basis for the company's attendance management.

  2. glm-4

```shell
This is a company policy regarding attendance management, specifically for the purpose of improving work discipline and efficiency. It applies to all full-time and part-time employees, as well as interns. The policy outlines the responsibilities of departments leaders, the Human Resource department, and the consequences of non-compliance. It also outlines the company's work schedule and attendance system, including flexible work hours, shift assignments, and meal breaks.

The policy covers various aspects of employee leave, including vacation time, sick leave, personal reasons for absence, and maternity leave. It also outlines the procedures for requesting and approving leave, as well as the rules and consequences for not following the attendance system and not打卡 or for failing to properly clock in and out.

The policy covers the responsibilities of the HR department for enforcing these rules and specifies that any issues not addressed in the document will be subject to the relevant laws and regulations. It will come into effect on May 1st, 2023.

Overall, this policy aims to regulate and standardize employee leave and vacation policies, and to improve work discipline and efficiency.
```

With a similar approach, both glm-3-turbo and glm-4 seem good enough.

bjwswang commented 9 months ago

Steps to build a chat with doc:

  1. Create a new conversation with an ID
  2. Upload files with this conversation ID
    • DO NOT SAVE THE FILE
    • Process the PDF (embedding, summary) in our apiserver when the file is uploaded
    • Create a new collection in the vector store with the conversation ID
  3. Chat with the knowledge base of the above conversation
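
The flow above could be modeled roughly as in the sketch below. Everything here is an assumption made for illustration: `Store` is an in-memory stand-in for the vector store, `Embedder` and `Summarizer` stand in for model calls, `toyEmbed` is a toy embedding, and file parsing is reduced to treating the upload as plain text. The uploaded bytes are never written anywhere; only chunks, vectors, and summaries are kept, keyed by conversation ID.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Embedder and Summarizer are hypothetical stand-ins for model calls.
type Embedder func(text string) []float32
type Summarizer func(text string) string

// Collection holds only derived data for one conversation, not the file.
type Collection struct {
	Chunks    []string
	Vectors   [][]float32
	Summaries []string
}

// Store is an in-memory stand-in for a vector store keyed by conversation ID.
type Store struct {
	conversations map[string]bool
	collections   map[string]*Collection
	nextID        int
}

func NewStore() *Store {
	return &Store{conversations: map[string]bool{}, collections: map[string]*Collection{}}
}

// NewConversation is step 1: create a conversation and return its ID.
func (s *Store) NewConversation() string {
	s.nextID++
	id := fmt.Sprintf("conv-%d", s.nextID)
	s.conversations[id] = true
	return id
}

// Upload is step 2: process the file in memory and fill the collection.
func (s *Store) Upload(convID string, file []byte, embed Embedder, summarize Summarizer) error {
	if !s.conversations[convID] {
		return errors.New("unknown conversation: " + convID)
	}
	text := string(file) // real PDF/Word parsing is omitted in this sketch
	col, ok := s.collections[convID]
	if !ok {
		col = &Collection{}
		s.collections[convID] = col
	}
	for _, chunk := range strings.Split(text, "\n\n") {
		if strings.TrimSpace(chunk) == "" {
			continue
		}
		col.Chunks = append(col.Chunks, chunk)
		col.Vectors = append(col.Vectors, embed(chunk))
	}
	col.Summaries = append(col.Summaries, summarize(text))
	return nil
}

// Retrieve supports step 3: return the stored chunk closest to the query
// by dot product, to be used as context for the chat answer.
func (s *Store) Retrieve(convID, query string, embed Embedder) (string, error) {
	col, ok := s.collections[convID]
	if !ok || len(col.Chunks) == 0 {
		return "", errors.New("no documents uploaded for this conversation")
	}
	q := embed(query)
	best, bestScore := 0, dot(q, col.Vectors[0])
	for i := 1; i < len(col.Vectors); i++ {
		if sc := dot(q, col.Vectors[i]); sc > bestScore {
			best, bestScore = i, sc
		}
	}
	return col.Chunks[best], nil
}

func dot(a, b []float32) float32 {
	var sum float32
	for i := range a {
		if i < len(b) {
			sum += a[i] * b[i]
		}
	}
	return sum
}

// toyEmbed is a toy embedding for the demo: counts of the letters 'a' and 'b'.
func toyEmbed(text string) []float32 {
	return []float32{float32(strings.Count(text, "a")), float32(strings.Count(text, "b"))}
}

func main() {
	st := NewStore()
	id := st.NewConversation()
	firstLine := func(t string) string { return strings.SplitN(t, "\n", 2)[0] }
	if err := st.Upload(id, []byte("aaa rules\n\nbbb leave"), toyEmbed, firstLine); err != nil {
		panic(err)
	}
	ctx, err := st.Retrieve(id, "bb", toyEmbed)
	if err != nil {
		panic(err)
	}
	fmt.Println(ctx)
}
```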