infiniflow / ragflow

RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.
https://ragflow.io
Apache License 2.0

[Feature Request]: RAGFlow API proposal #1102

Open JinHai-CN opened 5 months ago

JinHai-CN commented 5 months ago

Is there an existing issue for the same feature request?

Describe the feature you'd like

RAGFlow's API interfaces are insufficient and the API is not RESTful in style. The goal of this issue is to propose RESTful APIs that cover most functions of RAGFlow.

Knowledge base

Content management in knowledge base

AI assistant management

Conversation management

File management

Related issues: #345 #717

Scoutink commented 5 months ago

This is exactly what I am looking for...

yangboz commented 5 months ago

Is there an existing issue for the same feature request?

  • [x] I have checked the existing issues.

Describe the feature you'd like

RAGFlow's API interfaces are insufficient and the API is not RESTful in style. The goal of this issue is to propose RESTful APIs that cover most functions of RAGFlow.

Knowledge base

  • [x] create knowledge base API: create dataset #1106
  • [ ] remove knowledge base
  • [ ] update knowledge base
  • [ ] list knowledge bases
  • [ ] get the description of a specific knowledge base

Content management in knowledge base

  • [ ] upload files
  • [ ] download files
  • [ ] remove files
  • [ ] update file attributes (name, enable status, ...)
  • [ ] list files
  • [ ] get the description of a specific file
  • [ ] start parsing a file
  • [ ] abort file parsing
  • [ ] get parsing progress
  • [ ] get the chunk list of a parsed file
  • [ ] remove chunks of a parsed file
  • [ ] download/fetch a chunk of a parsed file
  • [ ] update the chunk status
  • [ ] insert a new chunk into a parsed file
  • [ ] retrieval test on a specific knowledge base

File management

  • [ ] create a directory
  • [ ] remove directories from a directory
  • [ ] move a directory
  • [ ] copy a directory
  • [ ] get the description of a specific directory
  • [ ] list files or directories in a parent directory
  • [ ] upload files into a specific directory
  • [ ] remove files from a specific directory
  • [ ] download files from a specific directory
  • [ ] move file
  • [ ] copy file
  • [ ] attach files to a knowledge base.
  • [ ] get the description of a specific file

AI assistant management

  • [ ] create an assistant
  • [ ] remove assistants
  • [ ] list assistants
  • [ ] update assistant config
  • [ ] get the description of a specific assistant

Model management

  • [ ] list models
  • [ ] get the description of a specific model

Conversation management

  • [ ] create a conversation
  • [ ] delete conversations
  • [ ] list conversations
  • [ ] chat
  • [ ] get the conversation history.

Related issues: #345 #717

I wonder whether flaskrest, flaskrestplus, or flask_restx would help a lot here.

jeremi commented 5 months ago

for me: self.api_url = f"{base_url}/api/{version}" should be: self.api_url = f"{base_url}/{version}/api"

Also even after this change I could not call the create dataset endpoint.

cecilia-uu commented 5 months ago
  • create a knowledge base: is it supposed to work? I could not make it work. There seem to be a few issues with the SDK, including a wrongly configured URL path.

for me: self.api_url = f"{base_url}/api/{version}" should be: self.api_url = f"{base_url}/{version}/api"

Also even after this change I could not call the create dataset endpoint.

Hi jeremi, thanks for your question. We have introduced a newly proposed API endpoint: http:///api/v1/. The previous URL you mentioned is now deprecated. If you want to create a dataset, you can send a POST request to http:///api/v1/dataset.
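
A minimal sketch of the POST request described above, built with Python's standard library only. The thread confirms only the path /api/v1/dataset and the POST method; the host name, the `name` field in the JSON body, and the `Authorization: Bearer` header are assumptions made for illustration.

```python
import json
import urllib.request


def build_create_dataset_request(host: str, api_key: str, dataset_name: str) -> urllib.request.Request:
    """Build (but do not send) a POST request to /api/v1/dataset."""
    url = f"http://{host}/api/v1/dataset"
    # Assumed payload shape: a JSON object with a single "name" field.
    body = json.dumps({"name": dataset_name}).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        # Assumed auth scheme; check the server code for the real one.
        "Authorization": f"Bearer {api_key}",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


req = build_create_dataset_request("localhost", "my-api-key", "demo")
print(req.get_method(), req.full_url)
```

The request can then be sent with `urllib.request.urlopen(req)`; building it separately makes it easy to inspect the method and URL when debugging 404 or 405 responses.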

jeremi commented 5 months ago

I tried it by building the latest main, and it does not work; I get a 404 with some HTML as a returned value.

If I invert API and version number, I get a JSON response, but with a 404 in the body:

200
b'{"data":null,"retcode":100,"retmsg":"<NotFound \'404: Not Found\'>"}\n'
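
Because the HTTP status here is 200 while the error sits in the JSON body, a client has to check both. A minimal sketch of such a check, assuming that `retcode` 0 means success (the thread only shows `retcode` 100 for a failure) and using a hypothetical helper name:

```python
import json


def unwrap_envelope(http_status: int, body: bytes):
    """Return the "data" field, or raise if the status or the envelope reports an error."""
    if http_status != 200:
        # Non-200 responses in this thread carry HTML, not JSON.
        raise RuntimeError(f"HTTP {http_status}")
    payload = json.loads(body)
    # Assumption: retcode 0 means success.
    if payload.get("retcode") != 0:
        raise RuntimeError(f"API error {payload.get('retcode')}: {payload.get('retmsg')}")
    return payload.get("data")


raw = b'{"data":null,"retcode":100,"retmsg":"<NotFound \'404: Not Found\'>"}\n'
try:
    unwrap_envelope(200, raw)
except RuntimeError as err:
    print(err)
```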

cecilia-uu commented 5 months ago

Could you share your screenshot for the input and output?

TTTnlp commented 3 months ago

I have the same question.

RELmon25 commented 3 months ago

Same issue.

If I POST to http://localhost/api/v1/dataset, it returns:

<html>

<head>
    <title>405 Not Allowed</title>
</head>

<body>
    <center>
        <h1>405 Not Allowed</h1>
    </center>
    <hr>
    <center>nginx/1.18.0 (Ubuntu)</center>
</body>

</html>

I've checked the code and I guess it happens because a login is required to make requests. So my question is, how do I log in?

Feiue commented 2 months ago

First, http://localhost/api/v1/dataset is not a valid URL. Additionally, when using the API, there is no need to log in, but a token is required. You can create an assistant in the chat, and then obtain the token using the Chat Bot API's API key.

Valdanitooooo commented 2 months ago

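The code for the create-knowledge-base API in https://github.com/infiniflow/ragflow/pull/1106 looks quite different from the create-knowledge-base method used by the web app at https://github.com/infiniflow/ragflow/blob/main/api/apps/kb_app.py#L39, and the API paths are completely different too. I understand that these were written by different developers, and that knowledge bases and datasets do map one-to-one, but creating a knowledge base is creating a knowledge base, and a URL like /api/v1/dataset for it feels awkward.

Because I urgently needed a retrieval API, I also submitted a PR earlier, https://github.com/infiniflow/ragflow/pull/1763, but most of its body was copied from the retrieval_test method at https://github.com/infiniflow/ragflow/blob/main/api/apps/chunk_app.py#L256.

This way of developing makes me worry about the robustness of the code. A "shit mountain" is piled up one shovelful at a time by everyone, so it should be dealt with early, while the pile is still small.

So could the API part be refactored? My initial ideas:

  1. Abstract a service layer that implements the logic shared by the web app and the API.
  2. API style: make the API paths match the web app's one-to-one, vs. refactor both the API and the web app to RESTful style, vs. change only the API to RESTful style. Of these three, I lean toward one of the last two.

If anyone has better ideas, please raise them so we can discuss them together. If there is a good plan, I can develop this part.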
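
A rough sketch of the first idea, a service layer shared by the web app and the API. Everything here is hypothetical and framework-free: `DatasetService`, the two handler functions, and the in-memory storage are illustrative stand-ins, not RAGFlow code.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class DatasetService:
    """Hypothetical shared logic; both route layers call this class."""
    _datasets: dict = field(default_factory=dict)
    _ids: count = field(default_factory=lambda: count(1))

    def create(self, name: str) -> dict:
        # Validation lives in one place instead of being copied per endpoint.
        if not name.strip():
            raise ValueError("dataset name must not be empty")
        if any(d["name"] == name for d in self._datasets.values()):
            raise ValueError(f"dataset {name!r} already exists")
        dataset = {"id": next(self._ids), "name": name}
        self._datasets[dataset["id"]] = dataset
        return dataset


service = DatasetService()


def web_app_create_kb(form: dict) -> dict:
    """Stand-in for the web app handler (hypothetical)."""
    return {"retcode": 0, "data": service.create(form["name"])}


def rest_api_create_dataset(json_body: dict) -> dict:
    """Stand-in for the REST API handler (hypothetical)."""
    return {"retcode": 0, "data": service.create(json_body["name"])}
```

With this split, the two handlers differ only in how they read the request and shape the response, so a fix to the creation logic reaches both paths at once.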

yangboz commented 2 months ago

IMHO, the RESTful API could be built around the resources that RAGFlow already implements and can expose [dataset, agent, dialog, conversation, tenant, user]; following the OpenAPI standard would be even better :-)

JinHai-CN commented 2 months ago

@Valdanitooooo @yangboz Thank you for your comments on the RAGFlow API. We intend to build an international community, so we encourage using English for communication.

KevinHuSh commented 2 months ago

Good point. We're going to spend more time on this.

Valdanitooooo commented 2 months ago

@yangboz @JinHai-CN @KevinHuSh In order to not disrupt existing functionality, I am attempting to refactor the API in a new directory. The most ideal scenario is for the Web APP API, Server API, and SDK API to all use the same code. I hope everything goes smoothly.

KevinHuSh commented 2 months ago

Hint: APIs to Web/SDK/developers are somewhat different.

KevinHuSh commented 2 months ago

What about choosing a relatively simple one and filing a pull request for it?

Valdanitooooo commented 2 months ago

I won't do everything at once. I hope to complete only the API for the dataset as a starting point, and then everyone can discuss together what the defects and issues are. If the solution is mature, the API of each resource can then be assigned to different people to complete. I think we must be cautious. I am still using version 0.8.0, and the bugs in the new version are causing me headaches. I don't want to create more bugs for this project.

RELmon25 commented 2 months ago

First, http://localhost/api/v1/dataset is not a valid URL. Additionally, when using the API, there is no need to log in, but a token is required. You can create an assistant in the chat, and then obtain the token using the Chat Bot API's API key.

It actually is a valid URL. As you can see in the documentation of ragflow_api, http://<host_address>/api/v1/dataset is a valid endpoint.

This endpoint requires not only the token but also a login; just check the dataset_api.py file:

image

The decorator @login_required can be found on line 128, before the Get dataset list method.

If you compare this code with the implementation of a method that requires only a valid token, say Get conversation history, you can see that @login_required is commented out:

image

Or, as in the case of the Get answer method, it isn't there at all.

image

So, does anyone know how to log in?

Feiue commented 2 months ago

The ragflow_api may have some issues. If you look at __init__.py, you will find that http://localhost/api/v1/dataset is routed to api/apps/sdk/dataset.py, but / is not implemented. That file uses token_required instead of login_required. To get the token, refer to ragflow_api, since its authentication part is correct.
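
To illustrate the difference being discussed, here is a framework-free sketch of what a token-checking decorator can look like. It is not RAGFlow's actual token_required: the token store, the `Bearer` scheme, the 401 `retcode`, and the handler signature are all assumptions.

```python
import functools

VALID_TOKENS = {"ragflow-demo-token"}  # hypothetical token store


def token_required(handler):
    """Reject the call unless the headers carry a known bearer token."""
    @functools.wraps(handler)
    def wrapper(headers: dict, *args, **kwargs):
        auth = headers.get("Authorization", "")
        scheme, _, token = auth.partition(" ")
        if scheme != "Bearer" or token not in VALID_TOKENS:
            # Illustrative error envelope; the real retcode may differ.
            return {"retcode": 401, "retmsg": "invalid or missing token", "data": None}
        return handler(headers, *args, **kwargs)
    return wrapper


@token_required
def list_datasets(headers: dict) -> dict:
    """Hypothetical handler that needs only a token, no login session."""
    return {"retcode": 0, "retmsg": "success", "data": []}


print(list_datasets({}))
print(list_datasets({"Authorization": "Bearer ragflow-demo-token"}))
```

A login-based check would instead depend on a session cookie from a prior sign-in, which is why an endpoint guarded that way cannot be called with an API key alone.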

Valdanitooooo commented 2 months ago

I used RAGFlow as a knowledge base management tool and refactored some APIs that need to be used in my application. If anyone needs more APIs, they can contribute code, and I will also take some vacation time to add more APIs.

swagger docs: http://your_ragflow_address/v1/docs

sdk usage

import os

from dotenv import load_dotenv
from ragflow import RAGFlow

load_dotenv()

RAGFLOW_API_KEY = os.environ.get("RAGFLOW_API_KEY", "")
RAGFLOW_ADDRESS = os.environ.get("RAGFLOW_ADDRESS", "")
EMBEDDING_MODEL = os.environ.get("EMBEDDING_MODEL", "")

ragflow = RAGFlow(RAGFLOW_API_KEY, RAGFLOW_ADDRESS)

# List all knowledge bases
def get_all_datasets():
    res = ragflow.dataset.list()
    if "retmsg" in res and res["retmsg"] == "success":
        return res["data"]
    raise Exception(res)

# Find a knowledge base by name
def get_dataset_by_name(dataset_name):
    res = ragflow.dataset.find_by_name(dataset_name)
    if "retmsg" in res and res["retmsg"] == "success":
        return res["data"]
    raise Exception(res)

# Create a knowledge base
def create_dataset(dataset_name):
    res = ragflow.dataset.create(dataset_name)
    if "retmsg" in res and res["retmsg"] == "success":
        return res["data"]
    raise Exception(res)

# Update a knowledge base
def update_dataset(kb_id):
    res = ragflow.dataset.update(
        kb_id=kb_id, language="Chinese", embd_id=EMBEDDING_MODEL, parser_id="naive", parser_config={
            "raptor": {"use_raptor": False}, "chunk_token_num": 256, "layout_recognize": True
        }
    )
    if "retmsg" in res and res["retmsg"] == "success":
        return res["data"]
    raise Exception(res)

# Upload documents to a knowledge base
def upload_documents_2_dataset(kb_id: str, file_paths: list[str]):
    res = ragflow.document.upload(kb_id, file_paths)
    if "retmsg" in res and res["retmsg"] == "success":
        return res["data"]
    raise Exception(res)

# List the documents in a knowledge base
def get_all_documents(kb_id: str):
    res = ragflow.dataset.list_documents(kb_id)
    if "retmsg" in res and res["retmsg"] == "success":
        return res["data"]
    raise Exception(res)

# Change a document's parsing method
def change_document_parser(doc_id: str, parser_id: str, parser_config: dict):
    res = ragflow.document.change_parser(doc_id, parser_id, parser_config)
    if "retmsg" in res and res["retmsg"] == "success":
        return res["data"]
    raise Exception(res)

# Run document parsing
def documents_run_parsing(doc_ids):
    res = ragflow.document.run_parsing(doc_ids=doc_ids)
    if "retmsg" in res and res["retmsg"] == "success":
        return res["data"]
    raise Exception(res)

def retrieval(kb_id, question, top):
    res = ragflow.dataset.retrieval(
        kb_id=kb_id, question=question, page_size=top, top_k=top, similarity_threshold=0.2)
    if "retmsg" in res and res["retmsg"] == "success":
        try:
            chunks = res['data']['chunks']
            docs_str = ""
            if len(chunks) > 0:
                for chunk in chunks:
                    docs_str += "\n-------\n\n" + chunk["content_with_weight"].replace("\r", "\n") + "\n\n"
            print(docs_str)
            return docs_str
        except Exception as e:
            print(e)
    return "No results retrieved"
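
Most of the wrappers above repeat the same success check. One possible way to factor it out is a small helper; the name `unwrap` is hypothetical, and plain dicts stand in for SDK responses so the sketch runs without a server.

```python
def unwrap(res: dict):
    """Return res["data"] when retmsg is "success", otherwise raise with the full response."""
    if res.get("retmsg") == "success":
        return res["data"]
    raise RuntimeError(res)


# Plain dicts standing in for SDK responses.
print(unwrap({"retmsg": "success", "data": [{"name": "kb1"}]}))
```

With such a helper, a wrapper like get_all_datasets could shrink to `return unwrap(ragflow.dataset.list())`.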

Remember12344 commented 2 weeks ago

So, does anyone know how to log in?

I sent a request to the login interface with the account and password as the payload, and successfully got a response.