Open JinHai-CN opened 5 months ago
This is exactly what I am looking for...
Is there an existing issue for the same feature request?
- [x] I have checked the existing issues.
Describe the feature you'd like
RAGFlow's API interfaces are not enough, and the existing APIs are not RESTful. The goal of this issue is to propose RESTful APIs that cover most functions of RAGFlow.
Knowledge base
- [x] create knowledge base API: create dataset #1106
- [ ] remove knowledge base
- [ ] update knowledge base
- [ ] list knowledge bases
- [ ] get the description of a specific knowledge base
Content management in knowledge base
- [ ] upload files
- [ ] download files
- [ ] remove files
- [ ] update file attributes (name, enable status, ...)
- [ ] list files
- [ ] get the description of a specific file
- [ ] start parsing a file
- [ ] abort file parsing
- [ ] get parsing progress
- [ ] get the chunk list of a parsed file
- [ ] remove chunks of a parsed file
- [ ] download/fetch a chunk of a parsed file
- [ ] update the chunk status
- [ ] insert a new chunk into a parsed file
- [ ] retrieval test on a specific knowledge base
File management
- [ ] create a directory
- [ ] remove directories from a directory
- [ ] move a directory
- [ ] copy a directory
- [ ] get the description of a specific directory
- [ ] list files or directories in a parent directory
- [ ] upload files into a specific directory
- [ ] remove files from a specific directory
- [ ] download files from a specific directory
- [ ] move file
- [ ] copy file
- [ ] attach files to a knowledge base.
- [ ] get the description of a specific file
AI assistant management
- [ ] create an assistant
- [ ] remove assistants
- [ ] list assistants
- [ ] update assistant config
- [ ] get the description of a specific assistant
Model management
- [ ] list models
- [ ] get the description of a specific model
Conversation management
- [ ] create a conversation
- [ ] delete conversations
- [ ] list conversations
- [ ] chat
- [ ] get the conversation history.
Related issues: #345 #717
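To make the proposal a bit more concrete, here is a minimal sketch of how the knowledge base items in the checklist could map onto RESTful routes. The `/api/v1/datasets` paths and the operation names are hypothetical, chosen only for illustration; they are not confirmed RAGFlow endpoints.

```python
import re

# Hypothetical mapping from (HTTP method, path template) to an operation name.
# None of these paths are confirmed RAGFlow endpoints.
ROUTES = {
    ("POST", "/api/v1/datasets"): "create_knowledge_base",
    ("GET", "/api/v1/datasets"): "list_knowledge_bases",
    ("GET", "/api/v1/datasets/{dataset_id}"): "get_knowledge_base",
    ("PUT", "/api/v1/datasets/{dataset_id}"): "update_knowledge_base",
    ("DELETE", "/api/v1/datasets/{dataset_id}"): "remove_knowledge_base",
}


def resolve(method: str, path: str):
    """Return (operation, path_params) for a request, or None if nothing matches."""
    for (route_method, template), operation in ROUTES.items():
        if route_method != method.upper():
            continue
        # Turn "{name}" placeholders into named regex groups.
        pattern = re.sub(r"\{(\w+)\}", r"(?P<\1>[^/]+)", template)
        match = re.fullmatch(pattern, path)
        if match:
            return operation, match.groupdict()
    return None
```

The point of the sketch is that one resource path plus the HTTP verb identifies the operation, so the other sections of the checklist (files, assistants, conversations) could follow the same pattern.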
I wonder whether flaskrest, flaskrestplus, or flask_restx would help a lot here.
- create a knowledge base: is it supposed to work? I could not make it work. There seem to be a few issues with the SDK, including a wrongly configured URL path.
For me, `self.api_url = f"{base_url}/api/{version}"` should be `self.api_url = f"{base_url}/{version}/api"`.
Even after this change, I could not call the create dataset endpoint.
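For anyone comparing the two path orders mentioned above, here is a small sketch that builds both. The function name `build_api_url` and the `version_first` flag are made up for this example and are not part of the SDK; `http://localhost` and `v1` are placeholder values.

```python
def build_api_url(base_url: str, version: str, version_first: bool = False) -> str:
    """Build the API root in either of the two orders discussed in this thread."""
    base = base_url.rstrip("/")  # avoid a double slash if base_url ends with "/"
    if version_first:
        return f"{base}/{version}/api"
    return f"{base}/api/{version}"


if __name__ == "__main__":
    print(build_api_url("http://localhost", "v1"))                      # http://localhost/api/v1
    print(build_api_url("http://localhost", "v1", version_first=True))  # http://localhost/v1/api
```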
Hi jeremi, thanks for your question. I would like to inform you that we have introduced a newly proposed API endpoint - http://
I tried it by building the latest main, and it does not work; I get a 404 with some HTML as a returned value.
If I invert API and version number, I get a JSON response, but with a 404 in the body:
200
b'{"data":null,"retcode":100,"retmsg":"<NotFound \'404: Not Found\'>"}\n'
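Since the HTTP status is 200 while the error sits in the JSON body, a client has to check both layers. Below is a minimal sketch of such a check. It assumes that `retcode` 0 means success, which this thread does not confirm (only the failing value 100 is shown); the function name `check_response` is hypothetical.

```python
import json


def check_response(http_status: int, body: bytes):
    """Return the `data` field, raising if either layer reports a failure."""
    if http_status != 200:
        raise RuntimeError(f"HTTP error {http_status}")
    payload = json.loads(body)
    # Assumption: retcode 0 means success; the thread only shows a failing 100.
    if payload.get("retcode") != 0:
        raise RuntimeError(
            f"API error {payload.get('retcode')}: {payload.get('retmsg')}"
        )
    return payload.get("data")
```

With the body shown above, this raises a `RuntimeError` even though the HTTP status is 200.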
Could you share your screenshot for the input and output?
I have the same question.
Same issue.
If I POST to http://localhost/api/v1/dataset, it returns:
<html>
<head>
<title>405 Not Allowed</title>
</head>
<body>
<center>
<h1>405 Not Allowed</h1>
</center>
<hr>
<center>nginx/1.18.0 (Ubuntu)</center>
</body>
</html>
I've checked the code, and I guess this happens because login is required to make a request. So my question is: how do I log in?
First, http://localhost/api/v1/dataset is not a valid URL. Additionally, when using the API, there is no need to log in, but a token is required. You can create an assistant in the chat, and then obtain the token using the Chat Bot API's API key.
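As a rough illustration of sending a token instead of logging in, the sketch below attaches the key to a request. It assumes the token is passed as a bearer credential in the `Authorization` header, which this thread does not spell out; the URL in the usage lines is a placeholder, not a real endpoint.

```python
import urllib.request


def build_request(url: str, api_key: str) -> urllib.request.Request:
    """Prepare (but do not send) a GET request that carries the API key."""
    # Assumption: the key is sent as "Authorization: Bearer <key>".
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


if __name__ == "__main__":
    # Placeholder URL and key, for illustration only.
    req = build_request("http://localhost/placeholder-endpoint", "my-api-key")
    print(req.get_header("Authorization"))
```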
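The code for the create knowledge base API in #1106 looks quite different from the create knowledge base method used by the app at https://github.com/infiniflow/ragflow/blob/main/api/apps/kb_app.py#L39, and the API paths are completely different too. I understand that these come from different developers, and that knowledge bases and datasets do map one to one, but creating a knowledge base is creating a knowledge base, and a URL like /api/v1/dataset is uncomfortable to work with.
Because I urgently needed a retrieval API, I also submitted PR #1763, but the body of that method is mostly copied from the retrieval_test method at https://github.com/infiniflow/ragflow/blob/main/api/apps/chunk_app.py#L256
This way of developing makes me worry about the robustness of the code. A pile of messy code is built up one shovelful at a time by everyone, so it should be dealt with early, while the pile is still small.
So could the API layer be refactored? My initial ideas:
- Abstract a service layer that implements the logic shared by the app side and the API side.
- API style: (a) make the API paths match the app side one to one, (b) refactor both the API side and the app side to RESTful style, or (c) change only the API side to RESTful style. I lean toward one of the last two.
If anyone has better ideas, please share them so we can discuss. If there is a good plan, I can develop this part.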
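A rough sketch of the service layer idea, assuming an in-memory store in place of the real database. All names here (`DatasetService`, `web_app_create_kb`, `rest_api_create_dataset`) are hypothetical and do not exist in RAGFlow; the response shapes are also only illustrative.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class DatasetService:
    """Business logic shared by every entry point (in-memory stand-in for a DB)."""
    _datasets: dict = field(default_factory=dict)
    _ids: count = field(default_factory=lambda: count(1))

    def create(self, name: str) -> dict:
        # The duplicate-name rule lives here once, not in each handler.
        if any(d["name"] == name for d in self._datasets.values()):
            raise ValueError(f"dataset {name!r} already exists")
        dataset = {"id": next(self._ids), "name": name}
        self._datasets[dataset["id"]] = dataset
        return dataset


def web_app_create_kb(service: DatasetService, form: dict) -> dict:
    """App-side handler: wraps the result in a retcode/retmsg envelope."""
    return {"retcode": 0, "retmsg": "success", "data": service.create(form["name"])}


def rest_api_create_dataset(service: DatasetService, body: dict):
    """API-side handler: returns an (HTTP status, body) pair instead."""
    return 201, service.create(body["name"])
```

Both handlers stay thin and differ only in how they shape the response, so a fix in `DatasetService.create` reaches the app side and the API side at once.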
IMHO, the RESTful API could be built on the resources that RAGFlow already implements and can expose [dataset, agent, dialog, conversation, tenant, user]. Following the OpenAPI standard would be even better :-)
@Valdanitooooo @yangboz Thank you for your comments on the RAGFlow API. We intend to build an international community, so we encourage using English for communication.
Good point. We're going to spend more time on this.
@yangboz @JinHai-CN @KevinHuSh In order not to disrupt existing functionality, I am attempting to refactor the API in a new directory. Ideally, the Web APP API, Server API, and SDK API would all use the same code. I hope everything goes smoothly.
Hint: the APIs for the web app, the SDK, and developers are somewhat different.
What about choosing a relatively simple one to file a pull request?
I won't do everything at once. I hope to complete only the dataset API as a starting point, and then everyone can discuss together what the defects and issues are. If the solution is mature, the API of each resource can then be assigned to different people to complete. I think we must be cautious. I am still using version 0.8.0, and the bugs in the new version are causing me headaches. I don't want to create more bugs for this project.
It actually is a valid URL. As you can see in the ragflow_api documentation, http://<host_address>/api/v1/dataset is a valid endpoint.
Not only do you need the token to use this endpoint, but a login is also needed. Just check the dataset_api.py file:
The decorator @login_required can be found on line 128, before the Get dataset list method.
If you compare this code with a method that only requires a valid token, say Get conversation history, you can see that @login_required is commented out:
Or, as in the case of the Get answer method, it isn't there at all.
So, does anyone know how to log in?
The ragflow_api documentation may have some issues. If you look at __init__.py, you will find that http://localhost/api/v1/dataset is routed to api/apps/sdk/dataset.py, but the / route is not implemented there. That file uses token_required instead of login_required. To get the token, refer to ragflow_api, since its authentication part is correct.
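To illustrate the difference between the two decorators, here is a framework-free sketch of what a token check can look like. It is not RAGFlow's `token_required` implementation: the token store, the header format, and the handler signature are all assumptions made for this example.

```python
import functools

# Hypothetical token store; a real server would look tokens up in its database.
VALID_TOKENS = {"ragflow-example-token"}


def token_required(handler):
    """Reject calls whose Authorization header lacks a known bearer token."""
    @functools.wraps(handler)
    def wrapper(headers: dict, *args, **kwargs):
        # Assumption: the header has the form "Bearer <token>".
        scheme, _, token = headers.get("Authorization", "").partition(" ")
        if scheme != "Bearer" or token not in VALID_TOKENS:
            return 401, {"message": "Token is not valid"}
        return handler(headers, *args, **kwargs)
    return wrapper


@token_required
def list_datasets(headers: dict):
    """Stand-in for a dataset listing handler."""
    return 200, {"data": []}
```

Unlike a login check, nothing here depends on a session or cookie, so a script only needs the token to call the endpoint.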
I used RAGFlow as a knowledge base management tool and refactored some APIs that need to be used in my application. If anyone needs more APIs, they can contribute code, and I will also take some vacation time to add more APIs.
Swagger docs: http://your_ragflow_address/v1/docs
SDK usage:
```python
import os

from dotenv import load_dotenv
from ragflow import RAGFlow

load_dotenv()
RAGFLOW_API_KEY = os.environ.get("RAGFLOW_API_KEY", "")
RAGFLOW_ADDRESS = os.environ.get("RAGFLOW_ADDRESS", "")
EMBEDDING_MODEL = os.environ.get("EMBEDDING_MODEL", "")
ragflow = RAGFlow(RAGFLOW_API_KEY, RAGFLOW_ADDRESS)


# List all knowledge bases
def get_all_datasets():
    res = ragflow.dataset.list()
    if "retmsg" in res and res["retmsg"] == "success":
        return res["data"]
    raise Exception(res)


# Find a knowledge base by name
def get_dataset_by_name(dataset_name):
    res = ragflow.dataset.find_by_name(dataset_name)
    if "retmsg" in res and res["retmsg"] == "success":
        return res["data"]
    raise Exception(res)


# Create a knowledge base
def create_dataset(dataset_name):
    res = ragflow.dataset.create(dataset_name)
    if "retmsg" in res and res["retmsg"] == "success":
        return res["data"]
    raise Exception(res)


# Update a knowledge base
def update_dataset(kb_id):
    res = ragflow.dataset.update(
        kb_id=kb_id, language="Chinese", embd_id=EMBEDDING_MODEL, parser_id="naive", parser_config={
            "raptor": {"use_raptor": False}, "chunk_token_num": 256, "layout_recognize": True
        }
    )
    if "retmsg" in res and res["retmsg"] == "success":
        return res["data"]
    raise Exception(res)


# Upload documents to a knowledge base
def upload_documents_2_dataset(kb_id: str, file_paths: list[str]):
    res = ragflow.document.upload(kb_id, file_paths)
    if "retmsg" in res and res["retmsg"] == "success":
        return res["data"]
    raise Exception(res)


# List the documents in a knowledge base
def get_all_documents(kb_id: str):
    res = ragflow.dataset.list_documents(kb_id)
    if "retmsg" in res and res["retmsg"] == "success":
        return res["data"]
    raise Exception(res)


# Change the parsing method of a document
def change_document_parser(doc_id: str, parser_id: str, parser_config: dict):
    res = ragflow.document.change_parser(doc_id, parser_id, parser_config)
    if "retmsg" in res and res["retmsg"] == "success":
        return res["data"]
    raise Exception(res)


# Run document parsing
def documents_run_parsing(doc_ids):
    res = ragflow.document.run_parsing(doc_ids=doc_ids)
    if "retmsg" in res and res["retmsg"] == "success":
        return res["data"]
    raise Exception(res)


# Retrieve matching chunks from a knowledge base and join them into one string
def retrieval(kb_id, question, top):
    res = ragflow.dataset.retrieval(
        kb_id=kb_id, question=question, page_size=top, top_k=top, similarity_threshold=0.2)
    if "retmsg" in res and res["retmsg"] == "success":
        try:
            chunks = res['data']['chunks']
            docs_str = ""
            if len(chunks) > 0:
                for chunk in chunks:
                    docs_str += "\n-------\n\n" + chunk["content_with_weight"].replace("\r", "\n") + "\n\n"
                print(docs_str)
                return docs_str
        except Exception as e:
            print(e)
    return "No results found"
```
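Every function above repeats the same success check, so one option is to factor it into a small helper. This is only a sketch: the name `unwrap` is made up here, and it assumes the response is a dict with the `retmsg` and `data` keys used in the code above.

```python
def unwrap(res: dict):
    """Return res["data"] on success; raise with the full response otherwise."""
    if res.get("retmsg") == "success":
        return res["data"]
    raise RuntimeError(res)
```

With it, a wrapper such as `get_all_datasets` could shrink to `return unwrap(ragflow.dataset.list())`.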
I called the login endpoint with the account and password as the payload, and successfully received a response.