infiniflow / ragflow

RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.
https://ragflow.io
Apache License 2.0
22.86k stars 2.24k forks

refactor(API): Refactor datasets API #2439

Closed: Valdanitooooo closed this 1 month ago

Valdanitooooo commented 2 months ago

### What problem does this PR solve?

Discussion: https://github.com/infiniflow/ragflow/issues/1102

#### Completed

  1. Integrated APIFlask to generate Swagger API documentation, which can be viewed at http://ragflow_host:ragflow_port/v1/docs
  2. Refactored `http_token_auth`:

    
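```python
from typing import Union

# APIToken, server_error_response and http_token_auth are defined elsewhere in RAGFlow (not shown here).


class AuthUser:
    def __init__(self, tenant_id, token):
        self.id = tenant_id
        self.token = token

    def get_token(self):
        return self.token


@http_token_auth.verify_token
def verify_token(token: str) -> Union[AuthUser, None]:
    try:
        # Look up the API token; its tenant becomes the authenticated user
        objs = APIToken.query(token=token)
        if objs:
            api_token = objs[0]
            user = AuthUser(api_token.tenant_id, api_token.token)
            return user
    except Exception as e:
        server_error_response(e)
    # Returning None tells the auth layer the token is invalid
    return None
```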
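The following is a minimal, self-contained sketch of what the `verify_token` callback does. It assumes a hypothetical in-memory dict (`TOKEN_TABLE`) in place of the `APIToken` table. The callback looks the token up and wraps the owning tenant in an `AuthUser`. When there is no match, it returns `None` so the auth layer can reject the request.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class AuthUser:
    id: str      # tenant id that owns the token
    token: str

    def get_token(self) -> str:
        return self.token


# Hypothetical stand-in for the APIToken table: token -> tenant_id
TOKEN_TABLE: Dict[str, str] = {"tok-abc": "tenant-1"}


def verify_token(token: str) -> Optional[AuthUser]:
    """Return the AuthUser owning `token`, or None if the token is unknown."""
    tenant_id = TOKEN_TABLE.get(token)
    if tenant_id is None:
        return None  # Flask-HTTPAuth style callbacks signal "unauthorized" with None
    return AuthUser(tenant_id, token)


user = verify_token("tok-abc")
print(user.id if user else "rejected")  # tenant-1
print(verify_token("bogus"))            # None
```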

Resource APIs are then protected with the token auth:

```python
@manager.auth_required(http_token_auth)
def get_all_datasets(query_data):
    ...
```
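Decorator-based protection can be illustrated with a framework-free sketch. Here `auth_required` is a hypothetical stand-in for `manager.auth_required(http_token_auth)`:

- It reads a `Bearer` token from a plain headers dict.
- It runs a verify callback on that token.
- It calls the handler only when a user comes back.

The real framework reads the header from the Flask request and builds its own 401 error response instead of returning a tuple.

```python
from functools import wraps
from typing import Callable, Dict, Optional

TOKENS = {"tok-abc": "tenant-1"}  # hypothetical token -> tenant_id table


def verify(token: str) -> Optional[str]:
    return TOKENS.get(token)


def auth_required(verify_fn: Callable[[str], Optional[str]]):
    """Hypothetical decorator: reject the call unless a valid Bearer token is present."""
    def decorator(handler):
        @wraps(handler)
        def wrapper(headers: Dict[str, str], *args, **kwargs):
            scheme, _, token = headers.get("Authorization", "").partition(" ")
            user = verify_fn(token) if scheme == "Bearer" else None
            if user is None:
                return 401, {"message": "Unauthorized"}
            # Pass the authenticated tenant id to the handler
            return handler(user, *args, **kwargs)
        return wrapper
    return decorator


@auth_required(verify)
def get_all_datasets(tenant_id: str):
    return 200, {"tenant": tenant_id, "datasets": []}


print(get_all_datasets({"Authorization": "Bearer tok-abc"}))  # (200, {'tenant': 'tenant-1', 'datasets': []})
print(get_all_datasets({}))                                  # (401, {'message': 'Unauthorized'})
```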

3. Refactored the Datasets (Knowledgebase) API, extracting the implementation logic into the `api/apps/services` directory:
![image](https://github.com/user-attachments/assets/ad1f16f1-b0ce-4301-855f-6e162163f99a)
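Moving logic into `api/apps/services` lets the HTTP route become a thin adapter, so other callers can reuse the same service function. Below is a minimal sketch of that layering. `DatasetService`, its in-memory store, and `get_all_datasets_route` are hypothetical illustrations, not the PR's actual code.

```python
from typing import Dict, List


class DatasetService:
    """Hypothetical service layer: pure business logic, no HTTP concerns."""

    def __init__(self) -> None:
        # tenant_id -> list of dataset records (stand-in for the database)
        self._store: Dict[str, List[dict]] = {}

    def create(self, tenant_id: str, name: str) -> dict:
        if not name:
            raise ValueError("name must not be empty")
        record = {"name": name, "tenant_id": tenant_id}
        self._store.setdefault(tenant_id, []).append(record)
        return record

    def get_all(self, tenant_id: str) -> List[dict]:
        # Return a copy so callers cannot mutate the store directly
        return list(self._store.get(tenant_id, []))


service = DatasetService()


def get_all_datasets_route(tenant_id: str) -> dict:
    """Thin route adapter: delegates to the service and wraps the result."""
    return {"data": service.get_all(tenant_id)}


service.create("tenant-1", "manuals")
print(get_all_datasets_route("tenant-1"))  # {'data': [{'name': 'manuals', 'tenant_id': 'tenant-1'}]}
```

An SDK or a test can call `service.get_all(...)` directly without going through HTTP, which is the main payoff of this split.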
4. Python SDK: as an experiment, I only added `get_all_datasets`, just to verify that the SDK API and the Server API can share the same method:

```python
from ragflow.ragflow import RAGFLow

# The first argument is left empty here; the second is the server address
ragflow = RAGFLow('', 'http://127.0.0.1:9380')
ragflow.get_all_datasets()
```
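The sketch below shows how an SDK method can map onto a server endpoint. It uses a client that only *builds* the HTTP request and never sends it. The class name `MiniClient`, the `/api/v1/datasets` path, and the `Bearer` header format are assumptions for illustration. Check the real SDK and server routes before relying on them.

```python
import urllib.request


class MiniClient:
    """Hypothetical minimal SDK client; builds requests without sending them."""

    def __init__(self, api_key: str, base_url: str) -> None:
        self.api_key = api_key
        self.base_url = base_url.rstrip("/")

    def _request(self, method: str, path: str) -> urllib.request.Request:
        return urllib.request.Request(
            url=f"{self.base_url}{path}",
            method=method,
            headers={"Authorization": f"Bearer {self.api_key}"},
        )

    def get_all_datasets_request(self) -> urllib.request.Request:
        # Assumed route: each SDK method maps onto one server endpoint
        return self._request("GET", "/api/v1/datasets")


req = MiniClient("my-key", "http://127.0.0.1:9380/").get_all_datasets_request()
print(req.get_method(), req.full_url)   # GET http://127.0.0.1:9380/api/v1/datasets
print(req.get_header("Authorization"))  # Bearer my-key
```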

5. Request parameter validation (experimental). This may not be necessary, since validation already exists at the data model layer; it mainly makes the API easier to test through the Swagger docs:

```python
from apiflask import Schema, fields, validators

# ParserType is RAGFlow's parser enum (import not shown here)


class UpdateDatasetReq(Schema):
    kb_id = fields.String(required=True)
    name = fields.String(validate=validators.Length(min=1, max=128))
    description = fields.String(allow_none=True)
    permission = fields.String(validate=validators.OneOf(['me', 'team']))
    embd_id = fields.String(validate=validators.Length(min=1, max=128))
    language = fields.String(validate=validators.OneOf(['Chinese', 'English']))
    parser_id = fields.String(validate=validators.OneOf([parser_type.value for parser_type in ParserType]))
    parser_config = fields.Dict()
    avatar = fields.String()
```
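The same rules can also be written without a schema library, which makes the constraints easy to read at a glance. The plain-Python sketch below mirrors part of `UpdateDatasetReq`. `PARSER_TYPES` is a hypothetical placeholder for the values of RAGFlow's `ParserType` enum, and only a subset of fields is checked.

```python
from typing import Dict, List

PERMISSIONS = {"me", "team"}
LANGUAGES = {"Chinese", "English"}
PARSER_TYPES = {"naive", "qa"}  # hypothetical placeholder for ParserType values


def validate_update_dataset(payload: dict) -> Dict[str, List[str]]:
    """Return a field -> error messages mapping; an empty dict means valid."""
    errors: Dict[str, List[str]] = {}

    def fail(field: str, msg: str) -> None:
        errors.setdefault(field, []).append(msg)

    if "kb_id" not in payload:
        fail("kb_id", "Missing data for required field.")
    # Optional fields are only checked when present
    for field in ("name", "embd_id"):
        if field in payload and not 1 <= len(payload[field]) <= 128:
            fail(field, "Length must be between 1 and 128.")
    if "permission" in payload and payload["permission"] not in PERMISSIONS:
        fail("permission", "Must be one of: me, team.")
    if "language" in payload and payload["language"] not in LANGUAGES:
        fail("language", "Must be one of: Chinese, English.")
    if "parser_id" in payload and payload["parser_id"] not in PARSER_TYPES:
        fail("parser_id", "Unknown parser type.")
    return errors


print(validate_update_dataset({"kb_id": "kb1", "name": "docs", "permission": "me"}))  # {}
print(validate_update_dataset({"name": "", "language": "French"}))
```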


#### TODO

1. Support multiple authentication methods at the same time, so that the Web API can use the same mechanism as the Server API (though this feature may not be important).
I tried the approach below, but it did not work: it allows token authentication when the user is not logged in, but cannot skip token authentication when the user is logged in 😢

```python
from functools import wraps

from flask import current_app, request as flask_request
from flask_login import current_user
from flask_login.config import EXEMPT_METHODS


def http_basic_auth_required(func):
    @wraps(func)
    def decorated_view(*args, **kwargs):
        if 'Authorization' in flask_request.headers:
            # If the request header contains a token, skip username and password verification
            return func(*args, **kwargs)
        if flask_request.method in EXEMPT_METHODS or current_app.config.get("LOGIN_DISABLED"):
            pass
        elif not current_user.is_authenticated:
            return current_app.login_manager.unauthorized()

        # Support async views on Flask versions that provide ensure_sync
        if callable(getattr(current_app, "ensure_sync", None)):
            return current_app.ensure_sync(func)(*args, **kwargs)
        return func(*args, **kwargs)

    return decorated_view
```
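The intended behaviour is to accept either a valid API token or an existing login session, and to reject the request only when neither is present. One way to express this is to try both checks in order inside a single decorator. The sketch below is framework-free: `Request`, `verify_token`, and the `session_user` field are hypothetical stand-ins for the Flask request, the `APIToken` lookup, and Flask-Login's `current_user`. Unlike the attempt above, it validates the token instead of only checking that an `Authorization` header exists.

```python
from dataclasses import dataclass, field
from functools import wraps
from typing import Dict, Optional

TOKENS = {"tok-abc": "tenant-1"}  # hypothetical token -> tenant_id table


@dataclass
class Request:
    headers: Dict[str, str] = field(default_factory=dict)
    session_user: Optional[str] = None  # stand-in for a logged-in user id


def verify_token(token: str) -> Optional[str]:
    return TOKENS.get(token)


def token_or_session_required(handler):
    """Hypothetical decorator: accept a valid Bearer token OR a logged-in session."""
    @wraps(handler)
    def wrapper(request: Request, *args, **kwargs):
        scheme, _, token = request.headers.get("Authorization", "").partition(" ")
        user = verify_token(token) if scheme == "Bearer" else None
        if user is None:
            user = request.session_user  # fall back to the web login session
        if user is None:
            return 401, "Unauthorized"
        return handler(user, *args, **kwargs)
    return wrapper


@token_or_session_required
def list_datasets(user: str):
    return 200, f"datasets for {user}"


print(list_datasets(Request(headers={"Authorization": "Bearer tok-abc"})))  # (200, 'datasets for tenant-1')
print(list_datasets(Request(session_user="web-user")))                    # (200, 'datasets for web-user')
print(list_datasets(Request()))                                           # (401, 'Unauthorized')
```

Because the token is tried first and the session is the fallback, a logged-in web user never needs a token. That is the case the TODO describes as not working.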
2. Refactoring the SDK API to use the same approach as the Server API is feasible and worthwhile, but it will take time.
I see some differences between the Web and SDK APIs, such as the `key_mapping` renaming applied to returned results. Until I understand them, I won't modify this code, to avoid introducing more problems:
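```python
renamed_list = []  # accumulator for the renamed records
for kb in kbs:
    # Rename internal column names to the names exposed by the SDK API
    key_mapping = {
        "chunk_num": "chunk_count",
        "doc_num": "document_count",
        "parser_id": "parse_method",
        "embd_id": "embedding_model"
    }
    renamed_data = {}
    for key, value in kb.items():
        new_key = key_mapping.get(key, key)  # keys without a mapping are kept as-is
        renamed_data[new_key] = value
    renamed_list.append(renamed_data)
return get_json_result(data=renamed_list)
```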
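The renaming loop reduces to one dict comprehension per record. Below is a minimal sketch of it as a reusable helper. The name `rename_keys` is hypothetical, and the mapping is hoisted out of the loop because it never changes.

```python
from typing import Dict, List

# Internal column names -> names exposed by the SDK API (same mapping as the snippet above)
KEY_MAPPING: Dict[str, str] = {
    "chunk_num": "chunk_count",
    "doc_num": "document_count",
    "parser_id": "parse_method",
    "embd_id": "embedding_model",
}


def rename_keys(records: List[dict], mapping: Dict[str, str] = KEY_MAPPING) -> List[dict]:
    """Return copies of `records` with keys renamed via `mapping`; other keys are kept."""
    return [{mapping.get(k, k): v for k, v in rec.items()} for rec in records]


kbs = [{"name": "manuals", "doc_num": 3, "chunk_num": 120, "embd_id": "bge"}]
print(rename_keys(kbs))
# [{'name': 'manuals', 'document_count': 3, 'chunk_count': 120, 'embedding_model': 'bge'}]
```

Keeping the mapping in one module-level constant also means the Web and SDK layers could share it, if the differences between them turn out to be only naming.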


### Type of change

- [x] Refactoring

JinHai-CN commented 2 months ago

Thank you for your PR. Would you please use English to write the source code comment?

Valdanitooooo commented 2 months ago

> Thank you for your PR. Would you please use English to write the source code comment?

Of course, I will

KevinHuSh commented 1 month ago

> > Thank you for your PR. Would you please use English to write the source code comment?
>
> Of course, I will

Thanks a lot. Could you move this PR to branch 'api'? It's more convenient for us to move forward.

Valdanitooooo commented 1 month ago

> move this PR to branch 'api'

👌

Valdanitooooo commented 1 month ago

@KevinHuSh Done.