infiniflow / ragflow

RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.
https://ragflow.io
Apache License 2.0
18.18k stars 1.84k forks

refactor(API): Split SDK class to optimize code structure #2515

Closed Valdanitooooo closed 1 week ago

Valdanitooooo commented 1 week ago

What problem does this PR solve?

  1. Split the SDK class to optimize the code structure: ragflow.get_all_datasets() ===> ragflow.dataset.list()
  2. Fixed the parameter validation to allow empty values.
  3. Changed how parameters are checked for null values: even when a parameter is empty, its key still exists in the parsed data. This is a feature of APIFlask.

if "parser_config" in json_data ===> if json_data["parser_config"]
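
The difference between the two checks can be shown with a plain dictionary. This is only a minimal sketch: the payload below is a hypothetical stand-in for parsed request data in which the key is present but empty, and the helper names `has_key` and `has_value` are invented for illustration. It uses `.get()` rather than indexing so that a genuinely missing key does not raise `KeyError`.

```python
# Hypothetical payload: the client sent no parser_config, but the key
# still exists after parsing, with None as its value.
json_data = {"kb_id": "kb-1", "parser_config": None}


def has_key(data: dict, key: str) -> bool:
    """Membership test: true whenever the key exists, even if its value is None."""
    return key in data


def has_value(data: dict, key: str) -> bool:
    """Truthiness test: true only when the key holds a non-empty value."""
    return bool(data.get(key))


print(has_key(json_data, "parser_config"))    # True
print(has_value(json_data, "parser_config"))  # False
```

Note that a truthiness check also treats `{}`, `""`, and `0` as empty. If those are valid values for a parameter, comparing against `None` explicitly is the safer check.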

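
The split described in item 1 (ragflow.get_all_datasets() becoming ragflow.dataset.list()) might be structured roughly as follows. This is a sketch under stated assumptions, not the PR's actual code: the class names `RAGFlowClient` and `DatasetAPI`, the `/datasets` path, the response shape, and the `fake_request` transport are all hypothetical.

```python
from typing import Callable, List

# A transport takes an HTTP method and a path and returns the decoded JSON body.
Transport = Callable[[str, str], dict]


class DatasetAPI:
    """Dataset operations, grouped under `client.dataset` (hypothetical name)."""

    def __init__(self, request: Transport) -> None:
        self._request = request

    def list(self) -> List[dict]:
        # "/datasets" and the "data" envelope are assumptions for illustration.
        return self._request("GET", "/datasets")["data"]


class RAGFlowClient:
    """Thin entry point that only wires resource groups together."""

    def __init__(self, request: Transport) -> None:
        self.dataset = DatasetAPI(request)


def fake_request(method: str, path: str) -> dict:
    """Stand-in for a real HTTP call so the sketch runs offline."""
    return {"data": [{"name": "demo"}]}


ragflow = RAGFlowClient(fake_request)
print(ragflow.dataset.list())
```

Grouping methods per resource keeps the top-level client small, and each group can grow (create, update, delete) without crowding a single class.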

  4. Some common parameter error messages, all of which come from Marshmallow

Parameter validation configuration

    kb_id = fields.String(required=True)
    parser_id = fields.String(validate=validators.OneOf([parser_type.value for parser_type in ParserType]),
                              allow_none=True)

When my parameters are

kb_id=None,
parser_id='A4'

Error messages

{
    "detail": {
        "json": {
            "kb_id": [
                "Field may not be null."
            ],
            "parser_id": [
                "Must be one of: presentation, laws, manual, paper, resume, book, qa, table, naive, picture, one, audio, email, knowledge_graph."
            ]
        }
    },
    "message": "Validation error"
}

Type of change

KevinHuSh commented 1 week ago

Suggestion: To make it more readable, please use whole words to name variables that are exposed/visible to users. Like:

Valdanitooooo commented 1 week ago

To make it more readable, please use whole words to name variables that are exposed/visible to users.

@KevinHuSh Great suggestion. I am doing this to keep the Web API, SDK API, and Server API compatible, so that I can copy request parameters directly from the browser into the SDK or the Swagger docs. Do we have a plan to refactor the front-end requests? I didn't dare to modify other people's code because I was afraid the impact would be too large.

Refactoring would be a lot of work. If we don't refactor the front-end requests, another good option is to let all three share the same logic: use the same parameters in the SDK API and Server API, and a different set of parameters in the Web API. Do you have any ideas?

I have extracted the business logic into the services package. I believe that if the web APIs can also use this logic, future functionality will be easier to maintain.

api/apps
├── apis
│   └── datasets.py --
├── services         ⬇️
│   └── dataset_service.py
├── ...         ⬆️
├── ...         ⬆️
└── kb_app.py --⬆️
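
The layout above, with two entry points calling into one service module, could look roughly like this. It is a single-file sketch with assumed names: the function names, the `name` and `kb_name` parameters, the response envelopes, and the in-memory store are hypothetical and only illustrate how different parameter sets can share the same logic.

```python
from typing import Dict

# --- services/dataset_service.py (hypothetical contents) ---
_DATASETS: Dict[str, dict] = {}  # in-memory stand-in for the real storage


def create_dataset(name: str) -> dict:
    """Business rules live here once, independent of any request format."""
    if not name:
        raise ValueError("name is required")
    if name in _DATASETS:
        raise ValueError(f"dataset {name!r} already exists")
    _DATASETS[name] = {"name": name}
    return _DATASETS[name]


# --- apis/datasets.py: SDK/Server API entry point (hypothetical) ---
def sdk_create_dataset(json_data: dict) -> dict:
    return {"code": 0, "data": create_dataset(json_data["name"])}


# --- kb_app.py: Web API entry point with its own parameter name (hypothetical) ---
def web_create_kb(form: dict) -> dict:
    return {"retcode": 0, "data": create_dataset(form["kb_name"])}


print(sdk_create_dataset({"name": "docs"}))
print(web_create_kb({"kb_name": "notes"}))
```

Each entry point only translates its own parameters and response shape, so a rule such as the duplicate-name check is written and tested in one place.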

KevinHuSh commented 1 week ago

I don't dare either ^^ So let's do it for the SDK and OpenAPI first.