qbittorrent / qBittorrent

qBittorrent BitTorrent client
https://www.qbittorrent.org

Next generation of WebAPI #14130

Closed drsepanta closed 2 years ago

drsepanta commented 3 years ago

Hi guys,

As I'm working with the WebAPI, I'm going to suggest improvements here. Let's keep the discussion focused on the WebAPI only; it's easier to track things that way.

drsepanta commented 3 years ago

The basics of a REST API:

  1. Should support CORS
  2. Should receive JSON and respond with JSON
  3. Should use the appropriate HTTP method for each operation
  4. Should return appropriate HTTP status codes
  5. Should use plural nouns for resources
  6. Should prefer "camelCase" for keys in JSON

I recommend reading this article to get some basic understanding of how actions/operations should be designed.
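To make points 3-5 concrete, here is a hypothetical sketch (with made-up `/api/v3/...` endpoints; this is not the current qBittorrent API) of how resources and HTTP methods could line up:

```python
# Hypothetical routing table illustrating the REST conventions above:
# plural nouns for resources, HTTP methods for operations.
ROUTES = {
    ("GET",    "/api/v3/torrents"):      "list all torrents",
    ("POST",   "/api/v3/torrents"):      "add a torrent (JSON body)",
    ("GET",    "/api/v3/torrents/{id}"): "fetch one torrent",
    ("PATCH",  "/api/v3/torrents/{id}"): "partially update a torrent",
    ("DELETE", "/api/v3/torrents/{id}"): "remove a torrent",
}

def describe(method: str, path: str) -> str:
    """Return the operation for a method/path pair, or raise KeyError."""
    return ROUTES[(method.upper(), path)]
```

The point is that the URL names a thing, and the method names the action, instead of encoding actions into URLs.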

drsepanta commented 3 years ago

It just crossed my mind that it might be a better idea to sync up the client with the server over a socket. What are your thoughts and ideas so far on the server pushing data to the client?

cc @glassez @Chocobo1

glassez commented 3 years ago

I've never understood the mindset of so-called web developers. Their ideas seem inconsistent to me, and many of the web standards look flawed and ill-conceived in my opinion. (Please don't take this as an attempt to offend anyone.) Perhaps this is just the specifics of the industry, where everything develops very quickly and chaotically, so there is no time to think everything through in detail... The above was the preamble to the following:

This also looks inconsistent to me. You wrap request/response data in JSON, thereby defining a new level of interaction between the client and the server, working on top of HTTP. Why do you still insist on using HTTP methods and status codes? Why not make this new protocol layer complete, and use HTTP only as a transport layer for it?

glassez commented 3 years ago

It just crossed my mind that it might be a better idea to sync up the client with the server over a socket. What are your thoughts and ideas so far on the server pushing data to the client?

cc @glassez @Chocobo1

Please be more specific when expressing your ideas.

Chocobo1 commented 3 years ago

Should support CORS

Is this really required? Why would we want to open our resources to anyone/selected ones on the net? IMO, this is asking for trouble for our primitive web server...

drsepanta commented 3 years ago

This also looks inconsistent to me. You wrap request/response data in JSON, thereby defining a new level of interaction between the client and the server, working on top of HTTP. Why do you still insist on using HTTP methods and status codes? Why not make this new protocol layer complete, and use HTTP only as a transport layer for it?

These are industry standards. I understand if you don't see them as necessary; maybe a WebSocket is a better option for qBt, read below.

Please be more specific when expressing your ideas.

Of course. I was suggesting using a WebSocket to transfer data instead of the client polling the resource every second or so.

Is this really required? Why would we want to open our resources to anyone/selected ones on the net? IMO, this is asking for trouble for our primitive web server...

It should be open so the web app can be served from a different domain. There is no difference between the web server you already have and a REST API supporting CORS; the only difference is how the browser communicates with the server, granting access based on response headers. The server should send CORS headers in the response indicating who should have access to the resource. This configuration should be part of the WebAPI settings.

drsepanta commented 3 years ago

@glassez

This is a reply to this comment

Some requests initiate long operations that are executed asynchronously, so that the results are obtained by subsequent requests. Sessions hold some data between these requests.

This breaks one of the REST API rules: endpoints must be stateless and should respond according to the request alone. They should not be affected by sessions.

PS: Again I'm not saying you should implement this but these are standards.

glassez commented 3 years ago

Of course. I was suggesting using a WebSocket to transfer data instead of the client polling the resource every second or so.

Without going into details, I want to note that our Web API should be convenient for any type of client applications, and not just for browsers. How difficult will it be to implement an iOS/Android/etc. application using this? Currently you can access it even using CURL.

glassez commented 3 years ago

This breaks one of the REST API rules: endpoints must be stateless and should respond according to the request alone. They should not be affected by sessions.

Quote from linked docs:

Each request from the client to the server must contain all of the information necessary to understand the request, and cannot take advantage of any stored context on the server. Session state is therefore kept entirely on the client. The client is responsible for storing and handling all application state-related information on the client side.

So the sessions are still here; the only difference is where they are stored. As I said above, some requests initiate long operations that are executed asynchronously, so it still requires some mechanism to support this. Maybe you can suggest some modern way to handle it. But, given what was said earlier, I'm not sure that it would be convenient to use in any type of client application.

drsepanta commented 3 years ago

Without going into details, I want to note that our Web API should be convenient for any type of client applications, and not just for browsers. How difficult will it be to implement an iOS/Android/etc. application using this? Currently you can access it even using CURL.

Of course you won't be able to access the resources with curl, but WebSocket is a standard protocol and a client can be implemented very easily on all platforms (iOS, for example). There are other ways to push data from the server to the client; maybe we should explore those first.

Server-sent events are another, older approach that works even over HTTP/1.x, but never gained much traction (php example).
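For context, server-sent events are just a text framing sent over one long-lived HTTP response; each event is a few `field: value` lines terminated by a blank line. A minimal sketch of producing that wire format (the event name and payload here are made up):

```python
import json

def sse_frame(event: str, data: dict) -> str:
    """Format one server-sent event as it appears on the wire:
    an 'event:' line, a 'data:' line, then a blank line."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"

# Example: notify a browser EventSource that a torrent was added
# (the event name "torrentAdded" is purely illustrative).
frame = sse_frame("torrentAdded", {"hash": "abc123"})
```

On the browser side this is consumed with the built-in `EventSource` API, which also reconnects automatically.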

Maybe you can suggest some modern way to handle this synchronously

Would you give me an example so I'd understand what kind of asynchronous operations we are dealing with?

Update: I'm working with /api/v2/sync/... and I presume you are talking about these kinds of endpoints working with asynchronous operations?

Chocobo1 commented 3 years ago

It should be open so the web app can be served from a different domain. There is no difference between the web server you already have and a REST API supporting CORS; the only difference is how the browser communicates with the server, granting access based on response headers. The server should send CORS headers in the response indicating who should have access to the resource. This configuration should be part of the WebAPI settings.

That is good news to hear, and qBt already supports adding custom HTTP headers in the Options, so the CORS policy is ultimately up to the users to decide. IMO the current unspecified default should stay as is unless there are good reasons to change it.

Of course. I was suggesting to use a web socket to transfer data instead of client pulling the resource every second or so.

What are the improvements of using a WebSocket over the current request/response model? I doubt update latency is a big factor here.

Also, I want to note that we have to consider implementation complexity on both sides of our client/server: while firing up a WebSocket is just calling a built-in API on browsers/client platforms, on the server side it is very likely that we would need to either write one from scratch or integrate another library (and we already have enough "fun" with the current ones...).

drsepanta commented 3 years ago

That is good news to hear, and qBt already supports adding custom HTTP headers in the Options, so the CORS policy is ultimately up to the users to decide. IMO the current unspecified default should stay as is unless there are good reasons to change it.

Adding only headers is not quite enough. The Origin or Referer can be anything and should not be blocked, but Access-Control-Allow-Origin lets the browser handle the access. In my opinion, when the user turns on the WebAPI (and perhaps another dependent option for the REST API), the headers should be added automatically. Again, I'm not suggesting implementing the fully compliant spec, but you can take a look at a preflight request example to see that CORS is more than a couple of response headers.
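To sketch what "more than a couple of response headers" means in practice: a server answering the preflight OPTIONS request has to echo an allowed origin and declare the permitted methods and headers. A minimal illustration (the allow-list would be a hypothetical WebAPI setting, not an existing qBittorrent option):

```python
# Hypothetical user-configured allow-list (a WebAPI setting, not a real one).
ALLOWED_ORIGINS = {"https://ui.example.com"}

def preflight_headers(origin: str) -> dict:
    """Answer a CORS preflight (OPTIONS) request.
    Returns the headers to send, or an empty dict to deny the origin."""
    if origin not in ALLOWED_ORIGINS:
        return {}
    return {
        "Access-Control-Allow-Origin": origin,
        "Access-Control-Allow-Methods": "GET, POST, PUT, PATCH, DELETE",
        "Access-Control-Allow-Headers": "Content-Type",
        "Access-Control-Allow-Credentials": "true",  # needed for cookie auth
        "Access-Control-Max-Age": "86400",  # let the browser cache the preflight
    }
```

The actual request then follows only if the browser accepts these headers; a plain static header option can't vary the response per origin like this.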

What are the improvements of using a WebSocket over the current request/response model? I doubt update latency is a big factor here. Also, I want to note that we have to consider implementation complexity on both sides of our client/server: while firing up a WebSocket is just calling a built-in API on browsers/client platforms, on the server side it is very likely that we would need to either write one from scratch or integrate another library (and we already have enough "fun" with the current ones...).

I'm glad you asked and that we are discussing this topic. Have a look at this article for a better understanding of all three approaches. In a nutshell: if there is nothing for the client to be notified about, then there won't be any wasted requests flying around on the client; the connection persists until it's broken or closed. The only drawback of WebSockets might be that the server cannot serve unlimited clients at a time, but I don't think this is an issue for qBt.

Read more about the WebSocket API on MDN.

In the end, I understand this might add complexity to qBt, but I don't believe it can be any worse than what it is. If you need more details on WebSockets and how we could approach an implementation, I'll be more than happy to discuss it further.

Chocobo1 commented 3 years ago

Again, I'm not suggesting implementing the fully compliant spec, but you can take a look at a preflight request example to see that CORS is more than a couple of response headers.

Thanks, I have a clearer picture now.

In a nutshell: if there is nothing for the client to be notified about, then there won't be any wasted requests flying around on the client; the connection persists until it's broken or closed.

Since the qBt web server already supports HTTP/1.1 persistent connections, IMO the only difference here is the model for acquiring data on the client side. To be honest, I still don't see the advantages/use case of a subscription model over a polling one. Most of the time, I would imagine the client asking only for the data that is visible on screen, and the polling model is a natural fit. Or what kinds of use cases are we talking about here?

glassez commented 3 years ago

One advantage of a persistent connection with the server pushing data is the ability to update data using an event-based mechanism, without the need to do expensive calculations (serialize all current data and calculate the difference between it and the previous snapshot), i.e. in the same way the GUI does it.
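To illustrate the cost being avoided: under polling, the server has to serialize the current state and diff it against the previous snapshot on every sync request, whereas an event push just forwards the change that already happened. A minimal sketch of the polling-side work (the data shapes are made up):

```python
def snapshot_diff(prev: dict, curr: dict) -> dict:
    """Polling model: recompute the delta between two full state
    snapshots on every sync request, even if nothing changed."""
    changed = {k: v for k, v in curr.items() if prev.get(k) != v}
    removed = [k for k in prev if k not in curr]
    return {"changed": changed, "removed": removed}
```

With event-based push, the `changed` entry is known at the moment the core emits the event, so no snapshot comparison is needed at all.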

aminpaks commented 3 years ago

Working more with the API, I don't know if we want to fix some of these things... It's not the best, but it does the job, so the question is whether we really want to put much effort into fixing all these little inconsistencies.

glassez commented 3 years ago

Working more with the API, I don't know if we want to fix some of these things... It's not the best, but it does the job, so the question is whether we really want to put much effort into fixing all these little inconsistencies.

If you're talking about making a new UI using the current API, then it's not a bad idea. This can be considered a stage of development. In addition, it will allow you to more fully assess the pros and cons of the current API. Then, in the next stage, we can improve the API, already having a workable UI. Creating both the API and UI from scratch at the same time could be a source of bigger problems.

glassez commented 3 years ago

Would you give me an example so I'd understand what kinda asynchronous operations are we dealing with?

Update: I'm working with /api/v2/sync/... and I presume you are talking about these kinda endpoints working with asynchronous operations?

A good example of asynchronous operations is interaction with the Search subsystem. There you have to start a search task that can run for a long time, so you don't get any results right away, except for the task ID. Then you can query the current status of the task and get the current results, which are accumulated on the server side.

aminpaks commented 3 years ago

If you're talking about making a new UI using the current API, then it's not a bad idea. This can be considered a stage of development. In addition, it will allow you to more fully assess the pros and cons of the current API. Then, in the next stage, we can improve the API, already having a workable UI. Creating both the API and UI from scratch at the same time could be a source of bigger problems.

Totally agree. I decided to build the new UI with the current API and see how things go. I'm sure it will work out, and I'll have a better understanding of what the new REST API may look like.

aminpaks commented 3 years ago

A good example of asynchronous operations is interaction with the Search subsystem. There you have to start a search task that can run for a long time, so you don't get any results right away, except for the task ID. Then you can query the current status of the task and get the current results, which are accumulated on the server side.

I see. A REST(ful) API models such operations as resource state. For that example, you can have an endpoint that creates a resource, and then with the provided ID you can fetch the results.

I'm going to give a theoretical example here; once I've worked more with the qBt API I'll suggest a real-world one, which will be more useful. In the following example everything is represented as state: we try to avoid operations and only refer to resource properties, and we use the HTTP method (verb) that matches each operation of a CRUD API.

  1. The user requests creating a resource by making an HTTP POST request to /api/v3/resources with the following payload:

    {
      "url": "https://images.google.com/..."
    }

    For a payload this simple we could use query strings, but JSON is always preferred.

  2. The API creates the resource with a unique identifier, continues updating it in the background, and returns the following payload:

    {
      "resourceId": 1,
      "state": "inProgress",
      "url": "..."
    }

  3. The user makes a GET request to /api/v3/resources/1 to get the latest status of the resource.

  4. The API returns the status of the resource as follows (the work is not done yet; we may indicate the progress, if any):

    {
      "resourceId": 1,
      "state": "inProgress",
      "url": "..."
    }

  5. The user decides to cancel the operation by making a PATCH request to /api/v3/resources/1 with the following payload:

    {
      "resourceId": 1,
      "state": "cancel"
    }

  6. The API updates the resource, cancels the operation, and returns the modified resource as follows:

    {
      "resourceId": 1,
      "state": "canceled",
      "url": "..."
    }
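The six-step flow above can be condensed into a small in-memory sketch of the server side (names like `resourceId` and the `inProgress`/`canceled` states follow the example payloads; none of this is the real qBittorrent API):

```python
import itertools

class ResourceStore:
    """Toy server-side store for the POST / GET / PATCH flow above."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._items = {}

    def create(self, url: str) -> dict:
        # Steps 1-2: POST /api/v3/resources starts the background work
        # and immediately returns the new resource with its ID.
        item = {"resourceId": next(self._ids), "state": "inProgress", "url": url}
        self._items[item["resourceId"]] = item
        return item

    def get(self, resource_id: int) -> dict:
        # Steps 3-4: GET /api/v3/resources/{id} reports current status.
        return self._items[resource_id]

    def patch(self, resource_id: int, state: str) -> dict:
        # Steps 5-6: PATCH /api/v3/resources/{id} with state "cancel"
        # transitions the resource to "canceled" and returns it.
        item = self._items[resource_id]
        if state == "cancel":
            item["state"] = "canceled"
        return item
```

Note that all the state lives in the resource itself, addressable by ID, rather than in a per-client session.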
glassez commented 3 years ago

once I've worked more with the qBt API I'll suggest a real-world example, which will be more useful.

Okay, let's wait for that. I'm sure there are more nuances than in this simplified example: for example, taking care of abandoned resources (I don't think we can just ignore this), or the simultaneous access (which I mentioned earlier) from different locations/devices/clients. Currently any concurrency is avoided by using sessions; you'll have to solve it some other way.

aminpaks commented 3 years ago

It just crossed my mind that some APIs could be public (not requiring authentication). For example, getting the version of qBt should be a public API, as the data is not private; unless we are talking about vulnerabilities, which I still believe are the user's responsibility to patch by upgrading.

By making some APIs public we can display the version for example even if the user is not logged in.

aminpaks commented 3 years ago

Something I just noticed is that we have different conventions for the delimiter in lists of items in request payloads. We should have just one convention and stick to it for consistency. In some places we expect | and in others \n; this is confusing and counterproductive.
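To make the inconsistency concrete: a client today has to know which delimiter each endpoint expects, or resort to a tolerant helper like this purely illustrative one:

```python
def split_items(raw: str) -> list[str]:
    """Split a delimited list for the current API, accepting either
    '|' or a newline, since different endpoints disagree on the separator."""
    sep = "|" if "|" in raw else "\n"
    return [item for item in raw.split(sep) if item]
```

With a single convention (or plain JSON arrays), this kind of guesswork disappears entirely.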

aminpaks commented 3 years ago

As I already mentioned about the REST API, the convention is that endpoints refer to resources. For example, for categories it could be something like this:

Create is a POST request to /api/v3/categories with the payload below:

{
  "name": "My Category",
  "savePath": "/something/something"
}

and the server should return a 200 response with the payload below:

{
  "id": "d5e3adf2-ac18-4f56-9b8d-c1ca76b24310",
  "name": "My Category",
  "savePath": "/something/something"
}

Edit is a PUT request to /api/v3/categories/d5e3adf2-ac18-4f56-9b8d-c1ca76b24310 with the payload below:

{
  "name": "Rename 'My Category'",
  "savePath": "/users/bob/something new/home"
}

and the server should return a 200 response with a payload similar to the create response.

Delete is a DELETE request to /api/v3/categories/d5e3adf2-ac18-4f56-9b8d-c1ca76b24310 with no payload.

Any mass/batch operation could be a PATCH call to /api/v3/categories with a payload describing what operation should be performed.

CzBiX commented 3 years ago

Maybe off topic.

The qb-web UI I made doesn't work via CORS, because of cookie issues under CORS.

see https://web.dev/samesite-cookies-explained/.

glassez commented 3 years ago

We should have just one convention and stick to it for consistency.

Agreed. But it seems this problem will disappear by itself if we use JSON to pass the request parameters.

glassez commented 3 years ago

As I already mentioned about REST api, the convention is that endpoints refer to resources.

To be honest, I can't imagine that we can apply this abstraction to all the endpoints. Maybe I just haven't thought about it in detail. It would be nice if you could provide similar ideas for every aspect of the API you are currently working with, so we can accumulate them for the future API v3.

As for the Category example above, I am confused by the use of "special" identifiers for categories. Why not use the category name as the identifier? Otherwise, we will have to maintain an additional mapping in the Web API layer somehow.

aminpaks commented 3 years ago

But it seems that this problem will disappear by itself if we use JSON to pass the request parameters.

No; for any mass/batch operations we will need to provide the IDs in the URL, and those will be separated by a delimiter by convention.

It would be nice if you could provide similar ideas for every aspect of the API you are currently working with.

I will try to provide a more detailed post including all the endpoints that I already used in the Web UI with examples sometime soon.

I am confused by the use of "special" identifiers for categories. Why not use the category name as an identifier?

The reason to use an identifier and not the name is that we will then be able to rename categories. Resources should not be identified by their names but by a unique identifier (possibly a UUID).

we will have to maintain additional mapping in the Web API layer somehow

Would you explain what this additional mapping is?

aminpaks commented 3 years ago

@CzBiX

It is off-topic; you should create an issue on your project and share the link so we can focus on it there. It can be solved; recently I read more about cookies and cross-origin requests, and I might be able to help you fix it.

glassez commented 3 years ago

we will have to maintain an additional mapping in the Web API layer somehow

Would you explain what this additional mapping is?

The UUID <=> Category mapping, so you can obtain a Category by UUID, or the UUID of a Category. Otherwise, we would have to change the core of the application to add support for UUIDs there. Currently, Torrents refer to Categories simply by name, and this is quite enough for us.
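The mapping glassez describes would be a bidirectional lookup kept only in the Web API layer, since the core identifies categories by name. A sketch of such a shim (not existing code; the class and its API are hypothetical):

```python
import uuid

class CategoryIdMap:
    """Web API layer shim: map stable UUIDs to core category names,
    so the application core never has to learn about UUIDs."""

    def __init__(self):
        self._by_id = {}
        self._by_name = {}

    def id_for(self, name: str) -> str:
        """Return the UUID for a category, minting one on first use."""
        if name not in self._by_name:
            new_id = str(uuid.uuid4())
            self._by_name[name] = new_id
            self._by_id[new_id] = name
        return self._by_name[name]

    def name_for(self, category_id: str) -> str:
        """Resolve a UUID from a URL like /categories/{id} back to the
        name the core uses."""
        return self._by_id[category_id]
```

The catch is exactly what the thread notes: this mapping has to be persisted and kept in sync with the core's category list, which is extra machinery for little gain if renaming is rare.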

glassez commented 3 years ago

But it seems that this problem will disappear by itself if we use JSON to pass the request parameters.

No; for any mass/batch operations we will need to provide the IDs in the URL, and those will be separated by a delimiter by convention.

Feel free to give examples of queries/data. Don't forget that not all of your interlocutors are well acquainted with web stuff such as REST. It's not clear to me yet why we can't pass the entire batch as JSON.

CzBiX commented 3 years ago

@aminpaks Thanks, but this cookie issue can't be fixed without some changes in qBt. So I hope the next-generation API can support CORS.

aminpaks commented 3 years ago

but this cookie issue can't be fixed without some changes in qBt.

It's actually not that off-topic. Are you referring to the SameSite directive being set to lax?

CzBiX commented 3 years ago

Are you referring to the SameSite directive being set to lax?

Yes and no; I did not specifically refer to that method. There are other ways to resolve the CORS issue, such as using another HTTP header for auth.

aminpaks commented 3 years ago

We will have to change the core of the application to add support for UUIDs there. Currently, Torrents refer to Categories simply by name, and this is quite enough for us.

I see, so internally you refer to categories directly by their names. Yeah, this limits the categories functionality and makes it not very flexible, especially for renaming them. It's really up to you; if it's a lot of work, feel free to keep it like this. I personally think this should be fixed.

Do not forget that not all of your interlocutors are well familiar with web stuff, such as REST. It's not clear to me yet why we can't pass the entire batch as JSON.

Sorry for my ignorance; I tend to forget to explain things in full detail. I'm going to try to share some articles so we can discuss only the parts that are unclear, instead of my writing a full API doc or spec. It might actually be a good idea to have a swagger-ui for v3; it's just simpler to maintain an OpenAPI spec.

To answer your question about why we can't use the request payload for batch calls: we actually can.

One approach is explained in this or this article. Pay attention to how the responses return the results in an array including the status codes; that's how we would fulfill some of the requests and flag the ones that failed. Another approach is to simplify the payloads by organizing batch operations by endpoint. Here is a good answer on SO. The important point is that each batch request should return an individual result for each entry. The response should not be a bare 200 "all is good" without any specific details, nor should the whole operation get canceled if one entry fails.

aminpaks commented 3 years ago

There are other ways to resolve the CORS issue, such as using another HTTP header for auth.

Try using HttpOnly cookies; it's just easier to protect against XSS. Having the access token in local storage is an open door for XSS attacks to grab it without any problem.

glassez commented 3 years ago

I see, so internally you refer to categories directly by their names. Yeah, this limits the categories functionality and makes it not very flexible, especially for renaming them.

Renaming a category is a very rare operation (if I'm not mistaken, we don't even provide it currently), so such a small inconvenience shouldn't greatly concern us. But adding an ID would cause much more trouble (including compatibility problems with previous versions).

The important point here to pay attention is each batch request should return individual results for each entry.

In a single batch response, IIRC, which is expected.

The results should not be 200 and all is good without any specific details

I'm not sure what you mean. Just omit the response entry's status if it is 200 (in other words, if the status is omitted, then it is assumed to be 200)?

if one fails then the whole operation gets canceled.

I don't agree. If the requests inside a batch are independent, then I don't see the point of failing the entire batch due to one or several failed requests (moreover, some requests may already have been executed and have had some effect/result).

https://github.com/qbittorrent/qBittorrent/issues/14130#issuecomment-778783661 My main idea is that you should support your comments with small examples of what it might look like in the next (REST) API (e.g. a brief description of some endpoint with its accepted params/methods and returned results).

To answer your question about why we can't use the request payload for batch calls: we actually can.

Moreover, I didn't notice any other style of example in your links (everywhere, all batch jobs are accepted as JSON). In one example I saw JSON mixed with query parameters (endpoint?param1=<some_params_as_json>&param2=value2&param3=value3), but I still think it's not a very good idea. Since we are dealing with JSON, why can't we put everything there?

aminpaks commented 3 years ago

Renaming a category is a very rare operation (if I'm not mistaken, we don't even provide it currently), so such a small inconvenience shouldn't greatly concern us. But adding an ID would cause much more trouble (including compatibility problems with previous versions).

Agreed; based on what you describe, I think it's going to be a bit messy to implement updating a category's name. Never mind then.

I'm not sure what you mean. Just omit the response entry's status if it is 200 (in other words, if the status is omitted, then it is assumed to be 200)?

My bad, I didn't explain properly what I meant by "the results should not be 200 and all is good without any specific details". What I mean is: for a batch operation that creates 3 resources on the server, the response should include 3 results in an array. How the response payload is constructed is up for debate; here is an example:

[
    {
        "status": 200,
        "errors": [],
        "success": { "name": "whatever" }
    },
    {
        "status": 400,
        "errors": [
            { "message": "Invalid request, missing parameter X"}
        ],
        "success": null
    },
    {
        "status": 200,
        "errors": [],
        "success": { "name": "some other info" }
    }
]
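A client consuming a response shaped like the array above (the field names are just from this example) would partition the entries rather than treat the batch as all-or-nothing:

```python
def partition_batch(results: list[dict]) -> tuple[list[dict], list[dict]]:
    """Split a batch response into succeeded payloads and failed entries,
    using the per-entry 'status' field from the example response above."""
    ok = [r["success"] for r in results if 200 <= r["status"] < 300]
    failed = [r for r in results if r["status"] >= 400]
    return ok, failed
```

This is the property being argued for: one entry failing with a 400 leaves the other entries' results intact and usable.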

I don't agree. If the requests inside a batch are independent, then I don't see the point of failing the entire batch due to one or several failed requests (moreover, some requests may already have been executed and have had some effect/result).

Again, my bad; I think you misunderstood me. I am actually saying the same thing as you: if an operation can process the entries of a batch request separately, then we should not fail the whole request just because of one failure.

Moreover, I didn't notice any other style of example in your links (everywhere, all batch jobs are accepted as JSON). In one example I saw JSON mixed with query parameters (endpoint?param1=<some_params_as_json>&param2=value2&param3=value3), but I still think it's not a very good idea. Since we are dealing with JSON, why can't we put everything there?

It's a convention; we could decide to accept everything in the request payload and be done with it. One example of why passing details in URL params might be simpler is deleting resources: if you want to delete multiple resources in a batch, in most cases all the server needs is the IDs of the resources, so the API can accept an HTTP DELETE request with the list of IDs in a query param.

There's nothing wrong with that; it's just better to have a convention and follow it everywhere. We could design all the APIs to require a payload in the request for all operations (Create, Update, Delete) instead of query params.

My main idea is that you should support your comments with small examples of what it might look like in the next (REST) API (e.g. a brief description of some endpoint with its accepted params/methods and returned results).

Sounds good. I think we will need an RFC where we can discuss the API. I'll use the first post.