netbox-community / netbox

The premier source of truth powering network automation. Open source under Apache 2. Public demo: https://demo.netbox.dev
http://netboxlabs.com/oss/netbox/
Apache License 2.0

API docs: Include a reference to the type of DRF pagination used #3700

Closed · ajknv closed 4 years ago

ajknv commented 4 years ago

Change Type

[x] Addition [ ] Correction [ ] Deprecation [ ] Cleanup (formatting, typos, etc.)

Proposed Changes

In the API documentation for pagination, note the type of DRF pagination backing it, ideally with a small cautionary note on the consequences for pagination during concurrent modification. While it might seem fairly evident, the end user isn't necessarily studying the design choices NetBox makes in its use of backing frameworks and libraries. A simple, low-cost documentation tweak can go a long way toward clearly defining the behavior from a product perspective and setting proper expectations.

(Related: https://github.com/netbox-community/netbox/issues/3693 https://github.com/netbox-community/netbox/issues/3699 )

tyler-8 commented 4 years ago

I'm not entirely following the overall problem you're covering in the related issues you've linked. In my mind, a user should pull back ALL records across all pages that they want to delete, and THEN delete them. You should not pull one page, delete all items on that page, pull the second page, delete all items on the second page, and so on. That seems like a sub-optimal way to run this workflow.

ajknv commented 4 years ago

If there is concurrent modification, the problem exists either way: an attempt to prefetch all records is subject to the same race. There's also the question of optimal with respect to what metric. Even if we presume that prefetching hypothetically avoided the problem, it implies pulling all of the result data into one complete chunk in memory, at least client-side, which may be undesirable; it certainly seems at odds with the motivation for having pagination in the first place. And since the delete API is single-id based anyway, it's not apparent why having the records (or at least the ids) prefetched is more optimal from any performance or functional standpoint.

It can also be the case that, for convenience in stream-processing use cases, the pagination is hidden behind higher-level client language constructs such as a Python generator function, in which case there's not necessarily any sensible way for the real terminal end-client to determine when it should attempt to prefetch.
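To illustrate the generator point: a sketch (not actual NetBox client code) where limit/offset pagination is hidden behind iteration, with an in-memory list and a `fetch_page` helper standing in for HTTP GETs against a DRF-style paginated endpoint. The consumer never sees page boundaries, so it has no natural point at which to decide to "prefetch everything first".

```python
# Hypothetical stand-ins; a real client would issue HTTP requests instead.
RECORDS = [{"id": i} for i in range(10)]  # simulated server-side table

def fetch_page(offset, limit=4):
    """Simulate DRF-style limit/offset pagination: slice at the current offset."""
    return {"count": len(RECORDS), "results": RECORDS[offset:offset + limit]}

def iter_records(limit=4):
    """Yield records one at a time; pagination is invisible to the caller."""
    offset = 0
    while True:
        body = fetch_page(offset, limit)
        if not body["results"]:
            return
        yield from body["results"]
        offset += limit

ids = [r["id"] for r in iter_records()]
print(ids)  # [0, 1, ..., 9] only while no concurrent modification occurs
```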

Perhaps it would be easier to provide a more general problem statement: in the face of concurrent access that may add/delete records, no query involving pagination is guaranteed to actually reliably return all of the records that the first page of the results (including the "count"!) indicates matched the query, even as of the moment of the query execution. The fact that the API doesn't return a fundamentally consistent view via pagination except in the strictly single-user case can prospectively impact all kinds of practical use cases, and seems worthy of at least a small documentation note.

hSaria commented 4 years ago

I see what @ajknv is pointing out. This is the basic modify-during-iteration problem/race-condition.

An example would be getting the list of cable connections, of which there may be tens of thousands, while paginating (i.e. navigating using the next attribute in the response), and another client adds or deletes one of the connections. The paginating query will then either skip an entry or process an entry twice, depending on which way the connections list shifted (addition vs. deletion).

I don't think it requires a dedicated section in the documentation. I'd classify this as a bug with #3699 as the solution; I think the current behaviour shouldn't exist as it can poison the response to a client.

DanSheps commented 4 years ago

RESTful data, by its very nature, is non-persistent. You should discard the request if you detect a change in the recordset (easily done by checking the "count" variable) and re-initiate the request.
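A sketch of that count check, again with in-memory stand-ins rather than real API calls: compare the reported "count" on every page against the first page's value, and restart from scratch if it changes.

```python
data = [{"id": i} for i in range(7)]  # simulated server-side table

def fetch(offset, limit=3):
    """Stand-in for a paginated GET; returns a DRF-shaped body."""
    return {"count": len(data), "results": data[offset:offset + limit]}

def fetch_all(limit=3):
    while True:  # restart whenever the recordset changes mid-iteration
        first = fetch(0, limit)
        expected = first["count"]
        results = list(first["results"])
        offset = limit
        while len(results) < expected:
            page = fetch(offset, limit)
            if page["count"] != expected:
                break  # recordset changed; discard and re-initiate
            results.extend(page["results"])
            offset += limit
        else:
            return results  # inner loop completed without a count mismatch

ids = [r["id"] for r in fetch_all()]
```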

hSaria commented 4 years ago

I suppose using the count would alleviate the problem, though you can still run into the issue if a record is removed from the head of the list and a new one is added to the tail of it (it doesn't necessarily need to be at the head/tail, just before/after the current page).
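That blind spot is easy to demonstrate with the same in-memory simulation: one removal before the current page plus one addition after it leaves "count" unchanged, so the check passes even though a record was skipped.

```python
records = list("ABCDEF")  # simulated table
limit = 3

count1, page1 = len(records), records[0:limit]   # 6, ['A', 'B', 'C']
records.pop(0)          # 'A' removed from the head...
records.append("G")     # ...and 'G' appended to the tail
count2, page2 = len(records), records[limit:limit * 2]

print(count1 == count2)   # True: the count check cannot see the shift
print(page1 + page2)      # ['A', 'B', 'C', 'E', 'F', 'G']; 'D' was skipped
```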


hSaria commented 4 years ago

I think the Django REST framework has already gone through the benefits and drawbacks of the proposed approach, as well as other vital info on the subject of pagination.

Relating to NetBox, I think IPAM is the area most impacted by this.

LBegnaud commented 4 years ago

Regarding the original issue, it seems like a simple workaround would be to get page 1, delete those records, then get page 1 again.
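A sketch of that workaround under the same simulated setup: repeatedly fetch page 1 and delete its contents until the query comes back empty, instead of advancing an offset past records that are shifting underneath you. `get_page_one` and `delete` are in-memory stand-ins for the list and delete API calls.

```python
records = [{"id": i} for i in range(7)]  # simulated table

def get_page_one(limit=3):
    """Stand-in for GET with offset=0: always the current first page."""
    return records[:limit]

def delete(record):
    """Stand-in for a single-id DELETE call."""
    records.remove(record)

deleted = []
while True:
    page = get_page_one()
    if not page:
        break
    for record in page:
        delete(record)
        deleted.append(record["id"])

print(deleted)  # [0, 1, 2, 3, 4, 5, 6]: every id, none skipped
```

This sidesteps the shifting-offset race for the single-writer case, though as noted below it does not help when other clients are mutating the data concurrently.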

hSaria commented 4 years ago

That works when the same client is the one controlling the data. The challenge is having multiple clients manipulate it; the more clients you have, the more likely you are to experience the effects of this.

Stated differently, the more automation you have around your data in NetBox, the less accurate and reliable the behaviour becomes, though I suppose that largely depends on what you’re doing with the data.


jeremystretch commented 4 years ago

This is not a unique characteristic of NetBox and doesn't need to be explained in the docs. Most pagination systems work this way (GitHub issues, for example).