orbitjs / orbit

Composable data framework for ambitious web applications.
https://orbitjs.com
MIT License

parallel data querying principles #974

Closed ahoyahoy closed 1 year ago

ahoyahoy commented 1 year ago

Please help me understand why all queries (jsonapi) are executed only serially, and what the recommended approaches are for ideal data retrieval.

Let's say I have an entity "notes". I have a left menu where I load a list of notes with limited fields, for example only name and id.

Then there is a content section where you can see several "last edited" notes. For example, 6 tiles, where each one individually executes a query to get the complete Note data.

As long as the user does not have the data stored in the cache, these tiles will be loaded one by one with a delay of 200 ms, which is quite visible.

If a page contains several different entities, each with its own queries, the whole page loads noticeably slowly for the user.

From 0.17 there is the possibility of multiple expressions, where I can make multiple queries in parallel. So it is theoretically possible to make some kind of "throttle" aggregator that will load all "last edited notes" tiles at once. But according to the documentation: "but query will only resolve when all have completed successfully". This exposes the application to a potentially bigger crash than if only one Note tile fails for some reason.
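The failure-isolation concern above can be illustrated without Orbit at all. The following is a minimal sketch, not Orbit's API: `fetchNote`, `loadAllSettled`, and the `Settled` type are hypothetical names introduced only for this example. It starts all requests at once and records each outcome separately, so one failing tile does not fail the whole batch. Where it is available, the built-in `Promise.allSettled` gives the same behaviour.

```typescript
// Hypothetical types and loader; they stand in for real note queries.
type Note = { id: string; title: string };
type Settled<T> = { ok: true; value: T } | { ok: false; error: unknown };

// Simulated request: the id "bad" always fails, every other id succeeds.
async function fetchNote(id: string): Promise<Note> {
  if (id === "bad") throw new Error(`note ${id} failed to load`);
  return { id, title: `Note ${id}` };
}

// Starts every request at once and converts each outcome into a value,
// so a single rejection cannot reject the combined promise.
function loadAllSettled<T>(
  ids: string[],
  load: (id: string) => Promise<T>
): Promise<Settled<T>[]> {
  return Promise.all(
    ids.map((id) =>
      load(id).then(
        (value): Settled<T> => ({ ok: true, value }),
        (error): Settled<T> => ({ ok: false, error })
      )
    )
  );
}
```

Calling `loadAllSettled(["1", "bad", "3"], fetchNote)` resolves with three entries in request order; the tiles for "1" and "3" can render while only the "bad" tile shows an error state.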

And it seems strange to me to merge queries from several entities across the page.

Can you please explain to me the reasons for this (only serial) architecture and how I should design the querying to be as fast as possible?

dgeb commented 1 year ago

@ahoyahoy Generally speaking, Orbit's request and sync queues are designed to give you control over your application's data flows. Queries and updates may be interleaved in a source's request queue, and Orbit provides a guarantee that these requests will be processed in the order they are received. For example, if a request deletes a note, then a subsequent query for all notes will not include that note. Furthermore, if there is a network outage, then there is a guarantee that this order will be maintained as requests are periodically retried.
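To make the ordering guarantee concrete, here is a minimal sketch of a first-in, first-out request queue. It is a simplified stand-in and not Orbit's implementation: `SerialQueue`, `delay`, `noteTitles`, and `demoSerialQueue` are hypothetical names, and retry behaviour is left out entirely. It only shows why a query pushed after a delete cannot observe the deleted note, even when the delete is the slower request.

```typescript
// Simplified stand-in for a request queue (assumption: not Orbit's code).
class SerialQueue {
  private tail: Promise<unknown> = Promise.resolve();

  // Each task starts only after the previously pushed task has settled.
  push<T>(task: () => Promise<T>): Promise<T> {
    const run = this.tail.then(task);
    // Swallow the rejection on the chain so later tasks still run.
    this.tail = run.catch(() => undefined);
    return run;
  }
}

const delay = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// In-memory stand-in for server-side data.
let noteTitles: string[] = ["Groceries", "Ideas"];

async function demoSerialQueue(): Promise<string[]> {
  const queue = new SerialQueue();
  // The delete is slow and the query is fast, yet the query still runs second.
  const removed = queue.push(async () => {
    await delay(20);
    noteTitles = noteTitles.filter((title) => title !== "Groceries");
  });
  const listed = queue.push(async () => {
    await delay(1);
    return noteTitles.slice();
  });
  await removed;
  return listed;
}
```

The cost of this guarantee is the one described in the question: independent queries pushed onto the same queue also wait for each other.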

> From 0.17 there is the possibility of multiple expressions, where I can make multiple queries in parallel.

Yes, this is the mechanism that allows for default parallelization of fetch requests when processing a query. Updates can also optionally be parallelized (via the parallelRequests option), although this is not the default since order of individual operations within an update is so often important.

> Can you please explain to me the reasons for this (only serial) architecture and how I should design the querying to be as fast as possible?

Given your example, I would tend to aggregate data loading at a boundary that is clear for the user: like a route or page. Depending on your framework, you could also use a higher order component. In general, I avoid loading data in components unless it is acceptable to show loading state within that individual component. Otherwise, data flashes into components almost randomly throughout the page just based on the timing of many individual fetch requests, which can be disorienting.
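Aggregating at a route or page boundary might look like the sketch below. Everything in it is an assumption made for illustration: `loadNotesPage`, `fetchNoteSummaries`, `fetchRecentNotes`, and the data shapes are hypothetical and stand in for whatever queries the page actually needs. The point is only that the page issues its requests together and renders once from a single result.

```typescript
// Hypothetical data shapes for the notes page.
type NoteSummary = { id: string; name: string };
type NoteDetail = NoteSummary & { body: string };
type NotesPageData = { menu: NoteSummary[]; recent: NoteDetail[] };

// In-memory stand-in for the remote data.
const ALL_NOTES: NoteDetail[] = [
  { id: "1", name: "Groceries", body: "Milk, eggs" },
  { id: "2", name: "Ideas", body: "Low-code tiles" },
  { id: "3", name: "Travel", body: "Book tickets" },
];

// Menu query: limited fields only (id and name).
async function fetchNoteSummaries(): Promise<NoteSummary[]> {
  return ALL_NOTES.map(({ id, name }) => ({ id, name }));
}

// Tiles query: complete data for the first `limit` notes.
async function fetchRecentNotes(limit: number): Promise<NoteDetail[]> {
  return ALL_NOTES.slice(0, limit);
}

// Route-level loader: both requests start together, and the page renders
// once from the combined result instead of tile by tile.
async function loadNotesPage(recentCount = 6): Promise<NotesPageData> {
  const [menu, recent] = await Promise.all([
    fetchNoteSummaries(),
    fetchRecentNotes(recentCount),
  ]);
  return { menu, recent };
}
```

A single loading state for the page then replaces the individual spinners, at the cost that the page waits for its slowest request.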

With all that said, there are exceptions to every rule, and my goal with Orbit is to provide a configurable general purpose data library and not to force your hand with UX decisions about where and when you should show spinners in your app. I am open to providing an option (e.g. skipRequestQueue) to effectively avoid usage of queues entirely on an individual request basis when the order in which requests are made and completed is completely unimportant. I'll think this over and experiment before implementing to avoid introducing a real footgun. But in the meantime, please let me know if you have any success with Orbit's current parallelization options.
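The trade-off behind such an option can be sketched as follows. This is purely hypothetical: at the time of this comment `skipRequestQueue` is only a proposal, and `RequestQueue`, `RequestOptions`, `wait`, and `demoSkipQueue` are names invented for the example rather than anything Orbit provides. A request that skips the queue starts immediately and gives up every ordering guarantee.

```typescript
// Hypothetical per-request option, modelled on the proposal above.
type RequestOptions = { skipRequestQueue?: boolean };

class RequestQueue {
  private tail: Promise<unknown> = Promise.resolve();

  request<T>(task: () => Promise<T>, options: RequestOptions = {}): Promise<T> {
    if (options.skipRequestQueue) {
      // Starts right away; may finish before or after queued requests.
      return task();
    }
    // Queued requests start only after the previous queued one settles.
    const run = this.tail.then(task);
    this.tail = run.catch(() => undefined);
    return run;
  }
}

const wait = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

async function demoSkipQueue(): Promise<string[]> {
  const queue = new RequestQueue();
  const finished: string[] = [];
  // Builds a task that records its label once its simulated request is done.
  const track = (label: string, ms: number) => async () => {
    await wait(ms);
    finished.push(label);
  };
  await Promise.all([
    queue.request(track("queued-slow", 30)),
    queue.request(track("queued-fast", 1)),
    queue.request(track("unqueued", 10), { skipRequestQueue: true }),
  ]);
  return finished;
}
```

In this sketch the unqueued request finishes first, while the fast queued request still waits behind the slow one. That is the footgun: an unqueued query could also run ahead of an update that was issued before it.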

ahoyahoy commented 1 year ago

thanks for the clarification :) I tried to give a simplified example before and it didn't work out too well.

I'm calling Orbit queries in asynchronously loaded components, and the page is even put together by the user rather than the programmer (low code).

A visual example can be found at https://remix.run/ (scroll down to the animation of the e-shop, which loads its parts asynchronously). Currently, if I don't aggregate queries for the entire page (menu, sales, invoices list, invoice detail), the data loads, and the user experience feels, as in the "Without Remix" case. Of course I want the second example. Further down there is an example of error handling that I don't think I could achieve with the current parallelization.

`skipRequestQueue` sounds like a great idea to me. :)

ahoyahoy commented 1 year ago

@dgeb Hello Dan, what do you think about my last comment? Does it make more sense now, or is it still unclear?