Design for varying size of data provider

Part of #8052 and follow up for #7968. Based on initial discussion the cases we would want to support are: 1) Current case where the size of the data set is known beforehand and the existing size query produces the actual size of the data. Thus all existing apps should keep working with this without any code changes. 2) Size is unknown and not provided - it is not necessary to the user to do anything with the size query, e.g. not even implement it or just return -1 or such. The component will keep fetching things until there is no more data provided. Not sure what should the component use as a guestimate for the initial data size to adjust scroll bar if applicable. 3) Giving an estimated size that might change. Thus the application developer can return a initial size of 1000 and at some point update that the size is actually something else - without causing a full reset for the data (full data reset naturally occurs when query state like sorting or filtering changes)

Couple related things but don't need to be fixed at the same time, but what can be can be taken into account now and ticketized&investigated separately: a) the application developer can define the optimal page size for the query, instead of some default 50 items b) removing or adding items to the data set should work without refreshAll, meaning the data provider and the component will just add/drop related items and adjust any size indicators accordingly c) It has been requested in FW8 and partly here #4510 that it would be helpful if we expose an API that providers external access to what happens to the size inside the data provider / component

I think a reasonable assumption for an undefined size is to tie it to the page length, i.e. assuming that there's always at least one additional page worth of data available.

The case with an estimation provided by the application developer can actually be separated to two different cases. There's the very simple case with a static size estimate that is given up-front and never changed and the more complex case when a new estimate is queried from a callback whenever appropriate, e.g. after fetching past the previous estimate or after an explicit refresh.

Acceptance criteria:

New API design for DataProvider/DataCommunicator/Components interfaces/classes has been suggested as well as the list of a new interfaces (if any) to provide use cases 1)-3) and a)-c) from above description.
Use cases examples provided for those new API with stubbed implementation.

So, my current thoughts based on earlier investigation and discussion today with the team, followed by more investigation:

Goal: Undefined size for data provider. It should be possible for the application developer to change the "estimated" size.

    Approach A
     Introduce DynamicDataProvider & HasDynamicDataProvider (name pending).
     (Another name could be SizelessDataProvider)

     Why introduce a subclass instead of a just adding things to DataProvider ?
     1) Keep existing components, data providers and their API & behavior untouched and 100 % backwards compatible
     2) Separate data providers with undefined/changing size from the current defined size.
        Components that don't support dynamic size, can keep requiring an explicit size
        and respect the existing contract with DataProvider and the component,
        where size is fetch first and then data is fetch until the end is reached.
        For any component that supports dynamic size, should implement HasDynamicDataProvider instead.
        Similar approach to separating ConfigurableFilterDataProvider.
        Also it would imply that any component that supports dynamic size is
        fine with the fact that the fetch query might not return the asked amount of items.
        Currently this will explode (our) components and might do so for custom components too.

     Using a normal data provider in any component is still possible by having
     HasDynamicDataProvider to just support setting DataProvider too, but using a DynamicDataProvider
     should not be possible with a component that doesn't support dynamic data but would always
     require the size up front and cannot adjust to it without a reset.
     This would imply DynamicDataProvider doesn't inherit DataProvider
     and would allow to not support dynamic data for hierarchical and filtering cases
     immediately but add support later incrementally if neeeded.
     However, to be able to use dynamic data providers (without implementing the size query)
     in components that require the size to be known up front and not modified, it could be possible
     to fetch more/less items than the initial size is, and then just call refreshAll().
     To be tested though.

     Approach B
     Make DataProvider support undefined size.
     Add methods to DataProvider with possible unsupported operation exception
     for anything that is not applicable for non-dynamic case. Or add an interface
     that components implement that allows specifying the callback for providing a
     size estimate. Supporting undefined size would be indicated by a isUndefinedSize() method in
     DataProvider, which by default returns false and components would then require
     the explicit size, like thus far.
     Need to look at this more closely on what would it mean for existing components
     and data provider implementations. Using a dynamic data provider implementation in a
     component that doesn't support needs to be handled either by:
     a) throwing an exception from the size query and making users aware of the issue
     b) keeping on track on estimated size and when it is reached, calling
     refreshAll() and causing the component to reset. This currently resets the scrolling position ???
     Need to look at more what it would entail. One question is how to provide
     the "size provider" in this case so that dynamic data providers could customize
     the size when needed. Maybe those could implement the size query and 
     simply return an "estimated size" whenever needed, like when previous limit is reached.

Some examples below for approach A, not attaching the flow-data classes, but I think it could make sense to try to make approach B work instead.

public class DataProviderDemo extends Div {

    public DataProviderDemo() {
        GridDemo.PersonService personService = new GridDemo.PersonService();
        Grid<GridDemo.Person> grid = new Grid<>();
        // Examples - Approach A:

        // use case 0: explicit size, like it is set currently.
        CallbackDataProvider<GridDemo.Person, Void> provider = DataProvider
                .fromCallbacks(query -> personService
                                .fetch(query.getOffset(), query.getLimit()).stream(),
                        query -> personService.count());
        grid.setDataProvider(provider);

        // use case 1: Undefined size and nothing set. Query is repeat until nothing
        // returned (or less than asked)
        DynamicCallbackDataProvider<GridDemo.Person, Void> undefinedSizeDataProvider = DataProvider
                .fromCallback(query -> personService
                        .fetch(query.getOffset(), query.getLimit()).stream());
        // fromCallback is quite close to fromCallbacks but should be at top on
        // IDE autocomplete

        // grid will be switched to implement HasDynamicDataProvider that still
        // accepts DataProvider
        grid.setDataProvider(undefinedSizeDataProvider);

        // use case 2: How to change the data size for dynamic data provider ?

        // 2a) Maybe set from query (?) in case no access to component.
        // NOTE: this could just be replaced by the sizeProvider callback below
        DynamicDataProvider<GridDemo.Person, Void> dynamicCallbackDataProvider = DataProvider
                .fromCallback(query -> {
                    // Introduce a subtype for Query with added API to avoid methods in normal query without any effect ?
                    // -1 would be the value for "unset" and let component to decide
                    if (query.getEstimatedDataSize() == -1) {
                        query.setEstimatedDataSize(500);
                    }
                    // it would be possible to later on change the
                    return personService
                            .fetch(query.getOffset(), query.getLimit())
                            .stream();
                });

        // Add the estimated size as state to DynamicDataProvider implementations.
        // The component should not stop querying after this is reached but until data ends.
        dynamicCallbackDataProvider.getEstimatedDataSize(); // would initially return -1 for undefined

        // the initial query will execute from ^ and set the estimated size there
        grid.setDataProvider(dynamicCallbackDataProvider);

        // 2b) Change size at any point.
        // Fires an event from the DataProvider like with refreshAll()
        // that will notify the component that estimate might have changed, BUT
        // the component should not do anything if there is already 999 items shown and
        // estimated size is set to less items. So the change is only visual or maybe NOOP
        dynamicCallbackDataProvider.setEstimatedDataSize(666); // resetting guestimated size

        // if/when there would be adding & removing items possible from the data provider,
        // the estimated size would not change - as with normal DataProvider this should affect
        // the size (either queried again or just --/++). So for the use case where the
        // data provider size changes and component should update itself, there should still
        // be a reset call

        // 2c) Defining a callback that is used to
        // - get the initial size when set
        // - query for estimated size whenever bypassing the previous limit
        dynamicCallbackDataProvider.setSizeProvider(query -> {
            // check something based on query params like previous size, offset, previous size or filter
            // provide component that is used ?
            if (query.getPreviousSize() == -1) {
                return 1000;
            }
            return query.getPreviousSize() + query.getPageSize() * 4;
        });
        query -> {
            if (query.getOffset()+query.getLimit() < getEstimatedSize())
                return getEstimatedSize();

        }
        // The option 2c makes the most sense probably and support most cases,
        // adding other API is pointless ?
}

Here are some of my findings and thoughts about that:

Approach A looks more attractive IMO because we indeed separate the the functionality from the existing one and avoid mess in the DataProvider interface. I tried Approach B at first, but its not so flexible and transparent. This is a almost replication of that @pleku gave previously, but with introduction a new add listener method:

public interface DynamicDataProvider<T, F> {

    // SizeProvider implements estimated size calculation based on Query and Fetched Items size
    // SizeProvider implementations can take into account the following cases:
    // 1. fetchedSize == 0
    // 2. query.getOffset() + query.getLimit() < getEstimatedSize()
    // 3. fetchedSize < query.getLimit()
    // 4. fetchedSize > query.getLimit()
    // 5. fetchedSize == query.getLimit()
    // DataSizeReachedEvent can be thrown if the size of element reached (fetched < requested)
    void setSizeProvider(SizeProvider<T, F> sizeProvider);

    // DataSizeReachedEvent can be handled by DataCommunicator (or component) to
    // adjust the subsequent range requests to DataProvider/DataCommunicator,
    // according to actual estimated size and last query
    Registration addSizeReachedListener(DataProviderListener<T> listener);

    // Returns current estimated size of DataProvider
    // This method can just return an internal int field, or perform extra actions
    int getEstimatedSize();

    interface SizeProvider<T, F> extends Serializable {
        int get(Query<T, F> query, int fetchedSize);
    }

    class DataSizeReachedEvent<T> extends DataChangeEvent<T> {
        private int size;
        // other data necessary for adjusting the next range request (previous offset, limit, fetched items count)
        public DataSizeReachedEvent(DataProvider<T, ?> source, int size) { super(source); this.size = size; }
        // Setter and Getter
    }
}

Having DynamicDataProvider interface not inherit any other interface allow us to create a base implementation of sizeless data provider:

DynamicCallbackDataProvider<T, F> extends CallbackDataProvider<T, F>
        implements DynamicDataProvider<T, F>

Instance of DynamicCallbackDataProvider class can be then created through DataProvider.fromCallback() factory method and 'size provider' can be set later-on by client and listeners can be attached by component/DataProvider.
To allow sizeless behavior on Component side and to avoid another DataCommunicator implementation in hierarchy, I propose to introduce some delegate class that will adjust the collectKeysToFlush() method of DataCommunicator if it detects that DynamicDataProvider interface is used. This delegate can be used for all DataCommunicators for dynamic size components. The goal would be to 'trick' the component and make it think that it works with a regular DataProvider.
I don't see any other methods to add into DynamicDataProvider.

Next steps are to create POC for two approaches mentioned above.

Sorry for commenting on a closed ticket - you're simply too quick for me :)

Some random observations based on the latest proposals:

DynamicDataProvider seems like a quite weird name. What in it is "dynamic", and does that imply that the regular DataProvider would somehow be "static"?
What's the purpose of a separate setSizeProvider method instad of treating it as immutable with an (optional) constructor/factory argument? My understanding is that the implementation would always be provided by the application developer based on stuff that is already available when the instance itself is created.
getEstimatedSize() cannot take query filters into account.

The discussion about subclassing versus separate types versus additions to the current interface is very tricky. If we choose to have separate types, then there are some additional complications to take into account, such as the return type of grid.getDataProvider(). Subclassing also causes the same question, and in addition there's some mess with the Liskov substitution principle since neither variant would be a full replacement for the other. Based on those complications, I'm still placing my bets on a solution that is incorporated into the existing DataProvider API.

Sorry for commenting on a closed ticket - you're simply too quick for me :)

We're still quite far from locking anything but try to get forward with ideas, and we knowledge that things and their names can and will change.

DynamicDataProvider seems like a quite weird name. What in it is "dynamic", and does that imply that the regular DataProvider would somehow be "static"?

Good point and it was just more of an working name as undefined size feel awkward when combined with data provider and sizeless can be seen as only half truth - there is size but it might not be specified or it is just estimated.

What's the purpose of a separate setSizeProvider method instad of treating it as immutable with an (optional) constructor/factory argument? My understanding is that the implementation would always be provided by the application developer based on stuff that is already available when the instance itself is created.

Another good point. Probably constructor/factory method parameter should be first enough. Not sure if there can be a case where the size provider would be provided later on based on the component that is going to be bound to the data provider ?

getEstimatedSize() cannot take query filters into account.

I've been thinking we would drop this method, there is no reason for needing it.

If we choose to have separate types, then there are some additional complications to take into account, such as the return type of grid.getDataProvider(). Subclassing also causes the same question, and in addition there's some mess with the Liskov substitution principle since neither variant would be a full replacement for the other. Based on those complications, I'm still placing my bets on a solution that is incorporated into the existing DataProvider API.

I think from the application developers point of view, using the same interface should have better experience. The tricky part with just using DataProvider for undefined size is to cover the case where a developer uses a sizeless data provider with a component that doesn't support undefined size and needs it up front. We have identified a workaround of firing a refreshAll event to the dataprovider once the original size is reached or if the data set ends before that, but this should not cause bad UX. Maybe another option could be to make the sizeless data provider pass the size query to the SizeProvider with the details that the component doesn't support undefined size. The case is slightly similar as using an hierarchical data provider in a non-hierarchical component, but I think the effect is reversed as the UI might get broken.

Another drawback from using DataProvider is that then we would have more components to get to working from day 1, but that is of course just more work for us.

Anyway we are continuing on PoCs for both approaches so we can learn more before we decide what to do.

Thank you, @Legioth , we decided to split efforts and try both of the approaches: extended DataProvider methods and separate classes. There are ~15 classes in DataProvider hierarchy and I couldn't say in advance which approach is a better fit into this architecture, so it would be nice to try both of them and compare pros and cons later-on.

there's some mess with the Liskov substitution principle since neither variant would be a full replacement for the other

For me this is not transparent yet, but, yes this should be definitely taken into account. Web components should adjust items range requests policy depending on either DP is sizeless or not, but generally speaking the fetch API remain the same, i.e. datacommunicator/components should have fetch/size API still. Am I missed something?

such as the return type of grid.getDataProvider()

Could you please give an example? DynamicDataProvider is supposed to use along with CallbackDataProvider which has a DataProvider in its parents. So, it is not completely separated from DataProvider interface.

DynamicDataProvider seems like a quite weird name

Yes, naming pending. A regular DataProviders are also might be called dynamic, so yes, the name might be polished indeed.

What's the purpose of a separate setSizeProvider method instad of treating it as immutable with an (optional) constructor/factory argument? My understanding is that the implementation would always be provided by the application developer based on stuff that is already available when the instance itself is created.

Yes, agree.

getEstimatedSize() cannot take query filters into account.

Yes, good point. We can consider add overloaded method with filters as an argument.

The problem with the type hierarchy and in specific getDataProvider() is like this:

One option is to have DataProvider extends DynamicDataProvider to satisfy the substitution principle. DynamicDataProvider is the more generic thing that isn't accepted by all components. DataProvider is the more specific one that can be used wherever DyanmicDataProivder is expected (the size method would just not be invoked) but there would also be components that would only accept a DataProvider. Then we need to make a breaking change to weaken the return type of getDataProvider() from DataProvider to DynamicDataProvider.
The other option is to have DynamicDataProvider extends DataProvider. In this case getDataProvider() could still return DataProvider, but then we would instead break LSP since DynamicDataProvider would have a size method that would not work according to what's expected from an instance of DataProvider. There wouldn't be any way for a component's API to define that it needs a data provider implementation that can know its size.

There are ~15 classes in DataProvider hierarchy

Oh dear. If we go with separate types (regardless of the inheritance relationship between them), then we would most likely also need duplicates for most of those ~15 classes, e.g. a separate ConfigurableFilterDynamicDataProvider (name pending) as a counterpart to the current ConfigurableFilterDataProvider.

Not sure if there can be a case where the size provider would be provided later on based on the component that is going to be bound to the data provider?

If we can at least assume that this would be a very rare requirement, then we can satisfy that use case with some boilerplate through wrapping the original provider. Another alternative is to instead expose any specifics to the binding between a specific component instance and a specific data provider instance through an instance that is associated with the invocation of someComponent.setDataProvider(provider) (e.g. as a return value from that method, even though that breaks binary compatibility iirc).

The tricky part with just using DataProvider for undefined size is to cover the case where a developer uses a sizeless data provider with a component that doesn't support undefined size and needs it up front.

Another drawback from using DataProvider is that then we would have more components to get to working from day 1, but that is of course just more work for us.

This might be a relatively small trade-off since there isn't any existing application code that would pass a sizeless provider to a component that doesn't support it, simply because there isn't any application code that deals with sizeless providers at all.

For the potential future case that a future application developer will use a sizeless provider with an unsupportive component, we can use a couple of different fallback strategies:

For components that that don't do lazy loading (e.g. CheckboxGroup), it's quite fine to implement size(query) as fetch(query).count().
For components that do lazy loading but not filtering, we can implement size(query) to return a guess and then do some bookkeeping to fire a data change event when the guess is invalidated. This is not practical if filtering is also involved since we cannot know which data change listener is associated with which filter (unless we assume that there's only one component per data provider).
We can offer an easy way of explicitly choosing a strategy, and then either log a warning or throw if no strategy has been chosen.

Subsequent investigations #8183 #8184 are concluded and continuing with #8248

vaadin / flow

Design for varying size of data provider #8055

Acceptance criteria: