Improving use_query - Githubissues

rmorshea commented 1 year ago

Current Situation

There are a number of usability problems with use_query:

Prefetch logic is not perfect and can have significant performance costs: #110
The interface does not lend itself to future extensions: #104, #103
And that, when a user accesses a field that has not been pre-fetched in the scope of a render function, the SynchronousOnlyOperation error they get from Django does not communicate this fact and how to resolve it.

Proposed Actions

Discuss and arrive at a solution to each of the above problems. I'll update this section as we reach consensus on how to proceed.

rmorshea commented 1 year ago

In my opinion, (1) and (3) are linked. Our present attempts to pre-fetch as many fields and relationships as possible are related to the fact that we're trying to protect users from experiencing a SynchronousOnlyOperation error because, when they do, they probably aren't going to know how to fix it. If we can catch SynchronousOnlyOperation errors and re-raise them with a more helpful error message, we may be able to relax how much default pre-fetching we need to do to maintain a smooth user experience.

Archmonger commented 1 year ago

The issue is those exceptions are happening in the context of component renders.

We could possibly mutate all ORM objects that pass through IDOM to provide custom exception handling. But there's no realistic way to fix this for users who attempt to directly perform ORM queries in a component without use_query.

rmorshea commented 1 year ago

Since errors that happen in the context of a render function get logged. I think the best option might be to use a custom log handler to amend the reported log record to include a more helpful message. Another option might be to change Layout to make it easier to customize error reporting in a subclass.

Archmonger commented 1 year ago

Something for us to keep in mind is that these Django exceptions may not be a permanent feature of Django. In fact, with the last revision of Django the exceptions technically are no longer necessary.

With every update Django has been adding more async ORM compatibility, I would expect these exceptions to be removed from Django core at some point.

However, this may not change what we need to do today. But it is likely to complicate our solution further down the line.

rmorshea commented 1 year ago

As for (2), I think there may be a way to get rid of the *args, **kwargs from use_query that make it difficult to add options. The idea would allow people to parametrize query functions using lambdas similarly to how use_memo works:

use_query(lambda: my_query(*a, **kw), other_options=...)

The problem here is that since the identity of the query function will change on every render, we'll need to reach into the lambda to find my_query so that you can still do use_mutation(..., refetch=my_query). It turns out this is possible:

def f(x, y):
    return x + y

g = lambda: f(1, 2)

globals_referenced_by_g = [g.__globals__[name] for name in g.__code__.co_names]
assert f in globals_referenced_by_g

There's probably a bit of work to do here in terms of hardening this heuristic, but it seems like a plausible solution to avoiding *args, **kwargs in use_query.

Archmonger commented 1 year ago

I honestly dislike the idea of heuristics, sounds like it's going to be a fairly error prone process.

I'd rather concede to a slightly uglier API (for example, the fetch_options API I just developed) over something that might require a lot of dev hours to maintain.

rmorshea commented 1 year ago

sounds like it's going to be a fairly error prone process.

On thinking further, I agree, cases like lambda: some_module.my_query(...) would make this difficult.

rmorshea commented 1 year ago

There's two more things I'd like to nail down:

Given that we're going to find some way to report a more helpful error message than the current SynchronousOnlyOperation, what should the default pre-fetch behavior of use_query be?
The exact interface of fetch_options. In particular, the object it should return.

In the case of (1), my intuition is that use_query should only try to prefetch a QuerySet. From what I understand, field pre-fetching only really matters if the user has explicitly done something like User.objects.all().only("name"). In which case, pre-fetching the other fields actively works against the user's intentions. If we're able to explain the SynchronousOnlyOperation error, I'm not really sure pre-fetching fields makese sense at all.

In the case of (2), I think fetch_options should return a callable object since this allows normal usage of the query function (e.g. in tests). I also was not clear on what the purpose of the evaluator was in your PR. Perhaps that comes from the fact that OrmFetch wasn't itself callable?

Archmonger commented 1 year ago

evaluator was my proposal to allow for supporting other ORMs within use_query. The default evaluator for Django IDOM is just a function that handles lazy fields. This might be better renamed to postprocessor.

Also I agree, if we can find a graceful way to handle error messages for SynchronousOnlyOperation exceptions, then prefetching should be disabled by default.

However, I do think fetch_options should return an object, rather than a callable to allow for future feature expansion, such as cache_policy. Additionally, this allows us to keep all fetching logic encapsulated within the use_query hook.

Archmonger commented 1 year ago

The proposal I submitted for use_query revolves around a few things

1) The only "Django specific" code within use_query was related to post-processing.

I've allowed this to be configurable via evaluator, and when moved into IDOM Core there likely won't be a default evaluator.
evaluator can also be renamed to postprocessor 2) All configuration, including configuration values which we haven't thought of yet, need to be accomplished externally to the use_query hook.
To accomplish this, I generated an OrmFetch type to encapsulate all current/future configuration values to the use_query hook.
Currently, the only configurable values are the evaluator and options, but will likely be expanded soon to contain cache_policy.
- options are configuration values that are used by the evaluator.
- options could possibly be renamed to evaluator_options
- When moved into IDOM Core, the default values for evaluator and options will likely be set via configuration globals. 3) If more "backend specific" code shows itself as necessary, we can add them as Callable attributes to OrmFetch.
For example, if we configuration values such as preprocessor or executor, this is now contained within OrmFetch. 4) All actions must be performed within the context of the use_query hook.
By this, I am saying that anything Callable should be performed directly and explicitly within the use_query function definition.
This is mostly to help with debugging and ease code refactoring
This is also the main reason why I opted for fetch_options to return a dataclass rather than a callable.

rmorshea commented 1 year ago

Ok, I have another idea. What if this is the interface:

# default options
use_query(my_query, *args, **kwargs)
# explicit options
use_query(Options(fetch_policy="cache-first"), my_query, *args, **kwargs)

MyPy seems to be ok with this:

from typing import overload, ParamSpec, TypeVar, Callable, Any

from django_idom import Query

P = ParamSpec("P")
R = TypeVar("R")

@overload
def use_query(options: Options, query: Callable[P, R], *args: P.args, **kwargs: P.kwargs) -> Query[R]: ...

@overload
def use_query(query: Callable[P, R], *args: P.args, **kwargs: P.kwargs) -> Query[R]: ...

def use_query(*args: Any, **kwargs: Any) -> Query[Any]: ...

class Options: ...

Archmonger commented 1 year ago

Works, but wouldn't provide any useful type errors if the user accidentally provides too many args.

The type checker would just assume those unneeded args get lumped into the final definition of *args: Any. We could throw an exception during that situation, but that's not as graceful as proper type checking.

rmorshea commented 1 year ago

So the type checker doesn't actually use the final definition. It only looks at the ones in the @overload decorators. The final definition just needs to be a valid union of all the overloaded definitions. You can try this out here.

Archmonger commented 1 year ago

A slight issue with this implementation is it would be fairly convoluted to support passing query/options parameters as a kwarg, such as use_query(query=my_func)

In terms of KISS, I think the decorator wins here. If we go with your type-hint based solution, we would end up needing a lot of boilerplate code for every new argument we add in the future.

I've rewritten QueryOptions to be a callable wrapper for the query function, which keeps the type hints nice and tidy.

We can also consider removing the query_options decorator, in exchange for the following:



hooks.use_query(
    QueryOptions(my_func, {"many_to_many": True}),
)

rmorshea commented 1 year ago

it would be fairly convoluted to support passing query/options parameters as a kwarg

I can't really think of a reason why a user would really want to pass those as kwargs. Plus being able to do so eats into the allowed kwargs for my_func. When we drop support for Python 3.7 (which if we follow Numpy's policy, we could do whenever we want), it seems like it would be best if the options and query params were positional-only.

In terms of KISS, I think the decorator wins here.

I'm not really convinced of that. There's very little difference between passing an options object as the first arg vs using normal kwargs in use_query for those options. The only difference is:

use_query(my_func, *args, fetch_policy=..., **kwargs)`
# vs
use_query(Options(fetch_policy=...), my_func, *args, **kwargs)

And in some ways this is preferable since it's clear to the reader which options belong to use_query vs my_func.

Archmonger commented 1 year ago

Regardless of whether they want to or not, the current type hint hack we're relying on would suggest query=... is possible. I don't mind this change if we use the positional only indicator.

I already drafted this change and can commit it to demonstrate the differences.

rmorshea commented 1 year ago

Ok, I think we can probably go with both solutions. We can have a fetch_options decorator that adds pre-fetch logic via a wrapper around the original query function in addition to the positional-only QueryOptions object that configures query behavior at the point of usage. This seems like the best of both worlds since fetch_options will remain the same on every usage, while QueryOptions could vary depending on how you say, want to configure cache behavior on a particular page.

rmorshea commented 1 year ago

We can also make django-idom 3.8 only in the next release too to take advantage of pos-only parameters. I'll can do the same with IDOM-core too.

Archmonger commented 1 year ago

Django-IDOM is already 3.8+, we only need to worry about core when this hook gets ported over.

Archmonger commented 1 year ago

I've committed the changes. This is how it looks in practice.

rmorshea commented 1 year ago

What I meant to suggest is that we could address post-processing (i.e. pre-fetching lazy model fields) and query options (i.e. ones analogous to Apollo's) with two different approaches. For post-processing, this could be accomplished with a standard decorator, and for the query options, we could add an optional positional-only argument to use_query.

The decorator would be "standard" in that it doesn't return any special object, but rather just wraps the query function:

def prefetch(many_to_many=False):

    def decorator(query_func):

        def wrapper(*args, **kwargs):
            data = query_func(*args, **kwargs)
            if many_to_many:
                # do pre-fetching logic
            return data

        return wrapper

    return decorator

The positional-only options object (i.e. the one described earlier) is something that we can add later since we haven't implemented any caching or other behaviors that would require it. With that said, if you put both approaches together, usage would look like:

@prefetch(many_to_many=True)
def get_user(*a, **kw):
    ...

@component
def example():
    user = use_query(QueryOptions(fetch_policy="cache-first"), get_user, *a, **kw)
    # QueryOptions could also be defined as a TypedDict so you could optionally do
    user = use_query({"fetch_policy": "cache-first"}, get_user, *a, **kw)

If that's still unclear I'm happy to meet to discuss further to try and nail this down.

Archmonger commented 1 year ago

Right now my thought process is it feels more natural for all "configuration" to be in one place, regardless of whether that's a decorator or positional arg.

rmorshea commented 1 year ago

The main reason I'm suggesting a decorator for configuration prefetch behavior is because it seems like, given a particular query function, users would always use the same settings? That is, for a query function that returns an object with a many-to-many relationship, users would always want to prefetch that. Thus, only allowing configuration at the call sight would require passing the same settings on every usage of that query when it could have been configured once at the definition sight. However, if prefetching behavior is more likely to vary from usage to usage, then only defining prefetch behavior at the call sight makes more sense.

Archmonger commented 1 year ago

given a particular query function, users would always use the same settings? That is not a guarantee. I can think of several situations where it would be convenient to not do that.

Also I think that the location where the postprocessor callable is configured should also be the location where the options for it are located.

reactive-python / reactpy-django

Improving use_query #113

Current Situation

Proposed Actions