reflex-dev / reflex

🕸️ Web apps in pure Python 🐍
https://reflex.dev
Apache License 2.0
18.19k stars 1.01k forks source link

[REF-3014] Bug in serializing lists/dict of `rx.Base` #3438

Open picklelo opened 1 month ago

picklelo commented 1 month ago

Noticed an issue with our serializers - using lists of rx.Base is removing the outer braces:

class Test(rx.Base):
    key: str
    value: str

comp = Test(key="1", value="text1")

print(rx.utils.serializers.serialize(comp)) # prints {"key": "1", "value": "text1"}
print(rx.utils.serializers.serialize([comp])) # prints ["key": "1", "value": "text1"]

The second one is not valid JSON.

From SyncLinear.com | REF-3014

guidocalvano commented 1 month ago

I suspect For example, "{var}" will be unwrapped to "var".

in unwrap_vars,

specifcally this: {(.*?)} # extract the value between curly braces (non-greedy) from the regex

and if I remove format.unwrap_vars from the serialize_list callback I get this: {"key": "1", "value": "text1"} ["{\"key\": \"1\", \"value\": \"text1\"}"]

so that is a smoking gun...

removing this stops the curly braces from disappearing {(.*?)} # extract the value between curly braces (non-greedy)

masenf commented 1 month ago

we need to unwrap vars inside lists and dicts when they are real JS values (like functions), but for things that serialize to dictionaries, the unwrapping is taking away the outer curly braces where it shouldn't.

guidocalvano commented 1 month ago

yeah, so you need to know either the format of a var string so you can more precisely target it, with the regex. Or you need to somehow leverage the recursion that json.dumps creates, and maybe detect when you are serializing a var, so you can take away the curly braces at that point. I would prefer that second option, because it follows the semantics of what you are trying to serialize. Is there specific serialization logic for vars?

guidocalvano commented 1 month ago

Maybe something like this in the super class of all vars? https://stackoverflow.com/questions/3768895/how-to-make-a-class-json-serializable

masenf commented 1 month ago

vars are serialized up front using reflex.utils.serializers and related functions... the _var_name is already the serialized form

masenf commented 1 month ago

The problem with touching anything in this formatting code is that it's going to have fallout for other parts of the framework that have been working fine. Any changes must be very carefully considered to limit the scope of the change to the bug in question without introducing other regressions.

The entire Var system is begging for a refactor because it's too hard to special case serialization for different types of vars. I was just running into this myself while trying to get nested dict serialization to work.

guidocalvano commented 1 month ago

Whenever I am in doubt whether it is time for a refactor it is usually means I should have refactored 2 weeks ago. Refactoring is often considered terrifying. The fear usually stems from the following reasoning: "Before I can refactor I must understand how this code works". But these days I take a different perspective: Refactoring is restructuring code that leads to understanding. I start by making small simplifications (that do not change behavior) and then work my way up, never making any big change. I also see refactoring as fundamentally different from solving bugs.

guidocalvano commented 1 month ago

But... to work towards a solution: wouldn't removing the part of the regex while adding the following work?

@serializer
def serialize_var(var: rx.Var) -> str:

    # solve the problem here
    # You can make more fine grained serializers if you want

    return format_var(var)  # maybe not even format?
guidocalvano commented 1 month ago

The serialization code in the reflex code looks pretty organized....

guidocalvano commented 1 month ago

What should a var look like in the output?

masenf commented 1 month ago

Vars end up looking different based on some of their internal flags (_var_is_string and _var_is_local), but generally the str(my_var) is the format that gets used in the generated code.

I mean, feel free to experiment and try to find a fix, but the tests must still pass and reflex-web must still build. The only changes in the tests should be additions and obviously justifiably wrong behavior.

I haven't really had a chance to investigate this one much yet, but I would probably start by adding a test case for it, and then making small tweaks to the code until the new test passes and none of the existing tests fail.

As for why def serialize_var(var: rx.Var) -> str: seems weird to me, is that the Var.create method already calls serialize on whatever it gets passed, so in a sense, the Var already holds the serialized form, and AFAIK we don't pass any Var objects to the serialize method.

As for the refactoring, it's been a long time coming for this class, it needs a serious rethink to avoid all of the special cases and "external" format_* methods that additionally tweak the output in certain circumstances. Ideally the Var class itself would have subclasses for each type that needs special behavior, like VarDict, VarStr, VarBase, etc; the base Var would have the formatters for different output modes, like "prop" or "expression". Definitely need to solidify the refactor plan more to ensure the proposed API actually fixes the real problems that we face today.