kedro-org / kedro-viz

Visualise your Kedro data and machine-learning pipelines and track your experiments.
https://demo.kedro.org
Apache License 2.0
681 stars 113 forks source link

Fix unserializable parameters value #2122

Closed ravi-kumar-pilla closed 1 month ago

ravi-kumar-pilla commented 1 month ago

Description

Resolves #2010

Development notes

QA notes

Checklist

noklam commented 1 month ago

Add a note here: We discussed in private that this is not a new bug, but never the bug never surface before. We fixed an issue previously for non-primitive type, but this one is related to non-serialisation due to customer resolver, lambda etc

rashidakanchwala commented 1 month ago

Awesome, i also tried this with #1712 and kedro viz runs !

ravi-kumar-pilla commented 1 month ago

The FE is a bit odd since we return a string but at least it doesn't break anymore. Maybe we can figure a better way in the future (Juanlu mentioned using getattr)

Yes, I was trying to avoid this and experimented different ways (like trying to iterate through the dict and only converting the dict value which is an object). So what I realized is, we can -

  1. Try to come up with all different cases (like dict, set, list, custom object etc) and have helper functions which try to serialize them before returning to FE, which might get tricky when the values are combinations of complex types. We might also ask the users to have a repr or str implemented in-case of custom objects.
  2. Try to use the jsonable_encoder provided by FASTAPI which tries its best to serialize the value and if not throws an exception.

Though this is not ideal, I could not think of any better way to handle all possible cases.

@astrojuanlu could you please share any info on how to use getattr in resolving this issue (since the name of the attribute can be anything, I am not sure how to use getattr for this scenario)

Thank you

ravi-kumar-pilla commented 1 month ago

The fix looks good to me. Can you post a screenshot how does it looks like for the final result with this change? We should include this in the docs to inform user in case this happens.

For the example where parameters file is -

split_options:
  test_size: 0.2
  random_state: 3
  target: 'price'
  data_info: "${polars:Float64}"

Custom Resolver as below -

CONFIG_LOADER_ARGS = {
      "base_env": "base",
      "default_run_env": "base",
      "custom_resolvers": {
        "add": lambda *my_list: sum(my_list),
        "polars": lambda x: getattr(pl, x)
    }
}

UI looks like -

image

Regarding the docs, we don't have much info about the metadata panel apart from stats. I can add a section about the parameter value here - https://docs.kedro.org/projects/kedro-viz/en/stable/kedro-viz_visualisation.html . What do you think ? @noklam

Thank you

noklam commented 1 month ago
image

I get this instead - where are those __module__ coming from?

ravi-kumar-pilla commented 1 month ago

I get this instead - where are those __module__ coming from?

image

This is how jsonable_encoder works when it encounters custom objects (like pl.Float64), it falls back on encoding the object's attributes (such as its class and module metadata), which is what you're seeing.

noklam commented 1 month ago

In this case I think fastapi make this harder to understand as it creates additional key that doesn't exist. What if we don't use fastapi at all? Can you also give an example that using fastapi actually make this better?

ravi-kumar-pilla commented 1 month ago

In this case I think fastapi make this harder to understand as it creates additional key that doesn't exist. What if we don't use fastapi at all? Can you also give an example that using fastapi actually make this better?

I would not say it makes better but it will handle all edge cases that we might miss. But we can also have a custom serializer function something like below which will not return any unexpected value. I was not sure if this handles all the cases.

def serialize_value(value: Any) -> Any:
    """Custom serialization function to handle different types of values,
    including primitives, lists, dictionaries, sets, tuples, and objects."""
    # Recursively serialize dictionaries
    if isinstance(value, dict):
        return {k: serialize_value(v) for k, v in value.items()}
    # Recursively serialize lists and sets
    elif isinstance(value, (list, set)):
        return [serialize_value(v) for v in value]
    # Recursively serialize tuples
    elif isinstance(value, tuple):
        return tuple(serialize_value(v) for v in value)
    # Handle primitives
    elif isinstance(value, (int, float, str, bool, type(None))):
        return value
    # Handle bytes
    elif isinstance(value, (bytes, bytearray)):
        # Convert to string
        return value.decode()
    elif hasattr(
        value, "__dict__"
    ):  # Handle custom objects by serializing their __dict__
        return serialize_value(value.__dict__)
    # Fallback: convert unknown types to a string
    return str(value)

FYI jsonable_encoder source

noklam commented 1 month ago

I see, what if we just naively use str instead when it's non-serialisable, otherwise just return it as is?

self.kedro_obj.load()

Do I understand this correctly, originally we only have this part and this will pass to a pydantic model and eventually go through the fastapi json encoder?

ravi-kumar-pilla commented 1 month ago

I see, what if we just naively use str instead when it's non-serialisable, otherwise just return it as is?

We can do that as well, but I thought since we have jsonable_encoder doing a better job than naively returning a string, opted for it. But may be this is creating a bit of confusion. The basic serialization function also seems to handle most of the cases. Do you think we can go with the function I commented ?

Do I understand this correctly, originally we only have this part and this will pass to a pydantic model and eventually go through the fastapi json encoder?

yes that is correct. So for example we have the below Response model and we allow parameters value to be Any. In case of objects, this throws error.

class TaskNodeAPIResponse(BaseGraphNodeAPIResponse):
    parameters: Dict
    ...