Open hugocool opened 9 months ago
Thank you for raising this issue. It highlights an important point about the catalog's template rendering.
Basically, `template_params` is used at each iteration/request to render any Jinja template found in your dataset's attributes that uses `[[ ]]` as its expression delimiters. We don't use Jinja's default delimiters `{{ }}` because Kedro's ConfigLoader and TemplatedConfigLoader already interpret (render) them when the catalog is created. The new OmegaConfigLoader doesn't use Jinja.
At iteration/request time, the datasets are already initialized and materialized as Python objects. kedro-boot isn't really rendering a YAML catalog; it is instead altering (rendering) the attributes of already-initialized datasets. Concretely, at request time you already have a result dataset object with `data_ids = [[ data_ids ]]`, which will be rendered before the pipeline runs.
Since we render the Python object directly (we don't render a YAML file), kedro-boot recursively renders every string attribute (including strings nested inside a list or dict) and every Path attribute.
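The recursive rendering described above can be sketched roughly as follows. This is a simplified illustration, not kedro-boot's actual implementation: it swaps Jinja for a plain regex that only substitutes bare `[[ name ]]` placeholders (no filters or expressions), and `render_value` is a hypothetical helper name.

```python
import re
from pathlib import Path
from typing import Any, Mapping

# Hypothetical stand-in for Jinja with [[ ]] delimiters: only bare names are supported.
_PLACEHOLDER = re.compile(r"\[\[\s*(\w+)\s*\]\]")


def _render_str(text: str, params: Mapping[str, Any]) -> str:
    # Like Jinja, the result is always a string: non-string params go through str().
    # A missing param raises KeyError here (real Jinja handles undefined names differently).
    return _PLACEHOLDER.sub(lambda m: str(params[m.group(1)]), text)


def render_value(value: Any, params: Mapping[str, Any]) -> Any:
    """Recursively render strings and Paths, descending into lists, tuples and dicts."""
    if isinstance(value, str):
        return _render_str(value, params)
    if isinstance(value, Path):
        return Path(_render_str(str(value), params))
    if isinstance(value, list):
        return [render_value(v, params) for v in value]
    if isinstance(value, tuple):
        return tuple(render_value(v, params) for v in value)
    if isinstance(value, dict):
        return {k: render_value(v, params) for k, v in value.items()}
    # Any other type (int, float, None, ...) is left untouched.
    return value
```

For example, rendering `{"data_ids": "[[ data_ids ]]"}` with `{"data_ids": [[1, 2], [3, 4]]}` yields the string `"[[1, 2], [3, 4]]"`, not a list, which is why a cast is still needed afterwards.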
There are two problems here, and each one is handled by one line of your workaround.
1. `[[ expression ]]` is sometimes ignored by the kedro-boot renderer
In your use case something unexpected happened: your `self.data_ids` attribute was neither a string nor a Path. It was initialized as a list in your dataset object, so kedro-boot ignored it. This is because when `catalog.yaml` was loaded, the `data_ids` value was parsed as a list (`[ ... ]`). When you added quotes around the `data_ids` value, it was parsed as a string and correctly interpreted by kedro-boot, but you were then forced to fix the format with `s = str(self.data_ids).replace("'", '"')`.
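The underlying cause is YAML itself: unquoted, `[[ data_ids ]]` is flow-sequence syntax, so the loader turns it into the nested list `[["data_ids"]]` and the `[[ ]]` delimiters disappear before kedro-boot ever sees the attribute. The sketch below writes out by hand what a YAML loader (e.g. PyYAML's `safe_load`) returns for the unquoted and quoted forms, so it needs only the standard library, and checks whether any string inside still carries a `[[ name ]]` placeholder. `contains_template` is a hypothetical helper, not part of kedro-boot.

```python
import re
from typing import Any

_PLACEHOLDER = re.compile(r"\[\[\s*\w+\s*\]\]")

# What a YAML loader returns for each catalog form (written out by hand):
#   data_ids: [[ data_ids ]]     -> nested list; YAML consumed the brackets
#   data_ids: "[[ data_ids ]]"   -> plain string; brackets preserved
unquoted = [["data_ids"]]
quoted = "[[ data_ids ]]"


def contains_template(value: Any) -> bool:
    """Return True if any string nested in `value` still holds a [[ name ]] placeholder."""
    if isinstance(value, str):
        return bool(_PLACEHOLDER.search(value))
    if isinstance(value, (list, tuple)):
        return any(contains_template(v) for v in value)
    if isinstance(value, dict):
        return any(contains_template(v) for v in value.values())
    return False


print(contains_template(unquoted))  # False: nothing left for the renderer to substitute
print(contains_template(quoted))    # True: the renderer can substitute it
```

Quoting the value in the catalog is therefore what keeps the placeholder intact until request time.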
This is clearly an edge case of using `[[ ]]` as the start/end delimiters of the Jinja expression. It leads to inconsistent behavior when the template (the dataset value) contains only the Jinja expression, as in `dataset_attribute: [[ expression ]]`. We should consider switching to other characters. I propose `{[ expression ]}`. What do you think?
2. Casting a non-string templated dataset attribute
Jinja rendering always returns a string, so for a non-string attribute you will always need to cast the rendered value. In your example you can cast it with `self.data_ids = json.loads(s)` or `ast.literal_eval(s)`.
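A minimal sketch of the casting step, assuming the rendered value is the text form of a list of lists. Jinja renders a Python list through `str()`, so any strings inside come out single-quoted, which `json.loads` rejects; that is what the `.replace("'", '"')` in the workaround compensates for. `ast.literal_eval` accepts Python literal syntax directly, so no replace is needed. `parse_data_ids` is a hypothetical helper name.

```python
import ast
import json
from typing import Any, List


def parse_data_ids(rendered: str) -> List[List[Any]]:
    """Cast the rendered text of a list of lists back into a Python object."""
    value = ast.literal_eval(rendered)  # accepts Python literal syntax, incl. single quotes
    if not (isinstance(value, list) and all(isinstance(row, list) for row in value)):
        raise ValueError(f"expected a list of lists, got {rendered!r}")
    return value


# Integers render to text that is also valid JSON, so both parsers agree:
assert json.loads("[[1, 2], [3, 4]]") == parse_data_ids("[[1, 2], [3, 4]]") == [[1, 2], [3, 4]]

# Strings render single-quoted (Python repr style), which json.loads rejects:
rendered = str([["a", "b"]])  # "[['a', 'b']]"
try:
    json.loads(rendered)
except json.JSONDecodeError:
    print("json.loads failed without the quote replacement")
print(parse_data_ids(rendered))  # [['a', 'b']]
```

`ast.literal_eval` only evaluates literals (no function calls or names), so it is a reasonable choice for text that arrives from a web request.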
I don't see this as a real problem, since it feels natural to cast, somewhere in your code, a value that was received as text from the web server.
Let me know what you think.
We are experiencing an issue where the literal string `'[[ data_ids ]]'` is being passed to the constructor of our dataset instead of the expected list of lists. The problem occurs in a FastAPI route that runs a Kedro session, with a custom dataset configured through a catalog entry. Here are the details of the issue and our current workaround:
FastAPI Route with Kedro Session:
In our FastAPI application, we have a route that uses a Kedro session to run a task. The `data_ids` parameter, which is intended to be a list of integers, is being incorrectly passed as a string. Here's the relevant part of the route:
Catalog Entry Issue and Workaround:
The issue also manifests in our catalog entry configuration for a custom dataset:
With this setup, the string `'[[ data_ids ]]'` is passed to the constructor instead of the list of lists. To address this, we modified the catalog entry to pass `data_ids` explicitly as a string:
Then, in the constructor of our dataset, we convert this string back into a list of lists:
While this workaround is functional, it's not ideal. We suspect this might be a bug or at least a feature that requires clarification in the documentation. Any assistance in resolving this issue more elegantly would be greatly appreciated.
Thank you for your time and consideration.