iterative / dvc-render

Library for rendering DVC plots
http://docs.iterative.ai/dvc-render/
Apache License 2.0
6 stars 5 forks source link

Modified recursive dictionary parsing to correctly handle lists in dvc plot templates #134

Closed alexk101 closed 1 year ago

alexk101 commented 1 year ago

Previously, the way in which lists were parsed from dvc templates caused the dict_find_value function to end early without continuing to parse the rest of the template once it encountered a list. Additionally, the structure of vega-lite plots when using lists allows for the case of a list of strings. Because of this, when called recursively as was done in https://github.com/iterative/dvc-render/commit/635eaff2784f84302f9d354f763276740da246f3 , the function may fail when it encounters something like a list of strings, as it tries to access the values of d, which is no longer a dictionary, but a string. I have fixed this by allowing for d to be a string, but this may not be desired/sensible since the function is called dict_find_value. It may make sense to change the name of the function or to call another function to handle the string case.

I include a template which previously would not parse as reference for the changes made.

{
    "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
    "repeat": ["quintile"],
    "config": {"view": {"stroke": "transparent"}},
    "spec": {
        "title": {"text": "<DVC_METRIC_TITLE>", "anchor": "middle"},
        "encoding": {
            "column": {"field": "category", "header": {"orient": "bottom"}},
            "row": {"field": "quintile", "title": "Quintile"},
            "y": {"field": "<DVC_METRIC_Y>", "type": "quantitative"},
            "x": {"field": "<DVC_METRIC_X>", "axis": null},
            "color": {"field": "<DVC_METRIC_X>"}
            },
        "mark": "bar",
        "data": {
            "values": "<DVC_METRIC_DATA>"
        }
    }
}
shcheklein commented 1 year ago

Thanks for the fix @alexk101 ! It looks reasonable to me, we can keep the name also (it's still trying to find a name).

Can we add some tests for this please?

dberenbaum commented 1 year ago

@alexk101 Thanks for catching this and the changes look good! Since I was also working on this recently (causing this bug), my feeling was that this felt a bit complicated just to check for the anchor, and it feels even more complicated now.

An alternative approach would be to search the raw text for <DVC_METRIC_DATA> before parsing it as json. It looks like the template initially gets read here:

https://github.com/iterative/dvc-render/blob/635eaff2784f84302f9d354f763276740da246f3/src/dvc_render/vega_templates.py#L691-L693

You could replace that with something like:

        with _open(template_path, encoding="utf-8") as f:
            raw = f.read()
        content = json.loads(raw)
        template = Template(content, name=template)
        anchor = template.anchor("data")
        if anchor not in raw:
            raise BadTemplateError(
                f"Template '{template.name}' "
                f"is not using '{anchor}' anchor"
            )
        return template

Then we could drop dict_find_value and the validation logic that uses it.

Let me know if that makes sense. The approach you have also works if we have some better tests to avoid a bug like the one I introduced.

codecov-commenter commented 1 year ago

Codecov Report

Patch coverage: 86.36% and project coverage change: -0.24 :warning:

Comparison is base (635eaff) 96.26% compared to head (fb6631e) 96.03%.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #134 +/- ## ========================================== - Coverage 96.26% 96.03% -0.24% ========================================== Files 19 19 Lines 777 781 +4 Branches 125 129 +4 ========================================== + Hits 748 750 +2 - Misses 16 17 +1 - Partials 13 14 +1 ``` | [Impacted Files](https://app.codecov.io/gh/iterative/dvc-render/pull/134?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=iterative) | Coverage Δ | | |---|---|---| | [src/dvc\_render/vega\_templates.py](https://app.codecov.io/gh/iterative/dvc-render/pull/134?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=iterative#diff-c3JjL2R2Y19yZW5kZXIvdmVnYV90ZW1wbGF0ZXMucHk=) | `94.19% <81.25%> (-1.14%)` | :arrow_down: | | [tests/test\_templates.py](https://app.codecov.io/gh/iterative/dvc-render/pull/134?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=iterative#diff-dGVzdHMvdGVzdF90ZW1wbGF0ZXMucHk=) | `100.00% <100.00%> (ø)` | | | [tests/test\_vega.py](https://app.codecov.io/gh/iterative/dvc-render/pull/134?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=iterative#diff-dGVzdHMvdGVzdF92ZWdhLnB5) | `100.00% <100.00%> (ø)` | |

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Do you have feedback about the report comment? Let us know in this issue.

daavoo commented 1 year ago

Then we could drop dict_find_value and the validation logic that uses it.

Actually, I think that we should be validating more strictly. We are very opinionated about where the anchor data should be put (inside data->values) but the current validation (and the raw string matching) passes as long as the anchor is somewhere in the template.