posit-dev / great-tables

Make awesome display tables using Python.
https://posit-dev.github.io/great-tables/
MIT License
1.81k stars 60 forks source link

epic: clean up nanoplot implementation #420

Open machow opened 1 month ago

machow commented 1 month ago

From pairing, we currently have pieces like boxplots in nanoplots teed up (#168). However, when we merged the nanoplot PR, we decided to punt on doing too much review / cleanup, in the name of getting it out.

Let's plan on cleaning up the nanoplot implementation a bit, so it's simpler to add things like boxplots. Below are some potential cleanups.

Avoiding unsetting arguments based on other arguments

We decide what goes into _construct_nanoplot_svg(), but currently it can quietly unset arguments.

https://github.com/posit-dev/great-tables/blob/7365aaa54bfe5616fc1d772d1c39b101e7c3bb77/great_tables/_utils_nanoplots.py#L532-L540

Do not fallback to unspecified behavior

For example, this if/else clause defaults to using a specific calculation (quartile 3).

https://github.com/posit-dev/great-tables/blob/7365aaa54bfe5616fc1d772d1c39b101e7c3bb77/great_tables/_utils_nanoplots.py#L420-L421

(Note that functions outside this one check that a calculation was specified, so in practice things work out okay)

Consolidate large branches inside _generate_nanoplot()

For example, the "Generate curved data line section" has two branches that seem similar to each other.

https://github.com/posit-dev/great-tables/blob/7365aaa54bfe5616fc1d772d1c39b101e7c3bb77/great_tables/_utils_nanoplots.py#L1030-L1034

It seems like if we had functions for curved_path() and polyline() that produced svg strings (or a data repr of SVG elements), that would help a lot. IMO everywhere that the variable data_path_tags gets used gives helpful hints for consolidating.

People can go overboard consolidating, so I think the key is finding the places where it cleans up readability / robustness the most.

rich-iannone commented 1 day ago

In speaking with @t-kalinowski we can do the following to improve this part of the codebase:

  1. split _generate_nanoplot() into several smaller functions; try to reduce line length to ~20 lines
  2. create a nanoplot spec which encapsulates all data and options in a class that initializes the values from input data and collected args
  3. ensure there are reasonable contracts for each of the methods added