Should MatplotlibWriter multiple plot functionality be removed in favour of PartitionedDataSet?

antonymilne commented 2 years ago

MatplotlibWriter currently supports 3 different save modes:

save a single plt.figure to a png file
save List[plt.figure] to multiple png files (labelled 0.png, 1.png, etc.)
save Dict[str, plt.figure] to multiple png files (labelled by dictionary keys)
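
As a rough illustration of the three modes above, the dispatch could be sketched as below. This is not the actual MatplotlibWriter code: `plan_saves` is a hypothetical helper, plain strings stand in for figures so the snippet runs without matplotlib, and it assumes dictionary keys are used as file names unchanged.

```python
from typing import Any, Dict, List, Union


def plan_saves(path: str, data: Union[Any, List[Any], Dict[str, Any]]) -> Dict[str, Any]:
    """Map each object to the file it would be written to (hypothetical helper)."""
    if isinstance(data, dict):
        # Mode 3: dictionary keys become the file names inside `path` (assumed as-is).
        return {f"{path}/{key}": fig for key, fig in data.items()}
    if isinstance(data, list):
        # Mode 2: list positions become 0.png, 1.png, ...
        return {f"{path}/{index}.png": fig for index, fig in enumerate(data)}
    # Mode 1: a single figure is written to `path` itself.
    return {path: data}


single = plan_saves("plot.png", "fig")
from_list = plan_saves("plots", ["fig_a", "fig_b"])
from_dict = plan_saves("plots", {"train.png": "fig_a", "test.png": "fig_b"})
```

The point of the sketch is that the type of the object passed to save decides the file layout, which is the behaviour questioned in this issue.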

There's a recently-added overwrite option associated with the latter two modes (https://github.com/kedro-org/kedro/issues/868). This also exists for PartitionedDataSet.

The current behaviour has some problems:

it's inconsistent with every other dataset: MatplotlibWriter is the only one that has multiple save modes
it complicates some things in kedro-viz (#1626 https://github.com/kedro-org/kedro-viz/issues/783), although this is less important because it will still need to be solved on kedro-viz even if we change how the dataset works

On the other hand, the ability to save multiple plots rather than define one dataset per plot is essential. I have used it myself many times and seen it used a lot.

So, my question is: should we remove the MatplotlibWriter save modes that handle multiple plots and instead wrap MatplotlibWriter in PartitionedDataSet? Leaving aside how we would do this technically for the moment, would this be a good change to make? That is, would it be a user-friendly solution, and would it support all the functionality we need?
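
To make the wrapping idea concrete, here is a minimal standard-library sketch of the pattern, not Kedro's implementation. `PartitionedWriterSketch`, `save_text`, and the `filename_suffix` and `overwrite` parameters as written here are illustrative assumptions; a real single-figure writer would call something like `fig.savefig(filepath)` where the placeholder writes text.

```python
import tempfile
from pathlib import Path
from typing import Callable, Dict


class PartitionedWriterSketch:
    """Hypothetical stand-in: applies a single-item writer to each entry of a dict."""

    def __init__(
        self,
        path: str,
        save_one: Callable[[Path, object], None],
        filename_suffix: str = "",
        overwrite: bool = False,
    ) -> None:
        self._path = Path(path)
        self._save_one = save_one
        self._suffix = filename_suffix
        self._overwrite = overwrite

    def save(self, data: Dict[str, object]) -> None:
        self._path.mkdir(parents=True, exist_ok=True)
        if self._overwrite:
            # Remove partitions left over from earlier runs.
            for old in self._path.glob(f"*{self._suffix}"):
                if old.is_file():
                    old.unlink()
        for partition_id, item in data.items():
            # Each dictionary key becomes one file, written by the wrapped writer.
            self._save_one(self._path / f"{partition_id}{self._suffix}", item)


def save_text(filepath: Path, item: object) -> None:
    # Placeholder for a single-figure writer; strings stand in for figures.
    filepath.write_text(str(item))


with tempfile.TemporaryDirectory() as tmp:
    writer = PartitionedWriterSketch(tmp, save_text, filename_suffix=".png", overwrite=True)
    writer.save({"old": "stale"})
    writer.save({"train": "fig-1", "test": "fig-2"})
    saved = sorted(p.name for p in Path(tmp).iterdir())
```

With this shape, the wrapped writer only ever handles one figure, and naming and overwrite behaviour live in the wrapper.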

My suspicion is that the only reason we don't already use PartitionedDataSet for this is historical (MatplotlibWriter was added to contrib at the same time PartitionedDataSet was added to core).

Tagging @Galileo-Galilei who I suspect will just have the answers here 😀

deepyaman commented 2 years ago

One other (likely unnecessary) discrepancy is that PartitionedDataSet doesn't currently support versioning, either of the overarching or the underlying dataset, whereas MatplotlibWriter does for the overarching dataset. A possibly relevant discussion, although more focused on the underlying dataset: https://github.com/kedro-org/kedro/pull/521.

antonymilne commented 2 years ago

@deepyaman thanks, that is a very pertinent point given that experiment tracking is one of the main motivations here, and that directly relies on versioned datasets to work... So if we were to move to PartitionedDataSet for MatplotlibWriter then we should try and get kedro-org/kedro#521 done.

kedro-org / kedro-plugins

Should MatplotlibWriter multiple plot functionality be removed in favour of PartitionedDataSet? #529