Open antonymilne opened 2 years ago
One other (likely unnecessary) discrepancy is that PartitionedDataSet
doesn't currently support versioning--either of the overarching or underlying dataset. MatplotlibWriter
does for the overarching dataset. Perhaps relevant discussions, although more focused on the underlying dataset: https://github.com/kedro-org/kedro/pull/521.
@deepyaman thanks, that is a very pertinent point given that experiment tracking is one of the main motivations here, and that directly relies on versioned datasets to work... So if we were to move to PartitionedDataSet
for MatplotlibWriter
then we should try and get kedro-org/kedro#521 done.
MatplotlibWriter
currently supports 3 different save modes:plt.figure
to a png fileList[plt.figure]
to multiple png files (labelled 0.png, 1.png, etc.)Dict[str, plt.figure]
to multiple png files (labelled by dictionary keys)There's a recently-added
overwrite
option associated with the latter two modes (https://github.com/kedro-org/kedro/issues/868). This also exists forPartitionedDataSet
.The current behaviour has some problems:
On the other hand, the ability to save multiple plots rather than define one dataset per plot is essential. I have used it myself many times and seen it used a lot.
So, my question is: should we replace the matplotlib save modes that do multiple plots with instead wrapping
MatplotlibWriter
inPartionedDataSet
? Leaving aside how we do this technically for the moment, would this be a good change to make? i.e. will this be a user-friendly solution here? Will it allow everything we need to allow in terms of functionality?My suspicion is that the only reason we don't already use
PartionedDataSet
for this is historical (MatplotlibWriter
was added to contrib at the same timePartionedDataSet
was added to core).Tagging @Galileo-Galilei who I suspect will just have the answers here 😀