Pretty printing: `AbstractDataset.__repr__`

ElenaKhaustova commented 3 days ago

Description

Implement __repr__ for AbstractDataset to improve how datasets are represented and printed, and then reuse it in DataCatalog.__repr__.

We already have a __str__ implementation for AbstractDataset based on the dataset's _describe; it can be adjusted and moved to __repr__.
Update _describe for MemoryDataset, LambdaDataset, SharedMemoryDataset, and CachedDataset if needed.
One potential solution is to extend the standard library's pprint.PrettyPrinter.
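As a rough sketch of what extending pprint.PrettyPrinter could look like, the example below overrides the documented PrettyPrinter.format hook. FakeDataset and DatasetPrinter are hypothetical names made up for illustration; this is not Kedro's implementation, only one way the idea might be approached.

```python
import pprint


class FakeDataset:
    """Hypothetical stand-in for a dataset; not Kedro's AbstractDataset."""

    def __init__(self, **kwargs):
        self._kwargs = kwargs

    def _describe(self):
        # Mirrors the idea of _describe: a dict of the dataset's arguments.
        return dict(self._kwargs)


class DatasetPrinter(pprint.PrettyPrinter):
    """Hypothetical printer that renders objects exposing _describe()."""

    def format(self, obj, context, maxlevels, level):
        if isinstance(obj, FakeDataset):
            parts = []
            for name, value in obj._describe().items():
                # Delegate each argument value to the stock formatter.
                value_repr, _, _ = super().format(value, context, maxlevels, level + 1)
                parts.append(f"{name}={value_repr}")
            text = f"{type(obj).__name__}({', '.join(parts)})"
            # Returns (representation, is_readable, is_recursive).
            return text, False, False
        return super().format(obj, context, maxlevels, level)


printer = DatasetPrinter()
print(printer.pformat(FakeDataset(filepath="data.csv", load_args={"sep": ";"})))
```

The stock printer still handles every other type, so only the dataset rendering is customised.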

ElenaKhaustova commented 1 day ago

I've prototyped two different approaches for printing:

Both approaches implement __repr__() for AbstractDataset on top of the specific dataset's _describe() implementation. Since __str__ is not implemented, __repr__() is also called when the object is converted to a string, so print(obj) and evaluating obj give the same result.
https://github.com/kedro-org/kedro/pull/3990 - the first approach prints each dataset on one line in the format module.Class(arg_1=val_1, ..., arg_n=val_n). Argument values are formatted with pprint.pformat and then joined, so the result is built from pre-formatted strings.
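A minimal sketch of the first approach, assuming a toy class with a _describe() method. ExampleDataset and its arguments are hypothetical and only stand in for a real dataset; the actual code in the linked PR may differ.

```python
import pprint


class ExampleDataset:
    """Hypothetical dataset used only for illustration."""

    def __init__(self, filepath, load_args=None):
        self._filepath = filepath
        self._load_args = load_args or {}

    def _describe(self):
        return {"filepath": self._filepath, "load_args": self._load_args}

    def __repr__(self):
        cls = type(self)
        # Format every argument value on its own, then join the strings.
        args = ", ".join(
            f"{name}={pprint.pformat(value)}"
            for name, value in self._describe().items()
        )
        return f"{cls.__module__}.{cls.__name__}({args})"


ds = ExampleDataset("data.csv", load_args={"sep": ";"})
print(ds)        # no __str__ is defined, so print() falls back to __repr__
print(repr(ds))
```

Because the class defines no __str__, str(ds) and repr(ds) produce the same text.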

https://github.com/kedro-org/kedro/pull/3992 - the second approach represents the dataset as a dict, {'module.Class': {'arg_1': val_1, ..., 'arg_n': val_n}}. The dictionary is built first and then formatted with pprint.pformat, so we can control indentation, but the output is less compact.
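A minimal sketch of the second approach, again with a hypothetical ExampleDataset rather than a real Kedro class. The indent and sort_dicts values are arbitrary choices for the illustration, not settings taken from the PR.

```python
import pprint


class ExampleDataset:
    """Hypothetical dataset used only for illustration."""

    def __init__(self, filepath, load_args=None):
        self._filepath = filepath
        self._load_args = load_args or {}

    def _describe(self):
        return {"filepath": self._filepath, "load_args": self._load_args}

    def __repr__(self):
        cls = type(self)
        # Build the whole dictionary first, then format it in one call.
        as_dict = {f"{cls.__module__}.{cls.__name__}": self._describe()}
        return pprint.pformat(as_dict, indent=2, sort_dicts=False)


ds = ExampleDataset("data.csv", load_args={"sep": ";"})
print(ds)
```

Since pprint.pformat sees the full structure here, it decides where to break lines and how to indent nested arguments.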

For both approaches we can control depth, so arguments nested beyond a certain level can be hidden.
Both also look representative enough compared to what we had before; the first is more compact, while the second keeps the indentation.
An alternative is to expose _pretty_repr and let users customise width, indentation, and depth.
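To illustrate the depth control and the customisation idea, here is a small sketch of a helper that forwards width, indent, and depth to pprint.pformat. The pretty_repr function and its signature are assumptions made for this example, not the signature of any existing _pretty_repr.

```python
import pprint


def pretty_repr(description, *, width=80, indent=1, depth=None):
    """Hypothetical helper: format a _describe()-style dict with user settings."""
    return pprint.pformat(
        description, width=width, indent=indent, depth=depth, sort_dicts=False
    )


nested = {"a": {"b": {"c": 1}}}

print(pretty_repr(nested))           # full structure
print(pretty_repr(nested, depth=2))  # levels below the second are shown as {...}
print(pretty_repr(nested, depth=1))
```

With depth set, pprint replaces anything nested deeper than that level with an ellipsis placeholder, which is how arguments past a given level can be hidden.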

I'm curious about what you think. Does it feel good enough? Do we want more or less information provided? Which approach seems better?

noklam commented 21 hours ago