nteract / scrapbook

A library for recording and reading data in notebooks.
https://nteract-scrapbook.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Allow saving of dataframes #59

Closed (morganics closed 4 years ago)

morganics commented 4 years ago

Surprised that, despite the documentation, support for dataframes doesn't seem to be available. According to the docs you can use the 'arrow' format, but the code contains a couple of exceptions stating that arrow support is not currently available. For now I've used the JSON data type to save, but that is obviously not good for larger artifacts.

jerrylam commented 4 years ago

If you look at this file: https://github.com/nteract/scrapbook/blob/master/scrapbook/encoders.py, arrow is commented out for some reason.

MSeal commented 4 years ago

Yes, today you have to convert to/from JSON, which has lots of problems.

This PR: https://github.com/nteract/scrapbook/pull/37 adds pandas and arrow dataframe support, but I had put it on hold for other work. I've wrapped up those other tasks, so this is likely the next PR / feature I will work on and get released in the near future.
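For illustration, a few of the JSON round-trip problems can be reproduced with the standard library alone. This is only a sketch: the `table` dict is a made-up stand-in shaped like the default output of `DataFrame.to_dict()`, not data from this issue.

```python
import datetime
import json

# Stand-in shaped like DataFrame.to_dict() output: {column: {index_label: value}}.
table = {"price": {0: 1.5, 1: float("nan")}}

# 1. Integer index labels become strings, because JSON object keys are always strings.
restored = json.loads(json.dumps(table))
keys_after = list(restored["price"])

# 2. NaN is written as the bare token NaN, which strict JSON parsers reject.
nan_text = json.dumps(float("nan"))

# 3. Datetime values are not serializable without a custom encoder.
try:
    json.dumps({"when": datetime.datetime(2020, 1, 1)})
    datetime_ok = True
except TypeError:
    datetime_ok = False
```

After running this, `keys_after` holds string labels rather than the original integers, and `datetime_ok` is `False`.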

choldgraf commented 4 years ago

Just a quick note here - couldn't this be solved relatively quickly by using an encoder such as this:

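import pandas as pd
# Assumption: the registry is importable from scrapbook.encoders under this name
from scrapbook.encoders import registry as encoder_registry

class DataFrameEncoder(object):
    def encode(self, scrap):
        # scrap.data is any type, usually specific to the encoder name
        # to_dict() turns the DataFrame into a {column: {index: value}} dict
        scrap = scrap._replace(data=scrap.data.to_dict())
        return scrap

    def decode(self, scrap):
        # scrap.data is one of [None, list, dict, *six.integer_types, *six.string_types]
        # rebuild the DataFrame from the stored dict
        scrap = scrap._replace(data=pd.DataFrame.from_dict(scrap.data))
        return scrap

encoder_registry.register('pandas', DataFrameEncoder())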

This allows me to do sb.glue('mydf', df, 'pandas'), where df is the dataframe to save.

Maybe not very edge-casey and elegant, but could be a start?
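One edge case of the to_dict approach is that integer index labels turn into strings once the dict passes through JSON. The sketch below shows the same encode/decode contract with a "split" layout (columns, index, and rows kept as lists) that survives the round trip. Everything here is hypothetical and runs without scrapbook or pandas: `Scrap` is a stand-in namedtuple, `SplitTableEncoder` and `round_trip` are illustrative names, and the table is a plain nested dict rather than a real DataFrame.

```python
import json
from collections import namedtuple

# Hypothetical stand-in for scrapbook's scrap record, so this runs on its own.
Scrap = namedtuple("Scrap", ["name", "data", "encoder"])


class SplitTableEncoder(object):
    """Illustrative encoder for a {column: {index: value}} table."""

    def encode(self, scrap):
        table = scrap.data
        columns = list(table)
        index = list(table[columns[0]]) if columns else []
        # Store index labels as list values, not dict keys, so JSON keeps their type.
        rows = [[table[col][idx] for col in columns] for idx in index]
        return scrap._replace(data={"columns": columns, "index": index, "data": rows})

    def decode(self, scrap):
        payload = scrap.data
        table = {
            col: {idx: row[pos] for idx, row in zip(payload["index"], payload["data"])}
            for pos, col in enumerate(payload["columns"])
        }
        return scrap._replace(data=table)


def round_trip(scrap, encoder):
    """Encode, pass through JSON text (as a JSON-backed store would), then decode."""
    encoded = encoder.encode(scrap)
    stored = json.loads(json.dumps(encoded._asdict()))
    return encoder.decode(Scrap(**stored))


original = Scrap("mydf", {"a": {0: 1, 1: 2}, "b": {0: "x", 1: "y"}}, "split")
restored = round_trip(original, SplitTableEncoder())
```

With pandas, a similar layout is available through `to_dict(orient="split")`, which might be a small step up from the default orientation for the quick encoder above.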

MSeal commented 4 years ago

This functionality was merged to master last week, but it is not released yet. Master uses pyarrow to encode the dataframe, which is a little better than the to_dict and from_dict approach.

MSeal commented 4 years ago

It was merged in this PR: https://github.com/nteract/scrapbook/pull/62. Closing, as the issue should be resolved.