jupyterlab / jupyterlab-data-explorer

First class datasets in JupyterLab
BSD 3-Clause "New" or "Revised" License

Add converters for tabular data #63

Closed saulshanabrook closed 1 year ago

saulshanabrook commented 5 years ago

@ellisonbg mentioned that it would be good to support some default tabular data formats and to convert between them.

For each of these, we should define a data type, and define converters between them. Then we should make sure they work on some test datasets.

Some pipelines that should work after this:

  1. Open CSV files with nteract data viewer, by first converting to JSON table schema
  2. View pandas dataframe output in datagrid, by going from JSON table schema to datagrid model
  3. Create a Vega-Lite spec that refers to a dataset by a URL like file:///notebooks/Table.ipynb#/cells/4/outputs/0/data/application/vnd.dataresource+json; the spec should then use the pandas output from that cell of the notebook as its input. Depends on https://github.com/jupyterlab/jupyterlab-data-explorer/issues/20
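
As a rough illustration of the first pipeline, here is a minimal Python sketch of a CSV to JSON table schema converter. It is not the extension's actual converter: the function names csv_to_table_schema and infer_field_type are hypothetical, the type inference is deliberately naive (integer, number, string only), and the output shape (a "schema" with "fields", plus "data") is assumed to mirror what pandas emits for application/vnd.dataresource+json.

```python
import csv
import io


def infer_field_type(values):
    """Guess a Table Schema type name from a column's string cells (naive)."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "string"
    # Try the narrowest type first; fall back to string if no cast fits.
    for type_name, cast in (("integer", int), ("number", float)):
        try:
            for v in non_empty:
                cast(v)
            return type_name
        except ValueError:
            continue
    return "string"


def csv_to_table_schema(text):
    """Convert CSV text into a dict with 'schema' and 'data' keys (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(text))
    # Short rows yield None for missing cells; normalise those to "".
    rows = [{k: (v or "") for k, v in row.items()} for row in reader]
    names = reader.fieldnames or []

    fields = [
        {"name": name, "type": infer_field_type([row[name] for row in rows])}
        for name in names
    ]

    casts = {"integer": int, "number": float, "string": str}
    data = [
        {
            f["name"]: casts[f["type"]](row[f["name"]]) if row[f["name"]] != "" else None
            for f in fields
        }
        for row in rows
    ]
    return {"schema": {"fields": fields}, "data": data}


result = csv_to_table_schema("city,population,area\nOslo,709000,454.0\nLima,9750000,\n")
print(result["schema"]["fields"])
```

A real converter would also need to handle dates, booleans, and missing-value conventions, which this sketch leaves out.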

Nestak2 commented 4 years ago

@saulshanabrook I have a question that looks related to your post: when using jupyterlab-data-explorer, how can I convert a pandas dataframe to JSON in the notebook, so that I can use the different graphical options of nteract's data-explorer? As an example, see the jupyterlab-data-explorer example notebook, where the different graphical options appear in the red rectangle. On the other hand, these graphical options are not available for plain dataframes (example in my personal notebook). How can I make them available?

saulshanabrook commented 4 years ago

@Nestak2 You have to set pandas.set_option('display.html.table_schema', True) so that it outputs the JSON, as in this example.
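
A minimal sketch of what that looks like in a notebook cell. The dataframe contents are made up for illustration, and the to_json(orient="table") call is only a way to inspect a comparable schema-plus-data payload outside a notebook; it is not something the extension requires.

```python
import json

import pandas as pd

# Ask pandas to publish a table schema (JSON) representation of dataframes,
# in addition to the usual HTML table, when they are displayed.
pd.set_option("display.html.table_schema", True)

# Hypothetical example data, not taken from the thread.
df = pd.DataFrame({"city": ["Oslo", "Lima"], "population": [709000, 9750000]})

# In a notebook, ending the cell with `df` would display it with the JSON output attached.
# Here we just inspect a comparable payload directly.
payload = json.loads(df.to_json(orient="table"))
field_names = [field["name"] for field in payload["schema"]["fields"]]
print(field_names)
```

The option only affects dataframes displayed after it is set, so it usually goes near the top of the notebook.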

Nestak2 commented 4 years ago

@saulshanabrook Thank you very much, I didn't know the purpose of this line, now the graphical features are there!

saulshanabrook commented 4 years ago

Great! Glad it's working for you. I have added this to the usage docs to hopefully make it more clear in the future: https://github.com/jupyterlab/jupyterlab-data-explorer/pull/135

westurner commented 4 years ago

FWIW,

CSVW would be ideal for tabular data (with Linked Data metadata about the dataset and each column). More about this here: "Linked Data formats, tools, challenges, opportunities; CSVW, schema.org/Dataset, schema.org/ScholarlyArticle" https://discuss.ossdata.org/t/linked-data-formats-tools-challenges-opportunities-csvw-schema-org-dataset-schema-org-scholarlyarticle/160