@saulshanabrook I have a question that looks related to your post: when using jupyterlab-data-explorer, how can I convert a pandas DataFrame to JSON in the notebook, so that I can use the different graphic options of nteract's data-explorer? As an example, see the example jupyterlab-data-explorer notebook here, where you have the different graphical options in the red rectangle. On the other hand, these graphical options are not available for plain DataFrames (example in my personal notebook). How can I make them available?
@Nestak2 You have to set pandas.set_option('display.html.table_schema', True) so that it outputs the JSON, like in these examples.
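In a notebook cell it looks like this (the DataFrame contents here are just a toy example):

```python
import pandas as pd

# Ask pandas to emit the Table Schema MIME type
# (application/vnd.dataresource+json) alongside its normal HTML repr.
pd.set_option('display.html.table_schema', True)

df = pd.DataFrame({'a': [1, 2, 3], 'b': ['x', 'y', 'z']})
df  # displaying the DataFrame now also publishes the JSON table schema
```

Any cell that displays a DataFrame after that will include the table schema output, which is what the data explorer picks up.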
@saulshanabrook Thank you very much! I didn't know the purpose of this line; now the graphical features are there!
Great! Glad it's working for you. I have added this to the usage docs to hopefully make it more clear in the future: https://github.com/jupyterlab/jupyterlab-data-explorer/pull/135
FWIW,
Odo migrates data using a network of small data conversion functions between type pairs. That network is below:

[odo conversions graph]
Each node is a container type (like pandas.DataFrame or sqlalchemy.Table) and each directed edge is a function that transforms or appends one container into or onto another. We annotate these functions/edges with relative costs.
This network approach allows odo to select the shortest path between any two types (thank you networkx). For performance reasons these functions often leverage non-Pythonic systems like NumPy arrays or native CSV->SQL loading functions. Odo does not depend solely on Python iterators.
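For a sense of the API, a minimal sketch (the file and database names below are made up for illustration):

```python
import pandas as pd
from odo import odo

# Load a CSV into a DataFrame; odo finds the cheapest conversion
# path through its network (CSV -> DataFrame is a single edge).
df = odo('accounts.csv', pd.DataFrame)

# Append the same data onto a SQLite table, addressed by URI.
odo(df, 'sqlite:///accounts.db::accounts')
```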
BlazingSQL is a GPU-accelerated SQL engine built on top of the RAPIDS ecosystem. RAPIDS is based on the Apache Arrow columnar memory format, and cuDF is a GPU DataFrame library for loading, joining, aggregating, filtering, and otherwise manipulating data.
BlazingSQL is a SQL interface for cuDF, with various features to support large scale data science workflows and enterprise datasets.
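Roughly, the interface looks like this (a sketch; the file and table names are invented for illustration):

```python
import cudf
from blazingsql import BlazingContext

bc = BlazingContext()

# Read a CSV into a GPU DataFrame and expose it as a SQL table.
gdf = cudf.read_csv('taxi.csv')
bc.create_table('taxi', gdf)

# Queries run on the GPU and return cuDF DataFrames.
result = bc.sql(
    'SELECT passenger_count, AVG(fare_amount) AS avg_fare '
    'FROM taxi GROUP BY passenger_count'
)
print(result)
```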
CSVW would be ideal for tabular data (with Linked Data metadata about the dataset and each column). More about this here: "Linked Data formats, tools, challenges, opportunities; CSVW, schema.org/Dataset, schema.org/ScholarlyArticle" https://discuss.ossdata.org/t/linked-data-formats-tools-challenges-opportunities-csvw-schema-org-dataset-schema-org-scholarlyarticle/160
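To make CSVW concrete, here is a minimal metadata document (written as a Python dict; the table and columns are invented for illustration):

```python
import json

# Minimal CSVW (CSV on the Web) metadata: a JSON-LD document that
# annotates a CSV file and each of its columns. Names are illustrative.
metadata = {
    "@context": "http://www.w3.org/ns/csvw",
    "url": "countries.csv",
    "tableSchema": {
        "columns": [
            {"name": "country", "titles": "Country", "datatype": "string"},
            {"name": "population", "titles": "Population", "datatype": "integer"},
        ],
        "primaryKey": "country",
    },
}

# Conventionally saved next to the CSV as countries.csv-metadata.json.
with open("countries.csv-metadata.json", "w") as f:
    json.dump(metadata, f, indent=2)
```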
@ellisonbg mentioned that it would be good to support some default tabular data formats and to convert between them.
- CSV string (done)
- JSON table schema (done)

For each of these, we should define a data type, and define converters between them. Then we should make sure they work on some test datasets (a converter sketch follows below).
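As a hedged sketch of what one converter pair could look like, using pandas' Table Schema support as the bridge (the function names are hypothetical, not from this repo):

```python
import io
import pandas as pd

def csv_to_table_schema(csv_string: str) -> str:
    """Convert a CSV string to a Table Schema data-resource JSON string."""
    df = pd.read_csv(io.StringIO(csv_string))
    # orient='table' emits {"schema": {...}, "data": [...]}.
    return df.to_json(orient='table')

def table_schema_to_csv(json_string: str) -> str:
    """Convert a Table Schema JSON string back to a CSV string."""
    df = pd.read_json(io.StringIO(json_string), orient='table')
    return df.to_csv(index=False)

# Round-trip on a tiny test dataset:
csv = "a,b\n1,x\n2,y\n"
assert table_schema_to_csv(csv_to_table_schema(csv)) == csv
```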
Some pipelines that should work after this:
- If a Vega spec's data URL points to file:///notebooks/Table.ipynb#/cells/4/outputs/0/data/application/vnd.dataresource+json, then this should use the pandas output from that cell in the notebook as an input to the Vega spec. Depends on https://github.com/jupyterlab/jupyterlab-data-explorer/issues/20