zed-industries / zed

Code at the speed of thought – Zed is a high-performance, multiplayer code editor from the creators of Atom and Tree-sitter.
https://zed.dev

Better dataframe formatting in REPL #15555

Open universalmind303 opened 3 months ago

universalmind303 commented 3 months ago

Describe the feature

When you display a dataframe in the REPL, the output is badly formatted. I tried this with multiple dataframe libraries (polars, daft, pandas). The REPL should use _repr_html_, as is customary in Jupyter notebooks, but it appears to use the __repr__ method instead.

The __repr__ output would be fine if it were properly aligned, as it is in VS Code.

If applicable, add mockups / screenshots to help present your vision of the feature

Zed

image

VSCode

image

VSCode __repr__

image

rgbkrk commented 3 months ago

I haven't documented this yet, but you can use the following to get a richer table:

pd.set_option('display.html.table_schema', True)

Zed/GPUI does not support HTML output at this time, so we are using the Table Schema JSON output for now.

universalmind303 commented 3 months ago

@rgbkrk for dataframe objects, how exactly are they serialized to the REPL?

I saw that there is a rank_mime_type that will use DataTable if possible, but I wasn't able to figure out how that MIME type is determined from the Python object.

Is there a dunder method or other _repr_<something>_ that dataframe libraries could add so this is detected automatically?

AFAIK, only pandas supports the aforementioned display.html.table_schema option.
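
For context on the question above: in the Jupyter protocol, a kernel sends each result as a bundle that maps media types to payloads, and the frontend picks the richest type it can render. The sketch below illustrates that selection step only. `PREFERRED_MIME_TYPES` and `pick_richest` are hypothetical names, and the ordering is an assumption, not Zed's actual rank_mime_type logic. `text/html` is left out of the list because, per this thread, Zed does not render HTML.

```python
# Hypothetical preference order, richest first. This is NOT Zed's actual
# list; it only illustrates the idea of ranking media types.
PREFERRED_MIME_TYPES = [
    "application/vnd.dataresource+json",
    "image/png",
    "text/markdown",
    "text/plain",
]


def pick_richest(bundle):
    """Return (mime_type, payload) for the best supported entry, or None."""
    for mime_type in PREFERRED_MIME_TYPES:
        if mime_type in bundle:
            return mime_type, bundle[mime_type]
    return None


# A bundle as a kernel might send it: several representations of one object.
bundle = {
    "text/plain": "   name  age\n0  Alice   30",
    "text/html": "<table>...</table>",
    "application/vnd.dataresource+json": {"schema": {"fields": []}, "data": []},
}

chosen = pick_richest(bundle)
print(chosen[0])
```

Under this model, a dataframe only shows up as a table if the object itself puts a table media type into the bundle; the frontend does not inspect the Python object.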

universalmind303 commented 3 months ago

It also looks like data tables are truncating the last column.

https://github.com/user-attachments/assets/0607c16f-40f2-4795-a599-72f185f4e68d

rgbkrk commented 3 months ago

The media type for this is application/vnd.dataresource+json. The only _repr*_ method that can emit it is _repr_mimebundle_. Example:

class DataResource:
    def __init__(self, data):
        self.data = data

    # The media type for this is `application/vnd.dataresource+json`.
    # The only `_repr*_` method to emit this is to use `_repr_mimebundle_`.
    # For more details, refer to: https://ipython.readthedocs.io/en/stable/config/integrating.html
    def _repr_mimebundle_(self, include=None, exclude=None):
        return {
            'application/vnd.dataresource+json': self.data
        }

# Example usage
data = {
    "schema": {
        "fields": [
            {"name": "name", "type": "string"},
            {"name": "age", "type": "integer"}
        ]
    },
    "data": [
        {"name": "Alice", "age": 30},
        {"name": "Bob", "age": 25}
    ]
}

resource = DataResource(data)
resource  # as the last expression in a cell, this triggers the rich display

You can also use _ipython_display_, but _repr_mimebundle_ is the repr-style way.
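
As a rough sketch of how a dataframe-like library might apply this, the class below builds the `application/vnd.dataresource+json` payload from a list of row dicts and also returns a `text/plain` fallback for frontends without table support. `TableView`, its type mapping, and the "infer from the first row" rule are all assumptions made for illustration, not part of any library discussed here.

```python
class TableView:
    """Hypothetical wrapper: rows are a list of dicts sharing the same keys."""

    # Assumed mapping from Python types to Table Schema field types.
    _TYPE_NAMES = {bool: "boolean", int: "integer", float: "number", str: "string"}

    def __init__(self, rows):
        self.rows = list(rows)

    def _fields(self):
        # Infer each column's type from the first row; unknown types become "any".
        first = self.rows[0] if self.rows else {}
        return [
            {"name": name, "type": self._TYPE_NAMES.get(type(value), "any")}
            for name, value in first.items()
        ]

    def __repr__(self):
        return f"TableView({len(self.rows)} rows)"

    def _repr_mimebundle_(self, include=None, exclude=None):
        # Offer both representations; the frontend chooses which one to render.
        return {
            "text/plain": repr(self),
            "application/vnd.dataresource+json": {
                "schema": {"fields": self._fields()},
                "data": self.rows,
            },
        }


table = TableView([{"name": "Alice", "age": 30}, {"name": "Bob", "age": 25}])
bundle = table._repr_mimebundle_()
print(bundle["application/vnd.dataresource+json"]["schema"])
```

Inferring types from the first row keeps the sketch short; a real library would more likely derive the schema from its own column dtypes.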