microsoft / vscode-data-wrangler


Pandera DataFrame support #317

Open Girmii opened 1 week ago

Girmii commented 1 week ago

Environment data

Expected behaviour

Pandera is a framework for data validation and sanitization of Pandas DataFrames, similar to what Pydantic does for dataclasses. Validation can be performed by casting a Pandas DataFrame to the Pandera DataFrame subclass. Because the Pandera DataFrame is a subclass of the Pandas DataFrame, it should support all Pandas DataFrame functionality, so Data Wrangler should be able to process it as well.

Actual behaviour

When trying to inspect a Pandera DataFrame with Data Wrangler, the following error comes up:

Type 'pandera.typing.pandas.DataFrame' is not currently supported in Data Wrangler. Please open a feature request on the Data Wrangler GitHub repository.

Steps to reproduce:

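import pandas as pd
import pandera as pa
from pandera.typing import DataFrame

# Minimal schema with no column definitions
class ExampleModel(pa.DataFrameModel):
    pass

A = pd.DataFrame(["a", "b", "c"])

# Cast to the Pandera DataFrame type; inspecting B in Data Wrangler raises the error above
B = DataFrame[ExampleModel](A)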

One way to bypass the problem is to cast the Pandera DataFrame back to a Pandas DataFrame, after which it can be inspected with Data Wrangler. However, this is quite cumbersome to do every time you want to inspect a Pandera DataFrame.

C = pd.DataFrame(B)
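
Until this is supported, the cast above could be wrapped in a small helper so it does not have to be typed out each time. This is only a sketch: `to_plain_pandas` is a hypothetical name, not part of Pandas, Pandera, or Data Wrangler, and it assumes that re-wrapping with `pd.DataFrame(...)` is enough for Data Wrangler to accept the object, as it was in the example above.

```python
import pandas as pd


def to_plain_pandas(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return df as an exact pandas.DataFrame instance."""
    if type(df) is pd.DataFrame:
        # Already a plain DataFrame, nothing to do
        return df
    # Same re-wrap as `C = pd.DataFrame(B)` above; works for any DataFrame subclass
    return pd.DataFrame(df)
```

With the reproduction above, `C = to_plain_pandas(B)` would then be the object to open in Data Wrangler. It is still a workaround, so native support for the subclass would remain preferable.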

crashdomi commented 2 hours ago

feature would be highly appreciated!