scikit-learn-contrib / skdag

A more flexible alternative to scikit-learn Pipelines
MIT License
30 stars 8 forks

convert input into dataframe #28

Closed rja172 closed 10 months ago

rja172 commented 1 year ago

Hi, thanks for the amazing library. I am the maintainer of the sklearn-pandas library, and converting it into a proper DAG has always been on my to-do list. I was trying skdag and got blocked by one problem: one of my intermediate transformers expects its input to be a DataFrame. Is there any way to force the inputs to be converted into a DataFrame?

big-o commented 1 year ago

Can you provide a code snippet that recreates the issue? DAGBuilder has an option that should preserve data frames across steps or coerce non-pandas outputs into data frames if needed:

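from sklearn.decomposition import PCA
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from skdag import DAGBuilder

# infer_dataframe=True keeps pandas output flowing between steps
dag = DAGBuilder(infer_dataframe=True).from_pipeline(
    steps=[
        ("impute", SimpleImputer()),
        ("pca", PCA()),
        ("lr", LogisticRegression())
    ]
).make_dag()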

I'm very tempted to make this option the default behaviour but haven't so far because a) we probably shouldn't manipulate data structures unnecessarily unless specified and b) this is technically not a part of the sklearn estimator API, but rather an extension of it. I'm open to discussions on this though.
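If a single step still ends up receiving a NumPy array, one possible workaround is to put a small coercion transformer directly in front of it. The sketch below uses only scikit-learn and pandas. `ToDataFrame` and its `columns` parameter are hypothetical names made up for illustration, not part of skdag or sklearn-pandas, and the default `x0`, `x1`, ... column names are an assumption.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class ToDataFrame(BaseEstimator, TransformerMixin):
    """Hypothetical helper: coerce array-like input into a pandas DataFrame."""

    def __init__(self, columns=None):
        # Optional column names; generated as x0, x1, ... when omitted.
        self.columns = columns

    def fit(self, X, y=None):
        # Stateless: there is nothing to learn from the data.
        return self

    def transform(self, X):
        # DataFrames pass through untouched.
        if isinstance(X, pd.DataFrame):
            return X
        X = np.asarray(X)
        # Treat a 1-D input as a single column.
        if X.ndim == 1:
            X = X.reshape(-1, 1)
        columns = self.columns
        if columns is None:
            columns = [f"x{i}" for i in range(X.shape[1])]
        return pd.DataFrame(X, columns=columns)


# Usage: wrap the raw array before the step that needs a DataFrame.
frame = ToDataFrame(columns=["a", "b"]).fit_transform(
    np.array([[1.0, 2.0], [3.0, 4.0]])
)
```

Because it follows the usual `fit`/`transform` convention, a step like this should be usable anywhere an ordinary scikit-learn transformer is accepted, although that has not been verified against skdag here.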

big-o commented 10 months ago

Closing due to lack of activity