snowflakedb / snowflake-ml-python

Apache License 2.0

Adding Method inverse_transform() signature #79

Open RahulDubey391 opened 7 months ago

RahulDubey391 commented 7 months ago

Hi, I have added the inverse_transform() method to the BaseTransformer class. My understanding is that the scikit-learn method is overloaded with a different signature.

I need help confirming whether my assumption is correct. I also see that all the other preprocessing classes need this method added.
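For reference, scikit-learn's inverse_transform maps data from the transformed space back to the original feature space, so a fit, transform, inverse_transform round trip should recover the input (up to floating-point error). The following is a stdlib-only sketch of that contract using a hypothetical MinMaxScalerSketch class on plain lists of rows. It is not the BaseTransformer code from this PR and does not use Snowpark.

```python
from typing import List

Rows = List[List[float]]


class MinMaxScalerSketch:
    """Hypothetical per-column min-max scaler; not a library class."""

    def fit(self, X: Rows) -> "MinMaxScalerSketch":
        columns = list(zip(*X))
        self.min_ = [min(c) for c in columns]
        # Guard against zero-range columns so division stays defined.
        self.range_ = [(max(c) - min(c)) or 1.0 for c in columns]
        return self

    def transform(self, X: Rows) -> Rows:
        # Scale each value to [0, 1] using the fitted column statistics.
        return [[(v - m) / r for v, m, r in zip(row, self.min_, self.range_)]
                for row in X]

    def inverse_transform(self, X: Rows) -> Rows:
        # Undo transform(): multiply by the range, then add the minimum back.
        return [[v * r + m for v, m, r in zip(row, self.min_, self.range_)]
                for row in X]


X = [[1.0, 10.0], [3.0, 20.0], [5.0, 30.0]]
scaler = MinMaxScalerSketch().fit(X)
scaled = scaler.transform(X)            # [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]
restored = scaler.inverse_transform(scaled)
print(restored == X)                    # True
```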

sfc-gh-thoyt commented 6 months ago

@RahulDubey391 Thank you for your efforts. The main thing missing is the implementation of inverse_transform using snowpark dataframes.

Our codebase has a system for autogenerating scikit-learn wrapper classes that allow users to execute fits, transforms, etc. with snowpark dataframes or pandas dataframes. We would like inverse_transform to be implemented the same way, and added to a wrapper class only if the underlying scikit-learn base estimator has an inverse_transform method. Take a look at the codegen/ directory for the templates and autogeneration logic. The autogenerated code is built with Bazel.
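The conditional rule described above (give a wrapper inverse_transform only when the underlying estimator has one) can be sketched at runtime as follows. This is an illustration under assumptions: make_wrapper_class, ScalerLike, and SelectorLike are hypothetical names, and the real generator in codegen/ emits source code from templates rather than building classes with type().

```python
from typing import Any, List, Type

Rows = List[List[float]]


class ScalerLike:
    """Hypothetical estimator that supports inverse_transform."""

    def transform(self, X: Rows) -> Rows:
        return [[2 * v for v in row] for row in X]

    def inverse_transform(self, X: Rows) -> Rows:
        return [[v / 2 for v in row] for row in X]


class SelectorLike:
    """Hypothetical estimator with no inverse_transform (it drops columns)."""

    def transform(self, X: Rows) -> Rows:
        return [row[:1] for row in X]


def make_wrapper_class(estimator_cls: Type[Any]) -> type:
    """Build a wrapper class; add inverse_transform only if the estimator has it."""

    def __init__(self) -> None:
        self._est = estimator_cls()

    def transform(self, X: Rows) -> Rows:
        return self._est.transform(X)

    namespace = {"__init__": __init__, "transform": transform}

    # Mirror the codegen rule: only expose the method when it exists upstream.
    if hasattr(estimator_cls, "inverse_transform"):
        def inverse_transform(self, X: Rows) -> Rows:
            return self._est.inverse_transform(X)

        namespace["inverse_transform"] = inverse_transform

    return type(f"{estimator_cls.__name__}Wrapper", (), namespace)


ScalerWrapper = make_wrapper_class(ScalerLike)
SelectorWrapper = make_wrapper_class(SelectorLike)
print(hasattr(ScalerWrapper, "inverse_transform"))    # True
print(hasattr(SelectorWrapper, "inverse_transform"))  # False
```

Leaving the method off entirely (rather than defining one that raises) means hasattr checks on the wrapper give the same answer as on the wrapped estimator.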

We realize this could be daunting, but we'd like to support you if you are still interested in contributing. We know the contribution guide in CONTRIBUTING.md could be greatly improved.

The other thing about inverse_transform is that when using snowpark dataframes for transformations, the input columns are retained by default. The user may want to discard them, which they can do by setting drop_input_cols=True. But because the input columns are retained by default, inverse_transform has limited utility.
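To make the drop_input_cols point concrete, here is a sketch that models a table as a dict of column lists. The helper transform_columns is hypothetical and only mirrors the behavior described above; it is not the snowflake-ml-python API.

```python
from typing import Callable, Dict, List

Table = Dict[str, List[float]]


def transform_columns(
    table: Table,
    input_cols: List[str],
    output_cols: List[str],
    fn: Callable[[float], float],
    drop_input_cols: bool = False,
) -> Table:
    """Hypothetical helper: write fn(input) to output columns, keep inputs by default."""
    result = dict(table)  # shallow copy so the caller's table is not mutated
    for in_col, out_col in zip(input_cols, output_cols):
        result[out_col] = [fn(v) for v in table[in_col]]
    if drop_input_cols:
        for in_col in input_cols:
            # Skip columns that were overwritten in place (input name == output name).
            if in_col not in output_cols:
                del result[in_col]
    return result


table = {"AGE": [20.0, 40.0]}
kept = transform_columns(table, ["AGE"], ["AGE_SCALED"], lambda v: v / 40)
dropped = transform_columns(table, ["AGE"], ["AGE_SCALED"], lambda v: v / 40,
                            drop_input_cols=True)
print(kept)     # {'AGE': [20.0, 40.0], 'AGE_SCALED': [0.5, 1.0]}
print(dropped)  # {'AGE_SCALED': [0.5, 1.0]}
```

With the default, the original AGE values sit next to AGE_SCALED, so inverting AGE_SCALED only reproduces data the user already has. Once the inputs are dropped, inverse_transform becomes the way back to the original values.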

Hope this makes sense; let us know if you have questions.

RahulDubey391 commented 6 months ago

Thanks a lot for the guidance, @sfc-gh-thoyt. Yes, I am still interested in contributing and would like to explore the codegen further. I'll go through it and raise any questions that come up.