philip-ndikum / TemporalScope

TemporalScope: Scientifically driven Model-Agnostic Temporal Feature Importance Analysis with SHAP & partioning algorithms (supporting Pandas, Polars, Modin, PyArrow, Dask).
Apache License 2.0
10 stars 0 forks source link

[Roadmap]: Fix Type Casting in SingleStepTargetShifter for Backend Compatibility #33

Closed philip-ndikum closed 2 days ago

philip-ndikum commented 4 days ago

Is your request related to a feature in the roadmap? Please specify. This relates to the target shifter implementation in MODE_SINGLE_STEP, specifically the SingleStepTargetShifter class which provides scikit-learn style interface for temporal target shifting.

Describe the problem or enhancement you'd like to address The SingleStepTargetShifter is experiencing type casting errors when working with different DataFrame backends through Narwhals. Specifically:

  1. Unit tests are failing with TypeError: issubclass() arg 1 must be a class
  2. The error occurs in the _get_row_count method when casting count to 'int64'
  3. This affects the core functionality of target shifting in single-step mode

The issue impacts the workflow:

  1. Load data into TimeFrame
  2. Do target shifting (before/after partitioning)
  3. Do partitioning
  4. Apply Temporal SHAP algorithms
  5. Visualize and interpret

Describe the solution you'd like The solution should:

  1. Fix the type casting issue while maintaining backend compatibility
  2. Ensure the target shifter works seamlessly with all supported DataFrame backends (pandas, modin, pyarrow, polars, dask)
  3. Preserve the scikit-learn style interface for ease of use
  4. Allow target shifting to work both before and after partitioning algorithms are applied

Technical approach:

  1. Modify the _get_row_count method to handle type casting correctly through Narwhals
  2. Add proper type validation to ensure consistent behavior across backends
  3. Maintain compatibility with the TimeFrame interface

Describe alternatives you've considered

  1. Using native backend type casting instead of Narwhals operations

    • Pro: Direct control over type casting
    • Con: Loses backend agnostic benefits
  2. Removing explicit type casting

    • Pro: Simpler implementation
    • Con: May lead to type inconsistencies across backends
  3. Using a different approach to row counting

    • Pro: Could avoid type casting altogether
    • Con: May not be as efficient or maintainable

Additional context

MarcoGorelli commented 3 days ago

hey, thanks for trying out Narwhals

Unit tests are failing with TypeError: issubclass() arg 1 must be a class

this sounds like it might be a bug in Narwhals - if you could come up with a reproducible example I can take a look 🙏

philip-ndikum commented 3 days ago

Ah fantastic thank you @MarcoGorelli. Your responsiveness is truly appreciated! I'll try to do this as soon as I can in the narwhals repo. Thank you! CC @kanenorman for some of the bugs we've observed.