microsoft / hummingbird

Hummingbird compiles trained ML models into tensor computation for faster inference.
MIT License

Pandas 2.0.0 breaks pyspark which breaks our tests #694

Closed ksaur closed 1 year ago

ksaur commented 1 year ago

Pandas 2.0 was released recently, and it removed iteritems. Any call to it now raises, e.g., AttributeError: 'DataFrame' object has no attribute 'iteritems'. This causes our tests to fail.

We're not calling iteritems directly (if we were, we could simply replace it with items). The call comes from within pyspark itself when we create a dataframe (others have the same issue with pandas 2.0.0 and pyspark). There doesn't appear to be an immediate fix.
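To make the failure mode concrete, here is a minimal sketch that reproduces it without pandas or pyspark installed. FrameLike is a hypothetical stand-in for a pandas 2.0 DataFrame (it has items but no iteritems), and alias_iteritems is a hypothetical helper, not part of either library. It shows a stopgap shim that aliases the removed method back onto the class; this is not the approach taken in this issue.

```python
def alias_iteritems(cls):
    """Add `iteritems` as an alias of `items` if the class lacks it.

    Hypothetical helper for illustration only.
    """
    if not hasattr(cls, "iteritems") and hasattr(cls, "items"):
        cls.iteritems = cls.items
    return cls


class FrameLike:
    """Stand-in for a pandas 2.0 DataFrame: has `items`, no `iteritems`."""

    def __init__(self, columns):
        self._columns = dict(columns)

    def items(self):
        # Yields (column name, column values) pairs.
        return iter(self._columns.items())


frame = FrameLike({"a": [1, 2], "b": [3, 4]})

# Mimics a library that still calls the removed method internally.
try:
    frame.iteritems()
    failed = False
except AttributeError:
    failed = True

# After aliasing, the old method name resolves to `items` again.
alias_iteritems(FrameLike)
restored = list(frame.iteritems())
```

With real pandas, the same idea would mean passing pandas.DataFrame to the helper before creating the Spark dataframe. That patches a third-party class at runtime, so pinning is the more conservative choice.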

I think we should pin pandas<1.5.3 and wait for a pyspark release; I don't currently see a better way around this.

ksaur commented 1 year ago

Working on this in #695, but I will leave this issue open as a reminder to move the pin later.

The fix was merged, but we'll need to wait for pyspark 3.4 to be released (currently at pyspark=3.3.2).