[BUG] Movielens not working in python 3.7 due to Pandera library

miguelgfierro commented 8 months ago

Description

In Python 3.7, movielens fails to import because of a conflict with the Pandera library:

from recommenders.datasets import movielens
/opt/hostedtoolcache/Python/3.7.17/x64/lib/python3.7/site-packages/recommenders/datasets/movielens.py:36: in <module>
    import pandera as pa
/opt/hostedtoolcache/Python/3.7.17/x64/lib/python3.7/site-packages/pandera/__init__.py:4: in <module>
    import pandera.backends
/opt/hostedtoolcache/Python/3.7.17/x64/lib/python3.7/site-packages/pandera/backends/__init__.py:6: in <module>
    import pandera.backends.pandas
/opt/hostedtoolcache/Python/3.7.17/x64/lib/python3.7/site-packages/pandera/backends/pandas/__init__.py:5: in <module>
    import pandera.typing
/opt/hostedtoolcache/Python/3.7.17/x64/lib/python3.7/site-packages/pandera/typing/__init__.py:9: in <module>
    from pandera.typing import (
/opt/hostedtoolcache/Python/3.7.17/x64/lib/python3.7/site-packages/pandera/typing/geopandas.py:5: in <module>
    from typing import (  # type: ignore[attr-defined]
E   ImportError: cannot import name 'get_args' from 'typing' (/opt/hostedtoolcache/Python/3.7.17/x64/lib/python3.7/typing.py)

In which platform does it happen?

How do we replicate the issue?

Pandera recently released 0.18.0; see https://pypi.org/project/pandera/#history

With Pandera < 0.18.0, the code works.

Expected behavior (i.e. solution)

Other Comments

miguelgfierro commented 8 months ago

@loomlike it seems we only use pandera when creating a fake movielens dataset. Do you know why we are using that dependency and whether we can drop it?

john0isaac commented 6 months ago

This also affects Python 3.8.

miguelgfierro commented 6 months ago

FYI @SimonYansenZhao @anargyri more problems with dependencies

anargyri commented 6 months ago

This is strange because they say that pandera requires Python >=3.7

anargyri commented 6 months ago

Also here

anargyri commented 6 months ago

But maybe the typing dependency is the problem.

john0isaac commented 6 months ago

@anargyri the error @miguelgfierro is sharing is from Python 3.7.17 so technically it is >= 3.7.

Yes, tracing the error goes back to the typing issue. There is a Stack Overflow post describing two solutions to this problem: https://stackoverflow.com/questions/77247446/cannot-import-name-self-from-typing
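For context on the traceback above: `typing.get_args` was added to the standard library in Python 3.8, which is why the import fails on 3.7.17. A minimal sketch of a guarded import follows. It is not what pandera or recommenders actually do; the fallback to the `typing_extensions` backport and the helper name `has_native_get_args` are assumptions made for illustration.

```python
import sys

try:
    # Available in the standard library from Python 3.8 onwards.
    from typing import get_args
except ImportError:
    try:
        # Assumption: the typing_extensions backport is installed on Python 3.7.
        from typing_extensions import get_args
    except ImportError:
        # Neither source is available; callers must handle None.
        get_args = None


def has_native_get_args(version_info=sys.version_info):
    """Hypothetical helper: True if typing.get_args ships with this interpreter."""
    return tuple(version_info[:2]) >= (3, 8)
```

Calling `has_native_get_args((3, 7, 17))` reports that the interpreter from the traceback lacks the function, which matches the `ImportError` shown in the description.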

anargyri commented 6 months ago

Thanks, so pandera needs to handle this. We need to remove the dependency then.

anargyri commented 6 months ago

This is the original commit that introduced pandera btw https://github.com/recommenders-team/recommenders/commit/fd33efe2d6fc1f736809665c8d5657476374de32

anargyri commented 6 months ago

from recommenders.datasets import movielens

I could only replicate the issue in 3.7, not in 3.8. @john0isaac what machine did you use? Mine is a Standard_E4ds_v4 (4 cores, 32 GB RAM, 150 GB disk). What did you do to get this error? I tried:

conda create -n foo python=3.8
conda activate foo
pip install recommenders
python
Python 3.8.18 (default, Sep 11 2023, 13:40:15) 
[GCC 11.2.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from recommenders.datasets import movielens
>>>

john0isaac commented 6 months ago

@anargyri

Serverless Spark compute on Azure ML

This is the machine config: 4 vCPU, 32 GB memory, 64 GB disk

Python 3.8, Scala 2.12.15, Java 1.8.0_282, .Net Core 3.1, .Net for Apache Spark 2.0, Delta Lake 1.2

anargyri commented 6 months ago

Thanks @john0isaac I don't have quota for this type of machine, I am afraid. I should make a commit for Python 3.8 too. What happens if you try with higher Python versions on that machine?

john0isaac commented 6 months ago

@anargyri It's serverless, so it doesn't cost much, and it enables the Spark jobs. Don't worry about quota.

As for your question: this is a preconfigured compute, so there are only two options, Python 3.8 or Python 3.10. Python 3.10 is not supported by the recommenders package, so you need to fix this for the environment to work on Azure ML.

anargyri commented 6 months ago

I am fixing this in a new PR. Until we release the fix on PyPI, one way to bypass the issue is to run pip install "pandera[strategies]>=0.6.5,<0.18" after you have done pip install recommenders. This should uninstall the latest version of pandera (which is causing the error) and install the previous one.
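To confirm that the pin took effect, something like the following could be run in the same environment. This is only a sketch: the helper names are hypothetical, the version parsing is deliberately simplistic (it reads only the first two numeric components), and on Python 3.7 it assumes the `importlib_metadata` backport is installed.

```python
import re


def major_minor(version):
    """Extract (major, minor) from a version string such as '0.17.2'."""
    nums = re.findall(r"\d+", version)
    major = int(nums[0]) if nums else 0
    minor = int(nums[1]) if len(nums) > 1 else 0
    return major, minor


def is_below_0_18(version):
    """True if the version falls under the <0.18 pin suggested above."""
    return major_minor(version) < (0, 18)


def installed_pandera_version():
    """Return the installed pandera version string, or None if not installed."""
    try:
        from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
    except ImportError:
        # Assumption: the importlib_metadata backport is available on Python 3.7.
        from importlib_metadata import PackageNotFoundError, version
    try:
        return version("pandera")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_pandera_version()
    if found is None:
        print("pandera is not installed")
    else:
        print("pandera", found, "pinned below 0.18:", is_below_0_18(found))
```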

john0isaac commented 6 months ago

Thanks
