vinostroud / nfl_analytics

MIT License
0 stars 0 forks source link

initial pytests for schema, load_data, get_epa_down1 #24

Closed vinostroud closed 5 months ago

vinostroud commented 6 months ago

So...first of all I know you've done two code reviews this month, so I am just pushing this as I make progress. Strangely, these tests were going great and everything was working, but then when I rechecked things at the end I started getting a new error that I don't understand (StackOverflow suggests removing __init__.py, that didn't work, I upgraded to latest pandas...not sure why in the matter of minutes my code went from working to not. I think the syntax and tests are generally sound, so I wanted to document progress so far and I'll return to debugging later.


(.conda) kevino@Kevins-Mac-mini NFL_stats % pytest
============================================================================== test session starts ==============================================================================
platform darwin -- Python 3.12.1, pytest-8.2.1, pluggy-1.5.0
rootdir: /Users/kevino/Desktop/NFL_stats
configfile: pyproject.toml
plugins: typeguard-4.2.1, mock-3.14.0
collected 0 items / 2 errors                                                                                                                                                    

==================================================================================== ERRORS =====================================================================================
___________________________________________________________________ ERROR collecting tests/test_analytics.py ____________________________________________________________________
ImportError while importing test module '/Users/kevino/Desktop/NFL_stats/tests/test_analytics.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
.conda/lib/python3.12/importlib/__init__.py:90: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
tests/test_analytics.py:2: in <module>
    import pandas as pd
.conda/lib/python3.12/site-packages/pandas/__init__.py:138: in <module>
    from pandas import api, arrays, errors, io, plotting, tseries
.conda/lib/python3.12/site-packages/pandas/api/__init__.py:2: in <module>
    from pandas.api import (
.conda/lib/python3.12/site-packages/pandas/api/typing/__init__.py:32: in <module>
    from pandas.io.stata import StataReader
E   ModuleNotFoundError: No module named 'pandas.io.stata'
_____________________________________________________________________ ERROR collecting tests/test_app_fe.py _____________________________________________________________________
ImportError while importing test module '/Users/kevino/Desktop/NFL_stats/tests/test_app_fe.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
.conda/lib/python3.12/importlib/__init__.py:90: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
tests/test_app_fe.py:2: in <module>
    from src.app_fe import date_selector
src/app_fe.py:4: in <module>
    import pandas as pd
.conda/lib/python3.12/site-packages/pandas/__init__.py:138: in <module>
    from pandas import api, arrays, errors, io, plotting, tseries
.conda/lib/python3.12/site-packages/pandas/api/__init__.py:2: in <module>
    from pandas.api import (
.conda/lib/python3.12/site-packages/pandas/api/typing/__init__.py:32: in <module>
    from pandas.io.stata import StataReader
E   ModuleNotFoundError: No module named 'pandas.io.stata'
=============================================================================== warnings summary ================================================================================
<frozen importlib._bootstrap>:488
  <frozen importlib._bootstrap>:488: DeprecationWarning: Type google._upb._message.MessageMapContainer uses PyType_Spec with a metaclass that has custom tp_new. This is deprecated and will no longer be allowed in Python 3.14.

<frozen importlib._bootstrap>:488
  <frozen importlib._bootstrap>:488: DeprecationWarning: Type google._upb._message.ScalarMapContainer uses PyType_Spec with a metaclass that has custom tp_new. This is deprecated and will no longer be allowed in Python 3.14.

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
============================================================================ short test summary info ============================================================================
ERROR tests/test_analytics.py
ERROR tests/test_app_fe.py
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Interrupted: 2 errors during collection !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
========================================================================= 2 warnings, 2 errors in 0.62s =========================================================================
(.conda) kevino@Kevins-Mac-mini NFL_stats % 
vinostroud commented 6 months ago

And just evidence it was working:

((.conda) kevino@Kevins-Mac-mini NFL_stats % pytest            
============================================================================== test session starts ==============================================================================
platform darwin -- Python 3.12.1, pytest-8.2.1, pluggy-1.5.0
rootdir: /Users/kevino/Desktop/NFL_stats
configfile: pyproject.toml
plugins: typeguard-4.2.1, mock-3.14.0
collected 2 items                                                                                                                                                               

tests/test_analytics.py .                                                                                                                                                 [ 50%]
tests/test_app_fe.py .                                                                                                                                                    [100%]

=============================================================================== warnings summary ================================================================================
<frozen importlib._bootstrap>:488
  <frozen importlib._bootstrap>:488: DeprecationWarning: Type google._upb._message.MessageMapContainer uses PyType_Spec with a metaclass that has custom tp_new. This is deprecated and will no longer be allowed in Python 3.14.

<frozen importlib._bootstrap>:488
  <frozen importlib._bootstrap>:488: DeprecationWarning: Type google._upb._message.ScalarMapContainer uses PyType_Spec with a metaclass that has custom tp_new. This is deprecated and will no longer be allowed in Python 3.14.

.conda/lib/python3.12/site-packages/jupyter_client/connect.py:22
  /Users/kevino/Desktop/NFL_stats/.conda/lib/python3.12/site-packages/jupyter_client/connect.py:22: DeprecationWarning: Jupyter is migrating its paths to use standard platformdirs
  given by the platformdirs library.  To remove this warning and
  see the appropriate new directories, set the environment variable
  `JUPYTER_PLATFORM_DIRS=1` and then run `jupyter --paths`.
  The use of platformdirs will be the default in `jupyter_core` v6
    from jupyter_core.paths import jupyter_data_dir, jupyter_runtime_dir, secure_write

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
======================================================================== 2 passed, 3 warnings in 25.60s =========================================================================
vinostroud commented 6 months ago

(commenting here to make a record of errors/fixes)

Seems this is the problem: https://github.com/jupyter/notebook/issues/3798

Also, I've now a separate modulenotfound error: dateutil.tz. This happens if I try to run the stream lit app.

I wonder if when I installed pandera this added these dependencies?

File "/Users/kevino/Desktop/NFL_stats/.conda/lib/python3.12/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 600, in _run_script exec(code, module.dict) File "/Users/kevino/Desktop/NFL_stats/src/app_fe.py", line 4, in import pandas as pd File "/Users/kevino/Desktop/NFL_stats/.conda/lib/python3.12/site-packages/pandas/init.py", line 49, in from pandas.core.api import ( File "/Users/kevino/Desktop/NFL_stats/.conda/lib/python3.12/site-packages/pandas/core/api.py", line 1, in from pandas._libs import ( File "/Users/kevino/Desktop/NFL_stats/.conda/lib/python3.12/site-packages/pandas/_libs/init.py", line 18, in from pandas._libs.interval import Interval File "interval.pyx", line 1, in init pandas._libs.interval File "hashtable.pyx", line 1, in init pandas._libs.hashtable File "missing.pyx", line 1, in init pandas._libs.missing File "/Users/kevino/Desktop/NFL_stats/.conda/lib/python3.12/site-packages/pandas/_libs/tslibs/init.py", line 40, in from pandas._libs.tslibs.conversion import localize_pydatetime File "conversion.pyx", line 1, in init pandas._libs.tslibs.conversion File "offsets.pyx", line 1, in init pandas._libs.tslibs.offsets File "timestamps.pyx", line 1, in init pandas._libs.tslibs.timestamps File "timedeltas.pyx", line 1, in init pandas._libs.tslibs.timedeltas File "timezones.pyx", line 24, in init pandas._libs.tslibs.timezones

bbelderbos commented 6 months ago

Great way to document the problem, multi comments on an issue, I do that often too.

Actually on my end the tests hang right now.

So I found a good ol' friend:

https://pybit.es/articles/pytest-timeout/

$ poetry add pytest-timeout

But yeah different issue then on my end:

image
vinostroud commented 6 months ago

Great way to document the problem, multi comments on an issue, I do that often too.

Actually on my end the tests hang right now.

So I found a good ol' friend:

https://pybit.es/articles/pytest-timeout/

$ poetry add pytest-timeout

But yeah different issue then on my end:

image

So that is so strange -- it would be good news if I had failing tests (although again they all passed at one point!) -- but I can't even get that far because my pytest failes because of missing (and deprecated -- so I am not using them) modules.

My best guess is that when I did poetry add pandera something changed ...somewhere ... and my package manager is looking for modules that don't exist and that it was previously ignoring.

bbelderbos commented 6 months ago

Can you isolate the dependency change by doing git checkout previous-commit-without-dependency, poetry install (to I assume remove the dependency from poetry's virtual env), and then poetry run pytest again?

bbelderbos commented 6 months ago

You could also try to git clone the repo to a new directory and poetry install it, then run the tests, to see if it makes a difference.

bbelderbos commented 6 months ago

The screenshot is not rendering for me, what's the error?

vinostroud commented 6 months ago

The screenshot that is not rendering is the one you provided (it doesn't render in a quoted reply).

bbelderbos commented 6 months ago

Did you pinpoint what dependency change caused this? Did you clone the repo to another directory fresh and check after poetry install?

vinostroud commented 5 months ago

It stopped working after I added Pandera, is the best guess I have.

When I do a new clone and poetry install, I get this message. This seems unrelated:

Installing the current project: nfl data analysis (0.1.0)Warning: The current project could not be installed: [Errno 2] No such file or directory: '/Users/kevino/Desktop/NFL_Stats_new_repo/nfl_analytics/README.md'If you do not want to install the current project use --no-root.If you want to use Poetry only for dependency management but not for packaging, you can disable package mode by setting package-mode = false in your pyproject.toml file.In a future version of Poetry this warning will become an error!

On Thu, May 30, 2024 at 6:43 AM Bob Belderbos @.***> wrote:

Did you pinpoint what dependency change caused this? Did you clone the repo to another directory fresh and check after poetry install?

— Reply to this email directly, view it on GitHub https://github.com/vinostroud/nfl_analytics/pull/24#issuecomment-2139275495, or unsubscribe https://github.com/notifications/unsubscribe-auth/BADYLRH2GPAZWSLRU3EARPTZE37D5AVCNFSM6AAAAABIJ6AJHCVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCMZZGI3TKNBZGU . You are receiving this because you authored the thread.Message ID: @.***>

bbelderbos commented 5 months ago

Yes I get that too:

√ experiment  $ git clone git@github.com:vinostroud/nfl_analytics.git
Cloning into 'nfl_analytics'...
cd nfremote: Enumerating objects: 94, done.
remote: Counting objects: 100% (94/94), done.
remote: Compressing objects: 100% (67/67), done.
remote: Total 94 (delta 41), reused 65 (delta 25), pack-reused 0
Receiving objects: 100% (94/94), 153.39 KiB | 797.00 KiB/s, done.
Resolving deltas: 100% (41/41), done.
√ experiment  $ cd nfl_analytics
√ nfl_analytics (main) $ poetry install
Creating virtualenv nfl_data_analysis-xoqXCCb0-py3.12 in /Users/bbelderbos/Library/Caches/pypoetry/virtualenvs
Installing dependencies from lock file

Package operations: 94 installs, 1 update, 0 removals

  • Updating pip (23.3.2 -> 24.0)
  • Installing attrs (23.2.0)
...
  • Installing streamlit (1.34.0)

Installing the current project: nfl data analysis (0.1.0)
The current project could not be installed: [Errno 2] No such file or directory: '/Users/bbelderbos/code/experiment/nfl_analytics/README.md'
If you do not want to install the current project use --no-root

But I can install it fine.

bbelderbos commented 5 months ago

Branch install works too for me:

?1 nfl_analytics (main) $ git checkout 23-testing-branch
branch '23-testing-branch' set up to track 'origin/23-testing-branch'.
Switched to a new branch '23-testing-branch'
√ nfl_analytics (23-testing-branch) $ poetry install

Installing dependencies from lock file

Package operations: 9 installs, 0 updates, 0 removals

  • Installing annotated-types (0.7.0)
  • Installing mypy-extensions (1.0.0)
  • Installing pydantic-core (2.18.2)
  • Installing multimethod (1.10)
  • Installing pydantic (2.7.1)
  • Installing typeguard (4.2.1)
  • Installing typing-inspect (0.9.0)
  • Installing wrapt (1.16.0)
  • Installing pandera (0.19.3)

Installing the current project: nfl data analysis (0.1.0)
The current project could not be installed: [Errno 2] No such file or directory: '/Users/bbelderbos/code/experiment/nfl_analytics/README.md'
If you do not want to install the current project use --no-root
√ nfl_analytics (23-testing-branch) $ poetry --version
Poetry (version 1.7.1)

Please add a README ...

bbelderbos commented 5 months ago

At that point the tests hang:

?1 nfl_analytics (23-testing-branch) $ poetry run pytest
========================================================================================================= test session starts =========================================================================================================
platform darwin -- Python 3.12.1, pytest-8.2.1, pluggy-1.5.0
rootdir: /Users/bbelderbos/code/experiment/nfl_analytics
configfile: pyproject.toml
plugins: typeguard-4.2.1, mock-3.14.0
collecting ...
bbelderbos commented 5 months ago

No hang, it just takes a while (about 66 seconds) but I do get an error:

tests/test_analytics.py F.F                                                                                                                                                                                                     [ 75%]
tests/test_app_fe.py .                                                                                                                                                                                                          [100%]

============================================================================================================== FAILURES ===============================================================================================================
________________________________________________________________________________________________________ test_validate_columns ________________________________________________________________________________________________________

    def test_validate_columns():
>       df = nfl.import_pbp_data()
E       TypeError: import_pbp_data() missing 1 required positional argument: 'years'

tests/test_analytics.py:18: TypeError
_______________________________________________________________________________________________________ test_get_mean_epa_down1 _______________________________________________________________________________________________________

    def test_get_mean_epa_down1():
        result = get_mean_epa_down1(mock_data)
>       pd.testing.assert_frame_equal(result, expected_output)
E       AssertionError: DataFrame are different
E
E       DataFrame shape mismatch
E       [left]:  (1, 3)
E       [right]: (2, 3)

tests/test_analytics.py:72: AssertionError
vinostroud commented 5 months ago

I am also getting this error when I do 'poetry' [anything]

Poetry could not find a pyproject.toml file in /Users/kevino/Desktop/NFL_Stats_new_repo or its parents. I can confirm there is a toml file in my new directory, so I don't understand why this error would come up.

I am trying to think what makes sense to do next. I am spending a lot of time just running into new errors and not understanding how I can identify and debug the original error. It's tempting to just nuke the repository and start over with a fresh environment.

Edit: Here's my vote -- this is kinda issue that got me stuck for days or weeks during PDM. It would be very helpful next time we meet to do a detailed walkthrough of how to debug this kind of issue and maybe understand of where to look and how to roll back changes (or whatever the practice).

How I'll make progress in the meantime: There's some other functions I want to add to the code. I can do this on my jupyter notebook which, while ugly as far as code quality, is easy to manipulate. I can work through the new functionality and then once my package is working as expected I can introduce the new functions.

Also I am enjoying doing the pybites so I'll work through those as well and ask questions as they come up (which won't need answers until we speak.

On Mon, Jun 3, 2024 at 5:57 AM Bob Belderbos @.***> wrote:

No hang, it just takes a while (about 66 seconds) but I do get an error:

tests/test_analytics.py F.F [ 75%] tests/test_app_fe.py . [100%]

============================================================================================================== FAILURES =============================================================================================================== ____ test_validate_columns ____

def test_validate_columns():
  df = nfl.import_pbp_data()

E TypeError: import_pbp_data() missing 1 required positional argument: 'years'

tests/test_analytics.py:18: TypeError _ test_get_mean_epadown1

def test_get_mean_epa_down1():
    result = get_mean_epa_down1(mock_data)
  pd.testing.assert_frame_equal(result, expected_output)

E AssertionError: DataFrame are different E E DataFrame shape mismatch E [left]: (1, 3) E [right]: (2, 3)

tests/test_analytics.py:72: AssertionError

— Reply to this email directly, view it on GitHub https://github.com/vinostroud/nfl_analytics/pull/24#issuecomment-2144779966, or unsubscribe https://github.com/notifications/unsubscribe-auth/BADYLRG3C3H3XDQ6U3ALGIDZFQ4Y5AVCNFSM6AAAAABIJ6AJHCVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCNBUG43TSOJWGY . You are receiving this because you authored the thread.Message ID: @.***>

bbelderbos commented 5 months ago

@vinostroud can we meet tomorrow or Wednesday quickly to debug this in your environment? You screenshare and I guide you through it?

bbelderbos commented 5 months ago

Tomorrow is quite busy, I sent you an invite for Wednesday, is your 9 am ok or better later?

vinostroud commented 5 months ago

Wednesday works great, thank you.

I realize this is out of cycle and assume we'll just push back our regularly scheduled meeting to the normal cadence. I do not expect any special treatment :)

On Mon, Jun 3, 2024 at 1:28 PM Bob Belderbos @.***> wrote:

Tomorrow is quite busy, I sent you an invite for Wednesday, is your 9 am ok or better later?

— Reply to this email directly, view it on GitHub https://github.com/vinostroud/nfl_analytics/pull/24#issuecomment-2145755979, or unsubscribe https://github.com/notifications/unsubscribe-auth/BADYLRBDN7A53ZYXZGCZAR3ZFSRSDAVCNFSM6AAAAABIJ6AJHCVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCNBVG42TKOJXHE . You are receiving this because you were mentioned.Message ID: @.***>

bbelderbos commented 5 months ago

No worries, next checkin is on as well, but maybe we meet 30 min instead of 45 min then. What about making it a biweekly shorter checkin as well?

vinostroud commented 5 months ago

I think that's a great idea! Shorter checkins are good and that way as I ask question in threads you can save time and we just discuss when we meet in person.

On Tue, Jun 4, 2024 at 3:31 AM Bob Belderbos @.***> wrote:

No worries, next checkin is on as well, but maybe we meet 30 min instead of 45 min then. What about making it a biweekly shorter checkin as well?

— Reply to this email directly, view it on GitHub https://github.com/vinostroud/nfl_analytics/pull/24#issuecomment-2146812068, or unsubscribe https://github.com/notifications/unsubscribe-auth/BADYLRCPKZBZ3IW2AQUREJ3ZFVUNRAVCNFSM6AAAAABIJ6AJHCVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCNBWHAYTEMBWHA . You are receiving this because you were mentioned.Message ID: @.***>

bbelderbos commented 5 months ago

OK I updated our recurring meeting like that, thanks.

vinostroud commented 5 months ago

Closing this for now as I am moving to tracking issues in issue 26.