marius-team / marius

Large scale graph learning on a single machine.
https://marius-project.org
Apache License 2.0

GIL issue thrown when testing pip install on macOS workflow #36

Closed JasonMoho closed 2 years ago

JasonMoho commented 3 years ago

Describe the bug The macOS pip install test throws a GIL error even though all tests pass: https://github.com/marius-team/marius/runs/2401968116

This could be an issue with Python 3.9: the Linux workflow passes, but it uses Python 3.8. Possibly related to https://github.com/pytorch/pytorch/issues/49370

Output:

2021-04-21T16:01:51.7260960Z ##[group]Run python3 -c "import marius as m"
2021-04-21T16:01:51.7261640Z python3 -c "import marius as m"
2021-04-21T16:01:51.7262230Z python3 -c "from marius.tools import preprocess"
2021-04-21T16:01:51.7262850Z marius_preprocess fb15k output_dir/
2021-04-21T16:01:51.7263760Z pytest test
2021-04-21T16:01:51.8917040Z shell: /bin/bash --noprofile --norc -e -o pipefail {0}
2021-04-21T16:01:51.8917570Z env:
2021-04-21T16:01:51.8918010Z   BUILD_TYPE: Release
2021-04-21T16:01:51.8918430Z ##[endgroup]
2021-04-21T16:02:03.4541320Z fb15k
2021-04-21T16:02:03.4642510Z Downloading fb15k.tgz to output_dir/fb15k.tgz
2021-04-21T16:02:03.4658930Z Extracting
2021-04-21T16:02:03.4659870Z Extraction completed
2021-04-21T16:02:03.4660660Z Detected delimiter:    
2021-04-21T16:02:03.4662650Z Reading in output_dir/freebase_mtr100_mte100-train.txt   1/3
2021-04-21T16:02:03.4664160Z Reading in output_dir/freebase_mtr100_mte100-valid.txt   2/3
2021-04-21T16:02:03.4665790Z Reading in output_dir/freebase_mtr100_mte100-test.txt   3/3
2021-04-21T16:02:03.4666760Z Number of instance per file:[483142, 50000, 59071]
2021-04-21T16:02:03.4667560Z Number of nodes: 14951
2021-04-21T16:02:03.4668370Z Number of edges: 592213
2021-04-21T16:02:03.4669180Z Number of relations: 1345
2021-04-21T16:02:03.4670000Z Delimiter: ~   ~
2021-04-21T16:02:05.0357020Z ============================= test session starts ==============================
2021-04-21T16:02:05.0358980Z platform darwin -- Python 3.9.4, pytest-6.2.3, py-1.10.0, pluggy-0.13.1
2021-04-21T16:02:05.0360090Z rootdir: /Users/runner/work/marius/marius
2021-04-21T16:02:05.0360930Z collected 29 items
2021-04-21T16:02:05.0361460Z 
2021-04-21T16:04:46.3756720Z test/python/bindings/test_fb15k.py .                                     [  3%]
2021-04-21T16:04:46.4321450Z test/python/preprocessing/test_config_generator_cmd_opt_parsing.py ..... [ 20%]
2021-04-21T16:04:47.7820700Z .........                                                                [ 51%]
2021-04-21T16:04:47.8108760Z test/python/preprocessing/test_csv_preprocessor.py .                     [ 55%]
2021-04-21T16:04:59.0886020Z test/python/preprocessing/test_preprocess_cmd_opt_parsing.py ........... [ 93%]
2021-04-21T16:04:59.1086690Z ..                                                                       [100%]
2021-04-21T16:04:59.1171760Z 
2021-04-21T16:04:59.1204890Z ======================== 29 passed in 175.06s (0:02:55) ========================
2021-04-21T16:04:59.2552200Z Fatal Python error: PyEval_SaveThread: the function must be called with the GIL held, but the GIL is released (the current Python thread state is NULL)
2021-04-21T16:04:59.2652700Z Python runtime state: finalizing (tstate=0x7fe41c409b50)
2021-04-21T16:04:59.2754080Z 
2021-04-21T16:04:59.2856250Z /Users/runner/work/_temp/511be060-bb2e-418a-ac5e-2e0f5d09f4d7.sh: line 4:  5232 Abort trap: 6           pytest test

To Reproduce Run the macOS pip install test workflow.

Expected behavior The pip install works fine on Linux:

2021-04-21T15:50:21.1538556Z python3 -c "import marius as m"
2021-04-21T15:50:21.1539213Z python3 -c "from marius.tools import preprocess"
2021-04-21T15:50:21.1539916Z marius_preprocess fb15k output_dir/
2021-04-21T15:50:21.1540448Z pytest test
2021-04-21T15:50:21.1584496Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2021-04-21T15:50:21.1585040Z env:
2021-04-21T15:50:21.1585484Z   BUILD_TYPE: Release
2021-04-21T15:50:21.1586287Z ##[endgroup]
2021-04-21T15:50:26.6729316Z fb15k
2021-04-21T15:50:26.6730578Z Downloading fb15k.tgz to output_dir/fb15k.tgz
2021-04-21T15:50:26.6731334Z Extracting
2021-04-21T15:50:26.6731982Z Extraction completed
2021-04-21T15:50:26.6732836Z Detected delimiter:    
2021-04-21T15:50:26.6734284Z Reading in output_dir/freebase_mtr100_mte100-train.txt   1/3
2021-04-21T15:50:26.6735973Z Reading in output_dir/freebase_mtr100_mte100-valid.txt   2/3
2021-04-21T15:50:26.6738109Z Reading in output_dir/freebase_mtr100_mte100-test.txt   3/3
2021-04-21T15:50:26.6739043Z Number of instance per file:[483142, 50000, 59071]
2021-04-21T15:50:26.6739918Z Number of nodes: 14951
2021-04-21T15:50:26.6740497Z Number of edges: 592213
2021-04-21T15:50:26.6741087Z Number of relations: 1345
2021-04-21T15:50:26.6741661Z Delimiter: ~   ~
2021-04-21T15:50:27.8808863Z ============================= test session starts ==============================
2021-04-21T15:50:27.8811125Z platform linux -- Python 3.8.5, pytest-6.2.3, py-1.10.0, pluggy-0.13.1
2021-04-21T15:50:27.8812170Z rootdir: /home/runner/work/marius/marius
2021-04-21T15:50:27.8812954Z collected 29 items
2021-04-21T15:50:27.8813617Z 
2021-04-21T15:50:50.9462537Z test/python/bindings/test_fb15k.py .                                     [  3%]
2021-04-21T15:50:50.9827642Z test/python/preprocessing/test_config_generator_cmd_opt_parsing.py ..... [ 20%]
2021-04-21T15:50:51.6762691Z .........                                                                [ 51%]
2021-04-21T15:50:51.6988451Z test/python/preprocessing/test_csv_preprocessor.py .                     [ 55%]
2021-04-21T15:50:57.6109717Z test/python/preprocessing/test_preprocess_cmd_opt_parsing.py ........... [ 93%]
2021-04-21T15:50:57.6234674Z ..                                                                       [100%]
2021-04-21T15:50:57.6235430Z 
2021-04-21T15:50:57.6236116Z ============================= 29 passed in 30.61s ==============================

Environment
macOS: platform darwin -- Python 3.9.4, pytest-6.2.3, py-1.10.0, pluggy-0.13.1
Linux: platform linux -- Python 3.8.5, pytest-6.2.3, py-1.10.0, pluggy-0.13.1

Additional context test/python/bindings/test_fb15k.py is the likely culprit, since it is the only test that runs the bindings. It is unclear why pytest still reports the test as passed.
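
One possible explanation, consistent with the `Python runtime state: finalizing` line in the macOS log: pytest prints its summary before the interpreter shuts down, so a crash during shutdown can only show up in the process exit code. The sketch below illustrates that pattern with a stand-in child script. Registering `os.abort()` via `atexit` is an assumption used to simulate an abort at exit, not the actual Marius bug, and the names `CHILD`, `tests_reported_ok`, and `exited_cleanly` are hypothetical.

```python
import subprocess
import sys

# Stand-in child script (hypothetical): it reports success, then aborts
# while the interpreter is shutting down. os.abort() only simulates the
# fatal error; it is not the real GIL failure from the bindings.
CHILD = """
import atexit, os
atexit.register(os.abort)
print("29 passed", flush=True)
"""

# Run the child and capture both its output and its exit status.
result = subprocess.run(
    [sys.executable, "-c", CHILD],
    capture_output=True,
    text=True,
)

# The summary line was printed before shutdown, so it looks like a pass...
tests_reported_ok = "passed" in result.stdout
# ...but the process did not exit cleanly, which is what fails the CI step.
exited_cleanly = result.returncode == 0

print(f"reported ok: {tests_reported_ok}, exited cleanly: {exited_cleanly}")
```

On POSIX systems, `subprocess` reports a child killed by SIGABRT with return code -6, which corresponds to the `Abort trap: 6` message in the macOS log.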

JasonMoho commented 3 years ago

The issue appears to be non-deterministic, as all the checks for this PR passed: https://github.com/marius-team/marius/pull/35
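
To get a rough idea of how often a non-deterministic failure like this occurs, the same command can be run repeatedly and its exit codes tallied. This is a minimal sketch, not part of the Marius test suite: the helper `tally_exit_codes` is hypothetical, and the demo uses a trivial stand-in command rather than the real test run.

```python
import subprocess
import sys
from collections import Counter


def tally_exit_codes(cmd, runs):
    """Run cmd `runs` times and count how often each exit code occurs."""
    codes = Counter()
    for _ in range(runs):
        completed = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        codes[completed.returncode] += 1
    return codes


# Stand-in command that always exits cleanly; replace it with the real
# test command (for example ["pytest", "test"]) to measure flakiness.
demo = tally_exit_codes([sys.executable, "-c", "pass"], runs=3)
print(dict(demo))
```

Any exit code other than 0 in the tally marks a run that failed, including runs where every test passed but the process aborted at shutdown.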