scikit-learn = "^1.0" makes integration tests get stuck

leoisl commented 1 year ago

I will force scikit-learn = "0.24.2" in pyproject.toml for now

leoisl commented 1 year ago

https://github.com/iqbal-lab-org/make_prg/pull/42/commits/38b60e3fa4b4375d640f6205ff1bb2a5f8a0958f fixes this. @mbhall88 I hope you are ok with downgrading scikit-learn and pinning it to 0.24.2. If not, please feel free to reopen.

mbhall88 commented 1 year ago

I'm not convinced this was the reason. I ran all the tests fine (although some failed). I seem to remember I had to use v1.0 of scikit-learn due to our required Python version (or one of our other dependencies). Also, KMeans hasn't changed since 0.24.2, so I don't see why that would cause this issue. I'll dig into it today.

mbhall88 commented 1 year ago

I can confirm that (in Python v3.10 - I'll check 3.7 as well) ^1.0 of scikit-learn does not cause tests to hang.

leoisl commented 1 year ago

It is a super weird issue that I can't debug. I don't think it is related to the KMeans module. make_prg from_msa runs fine, as does make_prg update. The integration and other tests run fine (i.e. pass), as long as we can run them. It always gets stuck on the test___match_nonmatch_shortmatch test. If I remove it, it gets stuck on the next integration test. If I remove all integration tests, it runs just fine. It always gets stuck in Python core code that deals with multiprocessing and pools, not in scikit-learn code. After running some integration tests, for some reason Python fails to detect that one process has finished executing and should be stopped and cleaned up. It keeps waiting forever for the process to finish. Here is my IDE debugger paused while I am stuck in one of the tests:

Whenever I pause I get the same call stack/frames. I don't know what happens, or whether scikit-learn or a dependency somehow changes how Python detects that a process has finished in a multiprocessing pool... All I know is that when I downgrade scikit-learn to 0.24.2, everything works. I have no idea how to debug this, as the debugger does not clearly show how the issue happens. I've searched the web and tried things some users suggested on Stack Overflow, to no avail. I think it can be solved, but it is super tricky and might require lots of work. So I thought it would be easier to simply downgrade.
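
One way to get more information out of a hang like the one described above, without an IDE debugger, is to let the standard-library faulthandler module dump every thread's stack after a deadline and to wait on the pool with a timeout instead of forever. This is only a minimal sketch: the names hang_watchdog and map_with_timeout are hypothetical helpers, not part of make_prg, and the sketch only shows where the process is stuck; it does not address the cause, which this thread does not identify.

```python
import contextlib
import faulthandler
import multiprocessing as mp
import sys


@contextlib.contextmanager
def hang_watchdog(timeout_s, file=None):
    """Dump the traceback of every thread if the body runs longer than timeout_s.

    Hypothetical helper for illustration. `file` must have a real file
    descriptor; it defaults to the current sys.stderr.
    """
    target = file if file is not None else sys.stderr
    faulthandler.dump_traceback_later(timeout_s, exit=False, file=target)
    try:
        yield
    finally:
        # Disarm the watchdog once the body has finished (or raised).
        faulthandler.cancel_dump_traceback_later()


def map_with_timeout(func, values, timeout_s=60.0, grace_s=5.0, processes=2):
    """Run func over values in a process pool, but never wait forever.

    Hypothetical helper: if the workers have not reported back after
    timeout_s, the watchdog prints the stacks; after a further grace_s,
    get() raises multiprocessing.TimeoutError and the pool is terminated
    when the `with` block exits.
    """
    with hang_watchdog(timeout_s):
        with mp.Pool(processes=processes) as pool:
            return pool.map_async(func, values).get(timeout=timeout_s + grace_s)
```

In a test suite, wrapping the body of a suspect integration test in hang_watchdog(120) would leave a stack dump in the CI log if the test stalls, which could then be compared with the call stack seen in the debugger.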

leoisl commented 1 year ago

> I can confirm that (in Python v3.10 - I'll check 3.7 as well) v1.0 of scikit-learn does not cause tests to hang

For me it hangs with both. The specific scikit-learn version Poetry installs is 1.0.2.
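
Since the behaviour differs between machines, it may help to print the exact interpreter and resolved package versions at the start of a test run. A minimal sketch, assuming Python 3.8 or newer (importlib.metadata is in the standard library from 3.8; on 3.7 the importlib_metadata backport would be needed). The function names are hypothetical and not part of make_prg.

```python
import platform
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def environment_report(dist_names=("scikit-learn",)):
    """Build a short report of the Python version, OS and package versions."""
    lines = [f"python {platform.python_version()} on {platform.system()}"]
    for name in dist_names:
        version = installed_version(name)
        lines.append(f"{name} {version if version is not None else 'not installed'}")
    return "\n".join(lines)
```

Calling print(environment_report()) from a test session hook or at the top of a CI job would record which scikit-learn release was actually installed alongside any hang.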

mbhall88 commented 1 year ago

Hmmm, it seems Python 3.7 hangs with scikit-learn 1.0.2 for me, but not with 0.24.2.

mbhall88 commented 1 year ago

After testing lots of different Python and scikit-learn versions in both Mac and Linux environments, I have come to the conclusion that the best path forward for us is to drop support for Python 3.7 (so the minimum will be 3.8) and pin scikit-learn to ~1.0 (which is 1.0.x; 1.1 causes one of the tests to fail). In my testing I experienced no hanging of tests with these Python and scikit-learn versions.

I've done this in 3bbfac7 but we can revert it if this is a big problem
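
For reference, the three constraints discussed in this thread ("^1.0", "~1.0" and the exact pin "0.24.2") admit different sets of releases. The sketch below illustrates those ranges under Poetry's documented caret and tilde rules. It is a simplification written for this thread, not Poetry's own resolver: the function allowed is hypothetical, it handles only plain numeric versions (no pre-releases), and it supports caret constraints only for major versions of 1 or higher.

```python
def _parse(version):
    """Turn '1.0' or '1.0.2' into a 3-tuple of ints, padding with zeros."""
    parts = [int(p) for p in version.split(".")]
    return tuple(parts + [0] * (3 - len(parts)))


def allowed(spec, version):
    """Return True if `version` satisfies the constraint `spec` (simplified)."""
    v = _parse(version)
    if spec.startswith("^"):
        low = _parse(spec[1:])
        if low[0] == 0:
            # Caret rules for 0.x differ; they are not needed for this thread.
            raise NotImplementedError("caret on 0.x is not covered by this sketch")
        # ^1.0 means >=1.0.0,<2.0.0: any release with the same major version.
        return low <= v < (low[0] + 1, 0, 0)
    if spec.startswith("~"):
        given = spec[1:].split(".")
        low = _parse(spec[1:])
        if len(given) == 1:
            # ~1 means >=1.0.0,<2.0.0.
            return low <= v < (low[0] + 1, 0, 0)
        # ~1.0 means >=1.0.0,<1.1.0: only patch releases may change.
        return low <= v < (low[0], low[1] + 1, 0)
    # No operator: an exact pin such as 0.24.2.
    return v == _parse(spec)
```

Under these rules 1.0.2 satisfies both "^1.0" and "~1.0", a 1.1.x release satisfies only "^1.0", and "0.24.2" matches that single release.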

leoisl commented 1 year ago

scikit-learn = "~1.0" makes tests hang on all Ubuntu runners on GitHub Actions CI. scikit-learn = "0.24.2" works on Ubuntu and macOS runners for Python 3.8, 3.9, 3.10 and 3.11. I think we should stick to scikit-learn = "0.24.2". For more details, please see https://github.com/iqbal-lab-org/make_prg/pull/42#issuecomment-1326538682
