paris-saclay-cds / ramp-workflow

Toolkit for building predictive workflows on top of pydata (pandas, scikit-learn, pytorch, keras, etc.).
https://paris-saclay-cds.github.io/ramp-docs/
BSD 3-Clause "New" or "Revised" License
68 stars, 42 forks

Travis fails with segfault #92

Closed aboucaud closed 6 years ago

aboucaud commented 7 years ago

For the past 4 days or so, the builds have been consistently failing with a segfault:

Installing collected packages: tensorflow-tensorboard, tensorflow
  Found existing installation: tensorflow-tensorboard 0.4.0rc2
    Can't uninstall 'tensorflow-tensorboard'. No files were found to uninstall.
  Found existing installation: tensorflow 1.4.0
    Can't uninstall 'tensorflow'. No files were found to uninstall.
Successfully installed tensorflow-1.3.0 tensorflow-tensorboard-0.1.8
/home/travis/.travis/job_stages: line 57:  3267 Segmentation fault      (core dumped) pytest -s -v --cov=rampwf rampwf
The command "pytest -s -v --cov=rampwf rampwf" exited with 139.
cache.2
store build cache

Apparently it has to do with the tensorflow installation, but I don't understand the segfault here.
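
For context, exit status 139 is how the shell reports a process killed by signal 11 (128 + 11, SIGSEGV). A minimal sketch for getting more than a bare "Segmentation fault" out of such a run, assuming a POSIX system: launch the same command in a child interpreter with Python's standard `faulthandler` enabled, so that a Python-level traceback is dumped to stderr if the crash happens. The helper name `run_with_faulthandler` is hypothetical and is not part of rampwf or Travis.

```python
import signal
import subprocess
import sys


def run_with_faulthandler(args):
    """Run `python -X faulthandler <args>` in a child process.

    Returns (returncode, died_of_segfault, captured_stderr).
    Hypothetical helper for illustration only.
    """
    cmd = [sys.executable, "-X", "faulthandler", *args]
    proc = subprocess.run(cmd, stderr=subprocess.PIPE, universal_newlines=True)
    # On POSIX, subprocess reports death by signal N as returncode -N.
    # A shell reports the same event as 128 + N, hence 139 for SIGSEGV.
    died_of_segfault = proc.returncode == -signal.SIGSEGV
    return proc.returncode, died_of_segfault, proc.stderr
```

For example, `run_with_faulthandler(["-m", "pytest", "-s", "-v", "--cov=rampwf", "rampwf"])` reruns the failing command; if it segfaults again, the captured stderr should contain the Python frames that were active at the time of the crash.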

Could it be a conda vs. pip issue? What recent changes to the kits or the libs could have caused such an issue? @mehdidc @glemaitre @jorisvandenbossche @kegl

glemaitre commented 7 years ago

A new version of tensorflow was released a few days ago, so I would assume it is a library mismatch that appears when the tensorflow version is downgraded. Can we force the installation of the previous tensorflow version to see whether it still segfaults?

aboucaud commented 7 years ago

Except that the issue arises when tensorflow is reinstalled with a forced version: https://travis-ci.org/paris-saclay-cds/ramp-workflow/jobs/296795840#L1097

glemaitre commented 7 years ago

@aboucaud You are right. The workflow producing the error is:

Somehow I cannot reproduce it when doing it manually. It looks as if something was not updated during the first install.
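
One way to probe the "something was not updated" hypothesis is to look for leftover package metadata: the log above shows pip finding tensorflow 1.4.0 with no files to uninstall, then installing 1.3.0 alongside it. The sketch below is only an illustration under that assumption. It uses the standard `importlib.metadata` module (Python 3.8 or later) to list every distribution name registered with more than one version in the current environment. Both function names are hypothetical.

```python
from collections import defaultdict
from importlib import metadata


def find_conflicting_versions(pairs):
    """Return {name: [versions]} for names seen with more than one version.

    `pairs` is any iterable of (distribution_name, version) tuples.
    """
    versions = defaultdict(set)
    for name, version in pairs:
        # Treat "Foo_Bar" and "foo-bar" as the same distribution name.
        versions[name.lower().replace("_", "-")].add(version)
    return {n: sorted(v) for n, v in versions.items() if len(v) > 1}


def installed_pairs():
    """Yield (name, version) for every distribution visible on sys.path."""
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip broken metadata directories without a name
            yield name, dist.version
```

Calling `find_conflicting_versions(installed_pairs())` in the CI environment right after the install step would return an empty dict for a clean environment; a non-empty result for tensorflow would support the stale-install explanation, though it would not prove it is the cause of the segfault.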

glemaitre commented 7 years ago

One solution is to try installing the latest release without forcing 1.3.0 in MNIST.

glemaitre commented 7 years ago

By the way, why is tensorflow installed inside the .travis.yml instead of relying solely on the install in the kit?

glemaitre commented 7 years ago

It also seems that we force a master install of rampwf inside mars_crater. That looks like a circular install, and I don't think it is a great idea.

glemaitre commented 7 years ago

Bottom line: we might benefit from using conda instead of pip to manage the conflicts between package versions. I got up to this error: https://travis-ci.org/paris-saclay-cds/ramp-workflow/jobs/297192994#L1033

The restriction is imposed by either tensorflow or nbconvert, which require too high a version for our usage. It is still working, but we should probably not have to bother with that.

@aboucaud @jorisvandenbossche what are your thoughts on that?