neurophysik / jitcdde

Just-in-time compilation for delay differential equations

Weird MacOS vs. Linux / Ubuntu behaviour #34

Closed jajcayn closed 3 years ago

jajcayn commented 3 years ago

Dear developer,

I encountered weird behaviour of jitcdde that depends on the host operating system. These are actually two issues; I have a hunch they might be connected, but maybe they are not.

jitcdde + numba import order

I am co-developing neurolib, a framework for whole-brain simulations, and we use two backends: a forward Euler scheme written in numba-jitted functions, and jitcdde. I noticed weird behaviour on MacOS. When I import jitcdde first and then numba, I get a weird failed assertion and my ipython kernel instantly dies. This is not a problem on Linux, and also not a problem on MacOS when I do the imports in the opposite order (first numba, then jitcdde). See screenshot:

Screenshot 2021-03-19 at 1 07 01 PM

On the left side I have a connection to the Ubuntu server: both orders work. On the right side I have my computer (MacOS): the first order gives a failed assertion at a weird path (/Users/ci/miniconda3). I do not have this path on my computer, and I don't use miniconda.

This is not a huge issue, as I can just make sure I import numba first every time and then jitcdde; it is just a bit annoying.

Segfault on Ubuntu

Not sure if these two are connected, but this time it is the other way around. Within the above-mentioned neurolib I've written a bunch of tests, where I integrate the brain model using both backends (jitcdde and numba based). It was working reasonably well for 5-6 months. Yesterday I pushed some changes in the numba backend (not related to jitcdde in any way), but one of my tests now fails, and it is related to jitcdde. I then tried to downgrade packages, but nothing helps. Even weirder: the segfault is not happening on MacOS, only on Ubuntu.

steps to reproduce:

  1. have Ubuntu
  2. git clone git@github.com:neurolib-dev/neurolib.git
  3. cd neurolib
  4. pip install -r requirements.txt
  5. pip install pytest
  6. pip install .
  7. pytest tests/multimodel/test_aln.py

See screenshot:

Screenshot 2021-03-19 at 1 09 03 PM Screenshot 2021-03-19 at 1 09 10 PM

Left side again: Ubuntu, which segfaults in the jump function of _jitcdde.py (called when setting the past of the state vector to a constant and then calling adjust_diff()). Right side: MacOS, where the tests run without any problems. A week or two ago everything worked on Linux as well; something changed and I don't know what, but I am getting this behaviour on all Linux systems, and not on MacOS.

Thanks a lot for looking into it! Best, Nikola

Wrzlprmft commented 3 years ago

Those are indeed weird errors.

jajcayn commented 3 years ago

Hey

  • I have no clue on the first error, except that I would guess that I am not responsible for it. Can you check whether this happens if you replace jitcdde with any of the modules it imports?

Yes, unfortunately, the same thing:

Screenshot 2021-03-20 at 12 30 28 PM

I mean, this is not really a problem, just a minor inconvenience, I typically use isort for my OCD, so it's just a matter of keeping numba imports first, nothing I cannot do...

  • I cannot reproduce the second error (on Ubuntu 20.10). Have you tried reproducing this on different Ubuntu machines? Anyway, the nasty thing with segfaults is that the system does not register them all the time, as they can change with memory fragmentation. So, when I run everything, there could still be a bad memory call, but I just don’t notice it because for me it happens within the memory allotted to the program. I can run this through Valgrind, but for this it would be really helpful to have a minimal example. Can you produce one?

So I first noticed this on github actions for CI here: you can see that the macos tests pass and the linux tests fail (ubuntu-latest). I also have access to a virtual machine running Ubuntu 20.04.2 LTS (the Ubuntu screenshots in my original post are from this machine). On the virtual machine I tried python 3.7.9 and python 3.8.6 (I use pyenv) and got the same behaviour: a segfault on a single test. Apart from that, I tried a couple of docker images:

# I tried different versions of ubuntu and debian
docker run -it ubuntu:20.10

inside docker:

apt update && apt -y upgrade
apt install python3 python3-dev python3-pip git
# this installs Python 3.8.6 / on debian it installs 3.7.3
git clone https://github.com/neurolib-dev/neurolib.git
cd neurolib
pip3 install -r requirements.txt
pip3 install matplotlib pytest
pip3 install .
# this test passes - uses the same integration with jitcdde
pytest tests/multimodel/test_fitzhugh_nagumo.py
# this is the one that gives segfault
pytest tests/multimodel/test_aln.py
Screenshot 2021-03-20 at 12 41 47 PM

I see the same behaviour on all ubuntu versions I tried (20.04, 20.10, 18.04) and also on debian.
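When a test run dies with a segfault, it can help to run it in a child process with Python's built-in faulthandler enabled, so that the parent survives and can report which signal killed the child. The following is only a minimal sketch using the standard library; the helper names `describe_exit` and `run_with_faulthandler` are made up for illustration and are not part of jitcdde, numba, or neurolib.

```python
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Translate a subprocess return code into a readable verdict."""
    if returncode == 0:
        return "exited normally"
    if returncode < 0:
        # On POSIX, a negative return code means the child was killed by a signal.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"signal {-returncode}"
        return f"killed by {name}"
    return f"failed with exit code {returncode}"


def run_with_faulthandler(args):
    """Run `python -X faulthandler <args>` in a child process.

    faulthandler writes the Python traceback to stderr if the child crashes.
    """
    cmd = [sys.executable, "-X", "faulthandler", *args]
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return proc.returncode, proc.stderr


# Harmless demonstration; for the failing test in this thread the arguments
# would be ["-m", "pytest", "tests/multimodel/test_aln.py"].
code, stderr = run_with_faulthandler(["-c", "print('child ran fine')"])
print(describe_exit(code))
```

If the child is killed by SIGSEGV, the captured stderr should contain the Python-level traceback at the moment of the crash, which is usually enough to tell which call triggered it.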

But as I was saying, the really weird part is that the code for this particular test hasn't changed in 2 months, and just yesterday I saw this error for the first time in the github CI actions (I changed other code, unrelated to jitcdde). All tests passed on my local machine (macos), but on github all linux tests failed due to this segfault.

As for the minimal example, I can try.. it's gonna be messy since the whole framework is kind of big with the common base classes that do the heavy lifting while integrating models. I'll try to write all the relevant code into one python file. Just a note - I have 5 models in the framework, only one exhibits these segfaults, and I guess it's not a coincidence that only this one uses python callbacks... others are completely symbolic in symengine. The one that fails uses python callbacks (since its dynamics contain a lookup function) and the callbacks are wrapped with numba.njit for speed. Maybe something changed in numba that creates the segfaults during the initial jump with adjust_diff? I don't know, just spitballing here...

Anyway, thanks for getting back to me, I'll try to reconcile the MWE with this particular model and let you know. Thanks again! N.

Wrzlprmft commented 3 years ago

I have no clue on the first error, except that I would guess that I am not responsible for it. Can you check whether this happens if you replace jitcdde with any of the modules it imports?

Yes, unfortunately, the same thing: […]

That’s not what I meant: JiTCDDE imports some Python modules under the hood. I would guess that one of them is causing the problem, not JiTCDDE.

Does the following also cause an error?

import warnings, itertools, numpy, symengine, functools, sys, tempfile, os, inspect, setuptools, traceback, pickle, jinja2, importlib, bisect
import numba

As for the minimal example, I can try.. it's gonna be messy since the whole framework is kind of big with the common base classes that do the heavy lifting while integrating models. I'll try to write all the relevant code into one python file.

I acknowledge that it’s a nasty task, but it’s probably the only way forward, in particular since I cannot reproduce the error myself.

Just a note - I have 5 models in the framework, only one exhibits these segfaults, and I guess it's not a coincidence that only this one uses python callbacks... others are completely symbolic in symengine. The one that fails uses python callbacks (since its dynamics contain a lookup function) and the callbacks are wrapped with numba.njit for speed. Maybe something changed in numba that creates the segfaults during the initial jump with adjust_diff? I don't know, just spitballing here...

For whatever it’s worth, I do have a test that combines callbacks and jumps, so that alone is probably not the problem. Maybe you can get to a minimal example more quickly if you create a simple script that uses Numba callbacks and jumps.

jajcayn commented 3 years ago

I have no clue on the first error, except that I would guess that I am not responsible for it. Can you check whether this happens if you replace jitcdde with any of the modules it imports?

Yes, unfortunately, the same thing: […]

That’s not what I meant: JiTCDDE imports some Python modules under the hood. I would guess that one of them is causing the problem, not JiTCDDE.

Does the following also cause an error?

import warnings, itertools, numpy, symengine, functools, sys, tempfile, os, inspect, setuptools, traceback, pickle, jinja2, importlib, bisect
import numba

oh, sorry, of course, makes much more sense:) anyway, symengine is the culprit... importing symengine first and then numba leads to assertion errors on MacOS but not on Linux... I'll ask around in the symengine repo, thanks:)
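The search for the offending module can be automated: import each candidate followed by numba in a fresh interpreter, so that a hard crash only kills the child process and not the session doing the search. This is a rough sketch with the standard library only; `imports_survive` and `find_conflicts` are hypothetical helper names, not functions of any of the packages discussed here.

```python
import subprocess
import sys


def imports_survive(modules):
    """Import `modules` in the given order in a fresh interpreter.

    Returns True if the child interpreter exits cleanly.
    """
    code = "\n".join(f"import {name}" for name in modules)
    proc = subprocess.run([sys.executable, "-c", code],
                          capture_output=True, text=True)
    return proc.returncode == 0


def find_conflicts(candidates, last="numba"):
    """Return the candidates that break when imported before `last`."""
    if not imports_survive([last]):
        raise RuntimeError(f"{last} cannot be imported on its own")
    culprits = []
    for name in candidates:
        if not imports_survive([name]):
            continue  # not installed or broken by itself; skip it
        if not imports_survive([name, last]):
            culprits.append(name)
    return culprits


# Example call (left commented out because it needs the packages installed):
# print(find_conflicts(["numpy", "symengine", "setuptools", "jinja2"]))
```

On an affected MacOS setup as described in this thread, one would expect such a search to list symengine; on an unaffected system the result should simply be an empty list.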

for the MWE, I'll work on it with focus on numba callbacks, Thanks!

jajcayn commented 3 years ago

Hey, so no need for MWE for my problems. The only thing you need to do is:

  1. take your example/sunflower_callback.py
  2. run it like it is - it should run and print my_sine called with arguments ...
  3. all good
  4. add import numba as the first line in the file
  5. add @numba.njit() over your my_sine_callback function so you got:
    import math
    @numba.njit()
    def my_sine_callback(y,arg):
        print(f"my_sine called with arguments {y} and {arg}")
        return math.sin(arg)
  6. run

Can you please try this? For me, MacOS gives an address boundary error, while on Ubuntu it just doesn't run but also doesn't give an error.

Thanks

Wrzlprmft commented 3 years ago

Can you please try this?

I did, and it does segfault for me.

However, this seems to originate from numbaing a print call. I also get an error by just calling my_sine_callback, as print is called within, although it’s not a segfault. Thus, I don’t think there is anything I can solve here, except for handling errors of called-back functions better, which I suspect means losing efficiency and should not be necessary since you should only call back well-tested functions.

Can you try calling your numbaed functions in the original example outside of JiTCDDE and see whether this throws a more informative error (hopefully not having anything to do with me)?
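Such a check outside of JiTCDDE could look roughly like the sketch below: the jitted callback is called directly with stand-in arguments, so any compilation or runtime error surfaces as a normal Python exception. The `check_callback` helper, the tuple standing in for the state `y`, and the fallback decorator used when Numba is not installed are all assumptions for illustration.

```python
import math

try:
    import numba
    jit = numba.njit
except ImportError:
    # Fallback so the sketch still runs without Numba (illustration only).
    def jit(func):
        return func


@jit
def my_sine_callback(y, arg):
    # No print here: the thread traces the crash to a jitted print call.
    return math.sin(arg)


def check_callback(func, y, arg):
    """Call the callback directly and verify that it returns a finite float."""
    value = func(y, arg)
    if not isinstance(value, float) or not math.isfinite(value):
        raise ValueError(f"callback returned {value!r}")
    return value


# A tuple stands in for the state that would normally be passed as `y`.
y_stand_in = (0.1, 0.2)
print(check_callback(my_sine_callback, y_stand_in, 1.0))
```

If the direct call already fails, the problem lies in the callback or in Numba rather than in the integrator.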

isuruf commented 3 years ago

anyway, symengine is the culprit... importing symengine first and then numba leads to assertion errors on MacOS but not on Linux...

Should be fixed with latest symengine wheels.
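To see which versions are actually installed before and after upgrading, something like the following can be used. It is a small sketch based on the standard library's `importlib.metadata` (Python 3.8 or newer); the helper name `installed_versions` is made up, and the package names are assumed to match their names on PyPI.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each package name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(["symengine", "numba", "jitcdde"]).items():
    print(f"{name}: {version or 'not installed'}")
```

Pasting this output into a report makes it easier to compare a working and a failing environment.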

jajcayn commented 3 years ago

@isuruf perfect news! and I confirm it works:) Thanks a lot for letting me know

Secondly, @Wrzlprmft, the new symengine also seems to fix my other problem! I tested on all Linux machines I have access to plus some docker VMs, and all my tests pass (see e.g. github actions: https://github.com/neurolib-dev/neurolib/actions/runs/665488678), no more segfaults.

Btw, just to answer your last question: the numba-jitted functions in my original example work like they should outside jitcdde. I know this because neurolib uses two backends: one based on jitcdde, the second based on pure numba (i.e. a whole forward Euler integration scheme is written in numba). Tests based on pure numba integration work and worked without any problems; only the jitcdde-based backend threw segfaults... but now all seems good, I guess the new symengine was the strat.. Anyway, thanks a lot for your help, closing now.