rdeane / MeqSilhouette

A synthetic data simulation package for the Event Horizon Telescope
GNU General Public License v2.0

meqserver segfault #9

Open mjanssen2308 opened 6 years ago

mjanssen2308 commented 6 years ago

dmesg log shows segfaults from meqsilhouette when creating measurement sets:

[ 1555.083357] meqserver[8809]: segfault at 8 ip 00007fc027e7dd58 sp 00007ffcebd276a0 error 4 in libpython2.7.so.1.0[7fc027d59000+2f2000]
[ 1575.305928] meqserver[9114]: segfault at 8 ip 00007efd133fcd58 sp 00007ffdd48be720 error 4 in libpython2.7.so.1.0[7efd132d8000+2f2000]
[ 1595.457714] meqserver[9416]: segfault at 8 ip 00007f1384955d58 sp 00007ffe2d9e0810 error 4 in libpython2.7.so.1.0[7f1384831000+2f2000]
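For anyone comparing crashes across runs, the kernel messages can be reduced to an offset inside libpython by subtracting the mapping base (the number before the `+` in the brackets) from the instruction pointer. The sketch below does that for the three reported lines; the helper name `parse_segfault` and the regular expression are illustrative assumptions, not part of MeqSilhouette or MeqTrees. On x86, `error 4` corresponds to a user-mode read of an unmapped page, and a faulting address of 8 is consistent with reading a field at a small offset from a NULL pointer.

```python
import re

# The three kernel messages quoted in the report above.
REPORTED_LINES = [
    "[ 1555.083357] meqserver[8809]: segfault at 8 ip 00007fc027e7dd58 "
    "sp 00007ffcebd276a0 error 4 in libpython2.7.so.1.0[7fc027d59000+2f2000]",
    "[ 1575.305928] meqserver[9114]: segfault at 8 ip 00007efd133fcd58 "
    "sp 00007ffdd48be720 error 4 in libpython2.7.so.1.0[7efd132d8000+2f2000]",
    "[ 1595.457714] meqserver[9416]: segfault at 8 ip 00007f1384955d58 "
    "sp 00007ffe2d9e0810 error 4 in libpython2.7.so.1.0[7f1384831000+2f2000]",
]

# Hypothetical pattern for the "segfault at ..." line format shown above.
SEGFAULT_RE = re.compile(
    r"(?P<name>[^\s\[]+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+)"
    r" ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<error>\d+)"
    r" in (?P<lib>[^\s\[]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def parse_segfault(line):
    """Extract pid, library name and the crash offset relative to the mapping base."""
    match = SEGFAULT_RE.search(line)
    if match is None:
        raise ValueError("not a segfault line: %r" % line)
    ip = int(match.group("ip"), 16)
    base = int(match.group("base"), 16)
    return {
        "pid": int(match.group("pid")),
        "lib": match.group("lib"),
        "fault_addr": int(match.group("addr"), 16),
        "error": int(match.group("error")),
        "offset": ip - base,  # position of the faulting instruction in the mapping
    }


if __name__ == "__main__":
    for line in REPORTED_LINES:
        info = parse_segfault(line)
        print("pid %d: %s+0x%x" % (info["pid"], info["lib"], info["offset"]))
```

All three crashes resolve to the same offset, `libpython2.7.so.1.0+0x124d58`, which suggests the same code path fails on every run rather than a random memory corruption. With debug symbols for the exact same libpython build, that offset may be resolvable to a function name using a tool such as `addr2line`, provided the mapping starts at the beginning of the library file.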

iniyannatarajan commented 6 years ago

Hi Michael, was there an update involved? Either to python or the OS? Or any particular change in meqsilhouette settings that might have triggered it? If you can forward your settings to me, I'll see if I can reproduce the error.

freekroelofs commented 6 years ago

Hi Iniyan,

I think this is related to the message

Running TDL job "_simulate_MS"
Job result: None
No more commands
Stopping the meqserver
1.4Gb meqserver(meqserver.py:288:stop_default_mqs): meqserver not exited yet, waiting another 10 seconds

I've been getting this every time I run a simulation (as noted in my previous email). For example, I also get it when I run run_meqsilhouette.py with the example eht230.json input file, so maybe you can check whether you get the same when running that. Michael noticed the seemingly related segfault today.

Freek

iniyannatarajan commented 6 years ago

Hi Freek,

The first four lines of the output indicate normal operation of MeqTrees. Apparently, the 10-second delay was added in Python to ensure that meqserver has stopped running before quitting. This check is repeated every 10 seconds for up to 200 seconds, after which meqserver is killed forcibly.
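To make the behaviour described above concrete, here is a minimal sketch of a "poll, wait, then kill" shutdown loop written with the standard `subprocess` module. It is not the code in meqserver.py; the function name `stop_with_grace` and its parameters `poll_interval` and `max_wait` are hypothetical, with defaults chosen to mirror the 10-second check and 200-second limit mentioned here.

```python
import subprocess
import sys
import time


def stop_with_grace(proc, poll_interval=10.0, max_wait=200.0, log=print):
    """Wait for a child process to exit, checking every poll_interval seconds.

    Returns "exited" if the process ended on its own within max_wait seconds,
    or "killed" if it had to be terminated forcibly after the limit.
    """
    waited = 0.0
    # proc.poll() returns None while the child is still running.
    while proc.poll() is None and waited < max_wait:
        log("process not exited yet, waiting another %g seconds" % poll_interval)
        time.sleep(poll_interval)
        waited += poll_interval
    if proc.poll() is None:
        proc.kill()  # forcible kill once the limit is reached
        proc.wait()  # reap the child so it does not linger as a zombie
        return "killed"
    return "exited"


if __name__ == "__main__":
    # Stand-in child process (assumption): a Python one-liner that sleeps briefly.
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(0.3)"])
    print(stop_with_grace(child, poll_interval=0.1, max_wait=2.0))
```

In a loop of this shape, a child that exits just after a check still costs a full `poll_interval` before the parent notices, so a shorter interval (or a blocking wait with a timeout) would reduce the overhead per run. Whether that applies to meqserver depends on how its shutdown is actually implemented.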

I too get this message all the time, sometimes even twice. I hadn't noticed the segfault until now. I see that I've been getting it too, although this doesn't seem to affect the simulated visibilities (which I've been using for the fringe-fitting tests).

When I run MeqTrees outside meqsilhouette, the 10-second delay (and the corresponding segfault) is triggered every time I run it in script mode, but never when I use the GUI. I'll talk to Oleg regarding this, but in the meantime, please let me know if you find that this affects your simulated visibilities.

mjanssen2308 commented 6 years ago

Hi Iniyan,

Thanks for picking this up. I think that for us, too, the segfault does not affect the visibilities. We thought it might be relevant for tracking down the 10 s delay.

iniyannatarajan commented 6 years ago

Hi Michael,

Glad to know that. Yes, it would be good to track this down. I'll update here if I find a cleaner way of exiting execution.

rdeane commented 6 years ago

Yes, I remember Freek raising this during his visit to SA last year. At the time I asked Oleg about it, and he said the delay was inserted to ensure a clean completion of jobs; since 10 s is negligible for virtually all non-EHT, static-source simulations, no one ever noticed or complained. Obviously this is not something we want for higher time-resolution input images, so we'll talk to Oleg this week about a potential fix. Further down the line, we may want to consider switching to MontBlanc for visibility prediction (as some already do for MeerKAT), which would be much faster, although that would require quite a bit of modification to MeqSilhouette.