indigo-astronomy / indigo

INDIGO is a system of standards and frameworks for multiplatform and distributed astronomy software development designed to scale with your needs.
http://www.indigo-astronomy.org
Other
148 stars 64 forks source link

Segmentation fault in CCD Simulator's detach sequence #377

Closed nj-vs-vh closed 3 years ago

nj-vs-vh commented 3 years ago

I'm writing a Python interface for INDIGO client (based on dynamic_driver_client from examples), but I've run into unusual issue.

I successfully run my module, attach CCD Simulator, start exposure and process it with Python callback function, but when I run cleanup code (wrapped into Python function, but it doesn't seem to affect anything) I get segfault error. In particular, INDIGO log goes

22:00:24.331188 Application: indigo_ccd_simulator: 'CCD Imager Simulator' detached
22:00:24.331316 Application: indigo_ccd_simulator: 'CCD Imager Simulator (wheel)' detached
22:00:24.331381 Application: indigo_ccd_simulator: 'CCD Imager Simulator (focuser)' detached
Segmentation fault (core dumped)

Weird thing is, sometimes only first two devices detach before error, most of the time three, sometimes four. When I examine core dump with gdb I get the following output:

(gdb) bt full
#0  0x00007f405fc7fd3e in ccd_connect_callback (device=0x234e740) at indigo_ccd_simulator.c:610
No locals.
#1  0x00007f406062b0b8 in timer_func (timer=0x2346b00) at indigo_timer.c:86
        device = <optimized out>
#2  0x00007f4061d34609 in start_thread (arg=<optimized out>) at pthread_create.c:477
        ret = <optimized out>
        pd = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {139914444261120, -223758802025695315, 140732923244958, 140732923244959, 140732923244960, 139914444259264, 
                172739811111876525, 172771933887543213}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = 0
#3  0x00007f4061e70293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
No locals.

When I run gdb in VS Code (using python my_python_indigo_wrapper.py as an entry point), Segfault occurs on line 610 in indigo_ccd_simulator.c, here:

CONNECTION_PROPERTY->state = INDIGO_OK_STATE;

device->device_context which CONNECTION_PROPERTY resolves to, shows up as NULL


I'm not sure if this issue has something to do with my Python wrapper. On one hand, I can't see any way Python interpreter can influence deep inner INDIGO code. I've encountered Segfault issues due to GIL violation, but there's no Python objects being called from cleanup code! On the other hand, regular dynamic driver client works and my implementation does not!

Soon I will be able to test my code with real camera (ZWO) and will certainly report the results.


I'm running Ubuntu 20.04, Python 3.8, and will be happy to provide further details.

nj-vs-vh commented 3 years ago

I'm close to abandoning any hope to figure out what's happening by myself: I have just changed log level in my extension from INDIGO_LOG_INFO to INDIGO_LOG_TRACE and error disappeared!

nj-vs-vh commented 3 years ago

Curent output:

23:47:23.999559 Application: INDIGO Bus: property change request
23:47:23.999639 Application: 'CCD Imager Simulator'.'CONNECTION' SWITCH UNDEFINED Idle 2.0 0 UNDEFINED {
23:47:23.999668 Application:   'CONNECTED' = Off 
23:47:23.999687 Application:   'DISCONNECTED' = On 
23:47:23.999708 Application: }
23:47:23.999729 Application: INDIGO Bus: Change request - Device 'CCD Imager Simulator' token 0x0, Proprerty 'CONNECTION' token 0x0
23:47:23.999754 Application: INDIGO Bus: property update
23:47:23.999776 Application: 'CCD Imager Simulator'.'CONNECTION' SWITCH rw Busy 2.0 0 OneOfMany {
23:47:23.999795 Application:   'CONNECTED' = Off 
23:47:23.999813 Application:   'DISCONNECTED' = On 
23:47:23.999832 Application: }
23:47:23.999883 Application: indigo_ccd_simulator: 'CCD Imager Simulator' detached
23:47:23.999968 Application: timer #0 (of 2) used for 0s
23:47:24.000059 Application: timer #0 done
23:47:24.000129 Application: INDIGO Bus: property removal
23:47:24.000222 Application: 'CCD Imager Simulator'.'' TEXT ro Ok 2.0 0  {
23:47:24.000253 Application: }
23:47:24.000289 Application: indigo_ccd_simulator: 'CCD Imager Simulator (wheel)' detached
23:47:24.000342 Application: timer #1 done
23:47:24.000508 Application: INDIGO Bus: property removal
23:47:24.000593 Application: 'CCD Imager Simulator (wheel)'.'' TEXT ro Ok 2.0 0  {
23:47:24.000624 Application: }
23:47:24.000654 Application: indigo_ccd_simulator: 'CCD Imager Simulator (focuser)' detached
23:47:24.000698 Application: INDIGO Bus: property removal
23:47:24.000725 Application: 'CCD Imager Simulator (focuser)'.'' TEXT ro Ok 2.0 0  {
23:47:24.000766 Application: }
23:47:24.000793 Application: indigo_ccd_simulator: 'CCD Guider Simulator' detached
23:47:24.000825 Application: INDIGO Bus: property removal
23:47:24.000847 Application: 'CCD Guider Simulator'.'' TEXT ro Ok 2.0 0  {
23:47:24.000866 Application: }
23:47:24.000886 Application: indigo_ccd_simulator: 'CCD Guider Simulator (guider)' detached
23:47:24.000913 Application: INDIGO Bus: property removal
23:47:24.000935 Application: 'CCD Guider Simulator (guider)'.'' TEXT ro Ok 2.0 0  {
23:47:24.000953 Application: }
23:47:24.000974 Application: indigo_ccd_simulator: 'CCD Guider Simulator (AO)' detached
23:47:24.001001 Application: INDIGO Bus: property removal
23:47:24.001022 Application: 'CCD Guider Simulator (AO)'.'' TEXT ro Ok 2.0 0  {
23:47:24.001040 Application: }
23:47:24.001063 Application: indigo_ccd_simulator: 'DSLR Simulator' detached
23:47:24.001095 Application: INDIGO Bus: property removal
23:47:24.001116 Application: 'DSLR Simulator'.'' TEXT ro Ok 2.0 0  {
23:47:24.001134 Application: }
23:47:24.005057 Application: Driver indigo_ccd_simulator unloaded
23:47:24.005144 Application: detached from INDIGO bus...
rumengb commented 3 years ago

This seems like a race in disconnect... We will investigate. Thank you for reporting.

rumengb commented 3 years ago

@nj-vs-vh , were you taking exposure while unloading the driver?

nj-vs-vh commented 3 years ago

@rumengb no, I've checked the logs, they go

13:20:39.495804 Application: Client upload in 0.006184s
13:20:39.495814 Application: INDIGO Bus: property update
13:20:39.495821 Application: 'CCD Imager Simulator'.'CCD_EXPOSURE' NUMBER rw Ok 2.0 0  {
13:20:39.495825 Application:   'EXPOSURE' = 0 
13:20:39.495829 Application: }
13:20:39.495834 Application: timer callback: 0x7f37e7322650 finished
13:20:39.495838 Application: timer #0 done
file written!

Last line is printed from my Python code and right after it I call cleanup function that produces the same output as in the previous message.

rumengb commented 3 years ago

I did my best to reproduce it, but I was unable... I loaded and unloaded the driver 100 times no issue. I even connected and started exposure, moved focuser and much more, then unloaded the driver. again no crash. Can you share more details how to reproduce it? Now I see that it is your application crashing... Maybe there is something wrong in your code, maybe python to C bindings?

nj-vs-vh commented 3 years ago

@rumengb I think I have finally tracked down the problem, and it indeed has something to do with Python. Sometimes with no apparent reason C string stored inside Python extension module becomes corrupted, and it must be messing up INDIGO's execution. Right before the crash I get the following trace log:

23:38:24.788958 Application: INDIGO Bus: property removal
23:38:24.789005 Application: 'CCD Imager Simulator'.'' TEXT ro Ok 2.0 0  {
23:38:24.789026 Application: }
23:38:24.789062 Application: indigo_ccd_simulator: 'CCD Imager Simulator (wheel)' detached
23:38:24.789101 Application: INDIGO Bus: property removal
23:38:24.789147 Application: '�[k)�'.'CCD_INFO' NUMBER ro Ok 2.0 0  {
23:38:24.789167 Application: }
23:38:24.789206 Application: INDIGO Bus: property removal
Segmentation fault (core dumped)

I'm sorry I have wasted your time! When I figure it out I hope to provide a fully functional Python interface for your amazing framework though!

rumengb commented 3 years ago

Wow, Are you doing python bindings for indigo? Many people asked for this but we are too busy to do it. We have many things in our TODO list. Thank you!

On Mon, Oct 12, 2020 at 6:47 PM Igor Vaiman notifications@github.com wrote:

I think I have finally tracked down the problem, and it indeed has something to do with Python. Sometimes with no apparent reason C string stored inside Python extension module becomes corrupted, and it must be messing up INDIGO's execution. Right before the crash I get the following trace log:

23:38:24.788958 Application: INDIGO Bus: property removal

23:38:24.789005 Application: 'CCD Imager Simulator'.'' TEXT ro Ok 2.0 0 {

23:38:24.789026 Application: }

23:38:24.789062 Application: indigo_ccd_simulator: 'CCD Imager Simulator (wheel)' detached

23:38:24.789101 Application: INDIGO Bus: property removal

23:38:24.789147 Application: '�[k)�'.'CCD_INFO' NUMBER ro Ok 2.0 0 {

23:38:24.789167 Application: }

23:38:24.789206 Application: INDIGO Bus: property removal

Segmentation fault (core dumped)

I'm sorry I have wasted your time! When I figure it out I hope to provide a fully functional Python interface for your amazing framework though!

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/indigo-astronomy/indigo/issues/377#issuecomment-707199465, or unsubscribe https://github.com/notifications/unsubscribe-auth/AE5EZBOK4E5HL5OEJFDYAOTSKMQJFANCNFSM4SKEIZ4Q .

nj-vs-vh commented 3 years ago

@rumengb I'm not sure what you mean by 'bindings', but my current goal is an Indigo client wrapped into a Python module. It's intended use is to communicate with Indigo devices connected to the same machine from Python code. I'm currently trying to satisfy my own needs with this project, but I will definitely check your TODO list later!

rumengb commented 3 years ago

I know where the issue comes from. Please send me an email to rumen@skyarchive.org to discuss this. I will update the documentation to address this pitfall... There is a delete request that deletes all properties which is issued when device is attached.

rumengb commented 3 years ago

Reopening: There is a race between indigo_device_disconnect() and indigo_remove_driver(). A workaround is small a delay between them, but this is not a solution...

nj-vs-vh commented 3 years ago

@rumengb Thank you very much! I'll add this as a temporary fix.

polakovic commented 3 years ago

The bug in INDIGO is fixed in this commit d864605a315fd4242d61117311a6db15a498c727, but there is still an issue in your code. You should always test the result of indigo_remove_driver() and use something like this:

while (indigo_remove_driver(driver) != INDIGO_OK) indigo_usleep(ONE_SECOND_DELAY);

So if it fails (in this case because the asynchronous "disconnect" operation is still in progress), you have to repeat later.

rumengb commented 3 years ago

No it is not fixed :(. It still happens there is a race between:

CONNECTION_PROPERTY->state = INDIGO_OK_STATE; and indigo_ccd_change_property(device, NULL, CONNECTION_PROPERTY);

after CONNECTION_PROPERTY->state = INDIGO_OK_STATE here the device is assumed disconnected and if driver shutdown kicks in before indigo_ccd_change_property(device, NULL, CONNECTION_PROPERTY);

it fails...

rumengb commented 3 years ago

it is fixed now but your code should be changed with something like:

indigo_device_disconnect(&client, CCD_SIMULATOR); while (INDIGO_OK != indigo_remove_driver(driver)) { indigo_usleep(10000); } indigo_detach_client(&client); indigo_stop();

as disconnect is asynchronous and indigo_remove_driver() fails with INDIGO_BUSY if some device is connected.