indigo-astronomy / indigo

INDIGO is a system of standards and frameworks for multiplatform and distributed astronomy software development designed to scale with your needs.
http://www.indigo-astronomy.org

Interrupted connection prevents attaching new clients to the indigo bus #380

Closed kkretzschmar closed 3 years ago

kkretzschmar commented 3 years ago

Hi, while testing the scenario of multiple client connections (see https://github.com/indigo-astronomy/indigo/issues/319), I ended up in an unresponsive server (see thread callstacks below).

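This issue occurs if:

  • at the time when a client wants to attach to the bus (thread 11 - indigo_attach_client) there is a concurrent write operation to the bus (thread 10 - indigo_update_property)
  • the write operation is blocked (indigo_write) since the network connection was interrupted

The reason is that indigo_update_property holds the same mutex that indigo_attach_client is waiting for. So as long as indigo_write does not return and the mutex stays locked, no client can attach to the bus.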

My question: Is this strict locking here necessary? My naive assumption is that an indigo_update_property cannot change the definition of any Indigo property (only its value), and attaching a new client often comes with the enumeration of properties which is afaik an operation to get the definition of the properties supported by the devices.
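The blocking pattern described above can be reproduced in isolation. The following is a minimal sketch and not INDIGO code: `bus_mutex`, `stalled_update` and `attach_would_block` are hypothetical names, and a `read` on an empty pipe stands in for the stalled `indigo_write`. One thread takes the mutex and then blocks in I/O; the other thread probes the mutex with `pthread_mutex_trylock` instead of `pthread_mutex_lock` so that the demonstration itself does not hang.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <unistd.h>

static pthread_mutex_t bus_mutex = PTHREAD_MUTEX_INITIALIZER; /* hypothetical bus lock */
static int stall_pipe[2]; /* reading from it blocks, like a write to a dead peer */
static int ready_pipe[2]; /* the updater announces here that it holds the lock */

/* Stand-in for an update: take the lock, then block in I/O while holding it. */
static void *stalled_update(void *arg) {
	char byte = 1;
	ssize_t n;
	(void)arg;
	pthread_mutex_lock(&bus_mutex);
	n = write(ready_pipe[1], &byte, 1); /* tell the main thread the lock is held */
	n = read(stall_pipe[0], &byte, 1);  /* blocks until the write end is closed */
	(void)n;
	pthread_mutex_unlock(&bus_mutex);
	return NULL;
}

/* Returns true if an attach would have to wait while the update is stalled. */
bool attach_would_block(void) {
	pthread_t updater;
	char byte = 0;
	bool blocked = false;
	if (pipe(stall_pipe) != 0)
		return false;
	if (pipe(ready_pipe) != 0) {
		close(stall_pipe[0]);
		close(stall_pipe[1]);
		return false;
	}
	bool started = pthread_create(&updater, NULL, stalled_update, NULL) == 0;
	if (started && read(ready_pipe[0], &byte, 1) == 1) {
		/* The updater now owns bus_mutex and is stuck in its read(). */
		int rc = pthread_mutex_trylock(&bus_mutex);
		blocked = (rc == EBUSY);
		if (rc == 0)
			pthread_mutex_unlock(&bus_mutex);
	}
	close(stall_pipe[1]); /* EOF on the pipe releases the stalled updater */
	if (started)
		pthread_join(updater, NULL);
	close(stall_pipe[0]);
	close(ready_pipe[0]);
	close(ready_pipe[1]);
	return blocked;
}
```

Calling `attach_would_block()` returns true for as long as the updater is parked inside its I/O call, which mirrors thread 11 waiting in `pthread_mutex_lock` while thread 10 sits in `write`.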

Thanks, Klaus

(gdb) thread 11

[Switching to thread 11 (Thread 0x6fbff3d0 (LWP 14685))]

0 __libc_do_syscall () at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46

46 ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S: No such file or directory.

(gdb) bt

0 __libc_do_syscall () at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46

1 0x75ee8aea in __lll_lock_wait (futex=0x76d129b0 , private=0) at lowlevellock.c:46

2 0x75ee33ee in __GI___pthread_mutex_lock (mutex=0x76d129b0 ) at pthread_mutex_lock.c:113

3 0x75f2f740 in indigo_attach_client (client=client@entry=0x6f2005b8) at indigo_bus.c:374

4 0x75f107ac in start_worker_thread (client_socket=0x6fbdedb8) at indigo_server_tcp.c:95

5 0x75ee1614 in start_thread (arg=0x29a2323b) at pthread_create.c:463

6 0x75e7c7fc in ?? () at ../sysdeps/unix/sysv/linux/arm/clone.S:73 from /lib/arm-linux-gnueabihf/libc.so.6

Backtrace stopped: previous frame identical to this frame (corrupt stack?)

(gdb) thread 10

[Switching to thread 10 (Thread 0x705ff3d0 (LWP 14273))]

0 __libc_do_syscall () at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:47

47 in ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S

(gdb) bt

0 __libc_do_syscall () at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:47

1 0x75ee8cdc in __libc_write (fd=fd@entry=25, buf=buf@entry=0x705de610, nbytes=nbytes@entry=88) at ../sysdeps/unix/sysv/linux/write.c:27

2 0x75f1cec0 in indigo_write (handle=handle@entry=25, buffer=buffer@entry=0x705de610 "<setNumberVector device='Mount Nexstar' name='MOUNT_HORIZONTAL_COORDINATES' state='Ok'>\n", length=length@entry=88) at indigo_io.c:383

3 0x75f1cf34 in indigo_printf (handle=handle@entry=25, format=0x75f8af5c "<setNumberVector device='%s' name='%s' state='%s'%s>\n") at indigo_io.c:401

4 0x75f21c80 in xml_device_adapter_update_property (client=0x710005b8, property=0x17a5fd8, message=message@entry=0x0, device=0x177d680) at indigo_driver_xml.c:164

5 0x75f2232c in xml_device_adapter_update_property (client=client@entry=0x710005b8, device=device@entry=0x177d680, property=property@entry=0x17a5fd8, message=message@entry=0x0) at indigo_driver_xml.c:250

6 0x75f300a4 in indigo_update_property (device=device@entry=0x177d680, property=0x17a5fd8, format=0x0) at indigo_bus.c:567

7 0x75f2cb60 in indigo_update_coordinates (device=device@entry=0x177d680, message=message@entry=0x0) at indigo_mount_driver.c:1146

8 0x74b7340c in position_timer_callback (device=0x177d680) at indigo_mount_nexstar.c:495

9 0x75f17364 in timer_func (timer=0x71039278) at indigo_timer.c:84

10 0x75ee1614 in start_thread (arg=0x29a2323b) at pthread_create.c:463

11 0x75e7c7fc in ?? () at ../sysdeps/unix/sysv/linux/arm/clone.S:73 from /lib/arm-linux-gnueabihf/libc.so.6

rumengb commented 3 years ago

Hi Klaus, IMHO strict locking is necessary for several reasons. The main one is that each update and each enumeration requires writing to the bus, and without the lock one update could start while another is only half sent, which would break the messages. I think there is no deadlock here, just a blocked thread, but we can probably introduce read and write timeouts for such situations. We will discuss this with Peter.
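The interleaving argument can be illustrated with a small sketch. It is not taken from INDIGO: `bus_mutex`, `bus_log`, `writer` and `messages_stay_intact` are hypothetical names, and a "message" here is simply three one-byte parts appended to a shared log, standing in for the several `indigo_printf` calls that make up one update. Two threads emit messages concurrently; because each holds the lock for the whole message, the parts never mix.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define PARTS 3     /* parts per message, each written separately */
#define ROUNDS 1000 /* messages per writer */

static pthread_mutex_t bus_mutex = PTHREAD_MUTEX_INITIALIZER;
static char bus_log[2 * ROUNDS * PARTS]; /* one byte per message part */
static size_t bus_len = 0;

/* Emit ROUNDS messages; every message is PARTS writes under one lock. */
static void *writer(void *arg) {
	char id = *(const char *)arg;
	for (int i = 0; i < ROUNDS; i++) {
		pthread_mutex_lock(&bus_mutex);
		for (int p = 0; p < PARTS; p++)
			bus_log[bus_len++] = id;
		pthread_mutex_unlock(&bus_mutex);
	}
	return NULL;
}

/* Returns true if every message in the log is PARTS identical bytes in a row. */
bool messages_stay_intact(void) {
	pthread_t a, b;
	char id_a = 'A', id_b = 'B';
	bus_len = 0;
	if (pthread_create(&a, NULL, writer, &id_a) != 0)
		return false;
	if (pthread_create(&b, NULL, writer, &id_b) != 0) {
		pthread_join(a, NULL);
		return false;
	}
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	if (bus_len != sizeof bus_log)
		return false;
	for (size_t i = 0; i < bus_len; i += PARTS)
		for (int p = 1; p < PARTS; p++)
			if (bus_log[i + p] != bus_log[i])
				return false;
	return true;
}
```

If the lock were taken per part instead of per message, nothing would prevent a sequence such as `AABAB...`, which is the broken-message case described above.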

By the way, I have just released a tool that makes debugging such situations easier. It is called indigo_deadlock_detector and it is available through our PPA: http://www.indigo-astronomy.org/indigo-sky.html - see the last section.

The git repo for it is here: https://github.com/indigo-astronomy/indigo_deadlock_detector

Can you please run it in this situation, just to see what it reports? It should report clearly who is waiting on what and where...

Thanks!

Rumen

On Mon, Oct 26, 2020 at 7:32 PM kkretzschmar notifications@github.com wrote:

Hi, while testing the scenario of multiple client connections (see #319 https://github.com/indigo-astronomy/indigo/issues/319), I ended up in an unresponsive server (see thread callstacks below).

This issue occurs if:

  • at the time when a client wants to attach the bus (thread 11 - indigo_attach_client) there is a concurrent write operation to the bus (thread 10 - indigo_update_property)
  • the write operation is blocked (indigo_write) since the network connection was interrupted

The reason is that the indigo_update_property holds the same mutex lock which the indigo_attach_client function is waiting for. So when the indigo_write does not unlock the mutex, no client can attach the bus.

My question: Is this strict locking here necessary? My naive assumption is that an indigo_update_property cannot change the definition of any Indigo property (only its value), and attaching a new client often comes with the enumeration of properties which is afaik an operation to get the definition of the properties supported by the devices.

Thanks, Klaus


kkretzschmar commented 3 years ago

Hi Rumen,

I think there is no deadlock here

Yes I agree, there is no deadlock in this situation.

it is called indigo_deadlock_detector

Yes I've already found that tool, I'll use it when I think I am running into a deadlock. Good idea!

We will discuss this with Peter.

OK, thanks!

Klaus

rumengb commented 3 years ago

Can you please send me the output of it?


kkretzschmar commented 3 years ago

I haven't used the deadlock detector yet. When the INDIGO server became unresponsive, I attached gdb and created the stack traces posted above. The next time I run into the problem, I'll use the deadlock detector.

rumengb commented 3 years ago

OK. It reports not only deadlocks but also self-locks and blocked threads, along with what they are waiting for. Maybe the name is not very good, as it is not only a deadlock detector...


kkretzschmar commented 3 years ago

Sounds good, I'll try to reproduce the issue and use it !

rumengb commented 3 years ago

I have a prototype. It is a scary change and should be tested extensively. I added a 5 s read and write timeout, so the server should not be blocked for more than 5 seconds in such situations. I will keep it in a separate branch until it is well tested.
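For readers unfamiliar with socket timeouts, here is a minimal sketch of one common POSIX way to get them, using the `SO_RCVTIMEO` and `SO_SNDTIMEO` socket options. Whether the prototype uses exactly these options is an assumption; `set_io_timeouts` and `write_times_out` are hypothetical helper names. The demo writes to one end of a `socketpair` whose other end never reads, so the buffer fills up and the write eventually gives up instead of blocking forever.

```c
#include <errno.h>
#include <stdbool.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Apply the same timeout to blocking reads and writes on a socket. */
bool set_io_timeouts(int fd, long seconds) {
	struct timeval tv = { .tv_sec = seconds, .tv_usec = 0 };
	if (setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv) != 0)
		return false;
	if (setsockopt(fd, SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof tv) != 0)
		return false;
	return true;
}

/* Write to a peer that never reads; true if write() gave up with a timeout. */
bool write_times_out(long seconds) {
	int sv[2];
	char chunk[4096];
	bool timed_out = false;
	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
		return false;
	memset(chunk, 'x', sizeof chunk);
	if (set_io_timeouts(sv[0], seconds)) {
		for (;;) {
			/* Succeeds until the socket buffer is full, then blocks. */
			ssize_t n = write(sv[0], chunk, sizeof chunk);
			if (n < 0) {
				/* An expired send timeout is reported as EAGAIN/EWOULDBLOCK. */
				timed_out = (errno == EAGAIN || errno == EWOULDBLOCK);
				break;
			}
		}
	}
	close(sv[0]);
	close(sv[1]);
	return timed_out;
}
```

With a timeout of 5 seconds, a thread stuck in such a write would return after roughly that time and could release any lock it holds, which is the bound on server blocking mentioned above.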


rumengb commented 3 years ago

I have created a new branch "socket_timeout" with socket read and write timeouts. Unfortunately, timeouts were not handled before, so I had to make changes in many places, and I have most likely missed some. This needs extensive testing and fixing before it gets to the master branch.

Klaus, can you please confirm that this approach works in your scenario? If it does not, the change is not worth the effort.

rumengb commented 3 years ago

Peter please review my changes!


kkretzschmar commented 3 years ago

Great, I am going to test it immediately.

Thanks Rumen, Klaus

kkretzschmar commented 3 years ago

First tests look very promising. I can switch between different interfaces (LAN-WLAN-LAN) and reconnection works as expected. I'll do further testing.

Thanks a lot ! I'm going to release the PixInsight update with zeroconf support soon.

jconejero commented 3 years ago

Thanks a lot ! I'm going to release the PixInsight update with zeroconf support soon.

Nice! looking forward to it! ;)


rumengb commented 3 years ago

Great :) I am glad it looks promising. We discussed this approach with Peter. I will polish it a bit tomorrow, do some more tests, and then merge it into master.


rumengb commented 3 years ago

Can you try it now? I removed the read timeout, as it is not the problem: the server waits on reads anyway. The trouble is with send/write. A write timeout is definitely a failure, and write failures are already handled correctly. A read timeout, on the other hand, is not a failure; it may just mean that the client is silent. I changed the code in many places in order to handle silent clients correctly, but I am not sure I did it everywhere. So I want you to try it with the write timeout and no read timeout.
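The asymmetry between read and write timeouts can be written down as a small classification function. This is only a sketch of the reasoning in this comment, not code from the branch: `io_verdict` and `classify_io` are hypothetical names, and the read-timeout case is shown for completeness even though the read timeout was removed here.

```c
#include <errno.h>
#include <stdbool.h>
#include <sys/types.h>

typedef enum {
	IO_OK,           /* data was transferred */
	IO_KEEP_WAITING, /* nothing wrong, the peer is just silent */
	IO_DROP_CLIENT   /* connection is unusable, release the client */
} io_verdict;

/* Classify the return value and errno of a read() or write() on a socket. */
io_verdict classify_io(bool is_write, ssize_t result, int err) {
	if (result > 0)
		return IO_OK;
	if (result == 0)
		/* read() returning 0 means the peer closed the connection. */
		return is_write ? IO_OK : IO_DROP_CLIENT;
	if (err == EINTR)
		return IO_KEEP_WAITING; /* interrupted by a signal, retry */
	if (err == EAGAIN || err == EWOULDBLOCK)
		/* Timeout: fatal for a write, harmless for a read. */
		return is_write ? IO_DROP_CLIENT : IO_KEEP_WAITING;
	return IO_DROP_CLIENT;
}
```

For example, `classify_io(true, -1, EAGAIN)` yields `IO_DROP_CLIENT`, while `classify_io(false, -1, EAGAIN)` yields `IO_KEEP_WAITING`.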


kkretzschmar commented 3 years ago

Hi Rumen, I tried it and it still looks good. However, I found a problem which is not easily reproducible, and I don't know whether it is related to your recent change. I used the deadlock detector to create a snapshot of the server status.

When the problem occurs, the server behaves as if it runs into the write timeout all the time. The client gets updates (e.g. mount LST time, coordinates) only every 5 seconds, although the connection is alive. Below is the output of the deadlock detector. I added a corresponding snapshot for the good case at the end of this message.

Thanks!

================== Server with problems =======================================

Inspecting 'indigo_worker' (pid = 23940) for deadlocks
[New LWP 23941]
[New LWP 23942]
[New LWP 23943]
[New LWP 23944]
[New LWP 23946]
[New LWP 23948]
[New LWP 23949]
[New LWP 23953]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/aarch64-linux-gnu/libthread_db.so.1".
0x0000ffffb01d2f98 in __libc_accept (fd=, addr=addr@entry=..., len=len@entry=0xffffde3e2b04) at ../sysdeps/unix/sysv/linux/accept.c:26
None

Thread 9 (Thread 0xffff9ffff140 (LWP 23953)):

0 __libc_read (nbytes=524288, buf=0xffff94000ce0, fd=-1610614720) at ../sysdeps/unix/sysv/linux/read.c:26

1 __libc_read (fd=fd@entry=21, buf=buf@entry=0xffff94000ce0, nbytes=nbytes@entry=524288) at ../sysdeps/unix/sysv/linux/read.c:24

2 0x0000ffffb0223eb8 in read (nbytes=524288, buf=0xffff94000ce0, __fd=21) at /usr/include/aarch64-linux-gnu/bits/unistd.h:44

3 indigo_xml_parse (device=device@entry=0x0, client=client@entry=0xffff94000b60) at indigo_xml.c:1277

4 0x0000ffffb020ed10 in start_worker_thread (client_socket=0xaaaafbdf05a0) at indigo_server_tcp.c:96

5 0x0000ffffb01c84fc in start_thread (arg=0xffffde3e2a2f) at pthread_create.c:477

6 0x0000ffffb0122f2c in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:78

Thread 8 (Thread 0xffffac9e6140 (LWP 23949)):

0 __libc_write (nbytes=43, buf=0xffffac9c4f18, fd=-1399042280) at ../sysdeps/unix/sysv/linux/write.c:26

1 __libc_write (fd=fd@entry=21, buf=buf@entry=0xffffac9c4f18, nbytes=nbytes@entry=43) at ../sysdeps/unix/sysv/linux/write.c:24

2 0x0000ffffb020cb38 in indigo_write (handle=handle@entry=21, buffer=buffer@entry=0xffffac9c4f18 "20.4429\n", length=length@entry=43) at indigo_io.c:437

3 0x0000ffffb020cc4c in indigo_printf (handle=handle@entry=21, format=format@entry=0xffffb02a6700 "%s\n") at indigo_io.c:448

4 0x0000ffffb0215460 in xml_device_adapter_update_property (client=0xffff94000b60, property=0xaaaafbd76ba0, message=, device=) at indigo_driver_xml.c:170

5 0x0000ffffb0215b08 in xml_device_adapter_update_property (client=, device=, property=, message=) at indigo_driver_xml.c:147

6 0x0000ffffb022ef04 in indigo_update_property (device=device@entry=0xaaaafbd643e0, property=0xaaaafbd76ba0, format=format@entry=0x0) at indigo_bus.c:588

7 0x0000ffffb022bd44 in indigo_update_coordinates (device=0xaaaafbd643e0, message=0x0) at indigo_mount_driver.c:1209

8 0x0000ffffaec48624 in ?? ()

9 0x0000ffffa4000d68 in ?? ()

Backtrace stopped: previous frame inner to this frame (corrupt stack?)

Thread 7 (Thread 0xffffad1e7140 (LWP 23948)):

0 futex_wait_cancelable (private=0, expected=0, futex_word=0xffffa0139b20) at ../sysdeps/nptl/futex-internal.h:183

1 __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0xffffa0139b28, cond=0xffffa0139af8) at pthread_cond_wait.c:508

2 __pthread_cond_wait (cond=cond@entry=0xffffa0139af8, mutex=mutex@entry=0xffffa0139b28) at pthread_cond_wait.c:638

3 0x0000ffffb020fc7c in timer_func (timer=0xffffa0139ad0) at indigo_timer.c:123

4 0x0000ffffb01c84fc in start_thread (arg=0xffffad9c685f) at pthread_create.c:477

5 0x0000ffffb0122f2c in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:78

Thread 6 (Thread 0xffffad9e8140 (LWP 23946)):

0 __libc_read (nbytes=524288, buf=0xffffa0000ce0, fd=-1382119360) at ../sysdeps/unix/sysv/linux/read.c:26

1 __libc_read (fd=fd@entry=20, buf=buf@entry=0xffffa0000ce0, nbytes=nbytes@entry=524288) at ../sysdeps/unix/sysv/linux/read.c:24

2 0x0000ffffb0223eb8 in read (nbytes=524288, buf=0xffffa0000ce0, __fd=20) at /usr/include/aarch64-linux-gnu/bits/unistd.h:44

3 indigo_xml_parse (device=device@entry=0x0, client=client@entry=0xffffa0000b60) at indigo_xml.c:1277

4 0x0000ffffb020ed10 in start_worker_thread (client_socket=0xaaaafbdf0560) at indigo_server_tcp.c:96

5 0x0000ffffb01c84fc in start_thread (arg=0xffffde3e2a2f) at pthread_create.c:477

6 0x0000ffffb0122f2c in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:78

Thread 5 (Thread 0xffffae1e9140 (LWP 23944)):

0 __libc_read (nbytes=1, buf=0xffffae1e886f, fd=) at ../sysdeps/unix/sysv/linux/read.c:26

1 __libc_read (fd=, buf=0xffffae1e886f, nbytes=1) at ../sysdeps/unix/sysv/linux/read.c:24

2 0x0000ffffb1088d98 in ?? () from /lib/aarch64-linux-gnu/libdns_sd.so.1

3 0x0000ffffb1088f20 in ?? () from /lib/aarch64-linux-gnu/libdns_sd.so.1

4 0x0000ffffb01c84fc in start_thread (arg=0xffffde3e286f) at pthread_create.c:477

5 0x0000ffffb0122f2c in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:78

Thread 4 (Thread 0xffffae9ea140 (LWP 23943)):

0 __libc_read (nbytes=1, buf=0xffffae9e986f, fd=) at ../sysdeps/unix/sysv/linux/read.c:26

1 __libc_read (fd=, buf=0xffffae9e986f, nbytes=1) at ../sysdeps/unix/sysv/linux/read.c:24

2 0x0000ffffb1088d98 in ?? () from /lib/aarch64-linux-gnu/libdns_sd.so.1

3 0x0000ffffb1088f20 in ?? () from /lib/aarch64-linux-gnu/libdns_sd.so.1

4 0x0000ffffb01c84fc in start_thread (arg=0xffffde3e286f) at pthread_create.c:477

5 0x0000ffffb0122f2c in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:78

Thread 3 (Thread 0xffffaf45c140 (LWP 23942)):

0 0x0000ffffb0119a9c in __GI___poll (fds=0xffffa8000b60, nfds=2, timeout=) at ../sysdeps/unix/sysv/linux/poll.c:41

1 0x0000ffffaff45b54 in ?? () from /lib/aarch64-linux-gnu/libusb-1.0.so.0

2 0x0000ffffaff46d44 in libusb_handle_events_timeout_completed () from /lib/aarch64-linux-gnu/libusb-1.0.so.0

3 0x0000ffffaff46d9c in libusb_handle_events () from /lib/aarch64-linux-gnu/libusb-1.0.so.0

4 0x0000ffffb0206e50 in hotplug_thread (arg=) at indigo_driver.c:613

5 0x0000ffffb01c84fc in start_thread (arg=0xffffde3e2abf) at pthread_create.c:477

6 0x0000ffffb0122f2c in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:78

Thread 2 (Thread 0xffffafc5d140 (LWP 23941)):

0 0x0000ffffb0119a9c in __GI___poll (fds=0xffffafc5c918, nfds=2, timeout=) at ../sysdeps/unix/sysv/linux/poll.c:41

1 0x0000ffffaff4caa0 in ?? () from /lib/aarch64-linux-gnu/libusb-1.0.so.0

2 0x0000ffffb01c84fc in start_thread (arg=0xffffde3e283f) at pthread_create.c:477

3 0x0000ffffb0122f2c in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:78

Thread 1 (Thread 0xffffb1102af0 (LWP 23940)):

0 0x0000ffffb01d2f98 in __libc_accept (fd=, addr=addr@entry=..., len=len@entry=0xffffde3e2b04) at ../sysdeps/unix/sysv/linux/accept.c:26

1 0x0000ffffb020d7f4 in indigo_server_start (callback=) at indigo_server_tcp.c:378

2 0x0000aaaacc1bf078 in server_main () at indigo_server.c:1152

3 0x0000aaaacc1bd2cc in main (argc=, argv=0xffffde3e3da8) at indigo_server.c:1294

Blocked threads:



[Inferior 1 (process 23940) detached]

============================ Server Ok ========================================

Inspecting 'indigo_worker' (pid = 23940) for deadlocks
[New LWP 23941]
[New LWP 23942]
[New LWP 23943]
[New LWP 23944]
[New LWP 23946]
[New LWP 23948]
[New LWP 23949]
[New LWP 24363]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/aarch64-linux-gnu/libthread_db.so.1".
0x0000ffffb01d2f98 in __libc_accept (fd=, addr=addr@entry=..., len=len@entry=0xffffde3e2b04) at ../sysdeps/unix/sysv/linux/accept.c:26
None

Thread 9 (Thread 0xffff9f7fe140 (LWP 24363)):

0 __libc_read (nbytes=524288, buf=0xffff8c000ce0, fd=-1619007424) at ../sysdeps/unix/sysv/linux/read.c:26

1 __libc_read (fd=fd@entry=22, buf=buf@entry=0xffff8c000ce0, nbytes=nbytes@entry=524288) at ../sysdeps/unix/sysv/linux/read.c:24

2 0x0000ffffb0223eb8 in read (nbytes=524288, buf=0xffff8c000ce0, __fd=22) at /usr/include/aarch64-linux-gnu/bits/unistd.h:44

3 indigo_xml_parse (device=device@entry=0x0, client=client@entry=0xffff8c000b60) at indigo_xml.c:1277

4 0x0000ffffb020ed10 in start_worker_thread (client_socket=0xaaaafbd5f460) at indigo_server_tcp.c:96

5 0x0000ffffb01c84fc in start_thread (arg=0xffffde3e2a2f) at pthread_create.c:477

6 0x0000ffffb0122f2c in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:78

Thread 8 (Thread 0xffffac9e6140 (LWP 23949)):

0 futex_abstimed_wait_cancelable (private=0, abstime=0xffffac9e5918, clockid=, expected=0, futex_word=0xffffa4000d90) at ../sysdeps/nptl/futex-internal.h:320

1 __pthread_cond_wait_common (abstime=0xffffac9e5918, clockid=, mutex=0xffffa4000d98, cond=0xffffa4000d68) at pthread_cond_wait.c:520

2 __pthread_cond_timedwait (cond=cond@entry=0xffffa4000d68, mutex=mutex@entry=0xffffa4000d98, abstime=abstime@entry=0xffffac9e5918) at pthread_cond_wait.c:656

3 0x0000ffffb020fd34 in timer_func (timer=0xffffa4000d40) at indigo_timer.c:74

4 0x0000ffffb01c84fc in start_thread (arg=0xffffad1e67ef) at pthread_create.c:477

5 0x0000ffffb0122f2c in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:78

Thread 7 (Thread 0xffffad1e7140 (LWP 23948)):

0 futex_wait_cancelable (private=0, expected=0, futex_word=0xffffa0139b20) at ../sysdeps/nptl/futex-internal.h:183

1 __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0xffffa0139b28, cond=0xffffa0139af8) at pthread_cond_wait.c:508

2 __pthread_cond_wait (cond=cond@entry=0xffffa0139af8, mutex=mutex@entry=0xffffa0139b28) at pthread_cond_wait.c:638

3 0x0000ffffb020fc7c in timer_func (timer=0xffffa0139ad0) at indigo_timer.c:123

4 0x0000ffffb01c84fc in start_thread (arg=0xffffad9c685f) at pthread_create.c:477

5 0x0000ffffb0122f2c in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:78

Thread 6 (Thread 0xffffad9e8140 (LWP 23946)):

0 __libc_read (nbytes=524288, buf=0xffffa0000ce0, fd=-1382119360) at ../sysdeps/unix/sysv/linux/read.c:26

1 __libc_read (fd=fd@entry=20, buf=buf@entry=0xffffa0000ce0, nbytes=nbytes@entry=524288) at ../sysdeps/unix/sysv/linux/read.c:24

2 0x0000ffffb0223eb8 in read (nbytes=524288, buf=0xffffa0000ce0, __fd=20) at /usr/include/aarch64-linux-gnu/bits/unistd.h:44

3 indigo_xml_parse (device=device@entry=0x0, client=client@entry=0xffffa0000b60) at indigo_xml.c:1277

4 0x0000ffffb020ed10 in start_worker_thread (client_socket=0xaaaafbdf0560) at indigo_server_tcp.c:96

5 0x0000ffffb01c84fc in start_thread (arg=0xffffde3e2a2f) at pthread_create.c:477

6 0x0000ffffb0122f2c in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:78

Thread 5 (Thread 0xffffae1e9140 (LWP 23944)):

0 __libc_read (nbytes=1, buf=0xffffae1e886f, fd=) at ../sysdeps/unix/sysv/linux/read.c:26

1 __libc_read (fd=, buf=0xffffae1e886f, nbytes=1) at ../sysdeps/unix/sysv/linux/read.c:24

2 0x0000ffffb1088d98 in ?? () from /lib/aarch64-linux-gnu/libdns_sd.so.1

3 0x0000ffffb1088f20 in ?? () from /lib/aarch64-linux-gnu/libdns_sd.so.1

4 0x0000ffffb01c84fc in start_thread (arg=0xffffde3e286f) at pthread_create.c:477

5 0x0000ffffb0122f2c in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:78

Thread 4 (Thread 0xffffae9ea140 (LWP 23943)):

0 __libc_read (nbytes=1, buf=0xffffae9e986f, fd=) at ../sysdeps/unix/sysv/linux/read.c:26

1 __libc_read (fd=, buf=0xffffae9e986f, nbytes=1) at ../sysdeps/unix/sysv/linux/read.c:24

2 0x0000ffffb1088d98 in ?? () from /lib/aarch64-linux-gnu/libdns_sd.so.1

3 0x0000ffffb1088f20 in ?? () from /lib/aarch64-linux-gnu/libdns_sd.so.1

4 0x0000ffffb01c84fc in start_thread (arg=0xffffde3e286f) at pthread_create.c:477

5 0x0000ffffb0122f2c in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:78

Thread 3 (Thread 0xffffaf45c140 (LWP 23942)):

0 0x0000ffffb0119a9c in __GI___poll (fds=0xffffa8000b60, nfds=2, timeout=) at ../sysdeps/unix/sysv/linux/poll.c:41

1 0x0000ffffaff45b54 in ?? () from /lib/aarch64-linux-gnu/libusb-1.0.so.0

2 0x0000ffffaff46d44 in libusb_handle_events_timeout_completed () from /lib/aarch64-linux-gnu/libusb-1.0.so.0

3 0x0000ffffaff46d9c in libusb_handle_events () from /lib/aarch64-linux-gnu/libusb-1.0.so.0

4 0x0000ffffb0206e50 in hotplug_thread (arg=) at indigo_driver.c:613

5 0x0000ffffb01c84fc in start_thread (arg=0xffffde3e2abf) at pthread_create.c:477

6 0x0000ffffb0122f2c in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:78

Thread 2 (Thread 0xffffafc5d140 (LWP 23941)):

0 0x0000ffffb0119a9c in __GI___poll (fds=0xffffafc5c918, nfds=2, timeout=) at ../sysdeps/unix/sysv/linux/poll.c:41

1 0x0000ffffaff4caa0 in ?? () from /lib/aarch64-linux-gnu/libusb-1.0.so.0

2 0x0000ffffb01c84fc in start_thread (arg=0xffffde3e283f) at pthread_create.c:477

3 0x0000ffffb0122f2c in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:78

Thread 1 (Thread 0xffffb1102af0 (LWP 23940)):

0 0x0000ffffb01d2f98 in __libc_accept (fd=, addr=addr@entry=..., len=len@entry=0xffffde3e2b04) at ../sysdeps/unix/sysv/linux/accept.c:26

1 0x0000ffffb020d7f4 in indigo_server_start (callback=) at indigo_server_tcp.c:378

2 0x0000aaaacc1bf078 in server_main () at indigo_server.c:1152

3 0x0000aaaacc1bd2cc in main (argc=, argv=0xffffde3e3da8) at indigo_server.c:1294

Blocked threads:



[Inferior 1 (process 23940) detached]

rumengb commented 3 years ago

Can you do something for me? See indigo_server_tcp.c and change the timeout to 10 s, then check whether this now happens at a 10 s interval. If so, it is related. I am on the phone now, but if you look at the commit you will find the place easily :) Thanks!


kkretzschmar commented 3 years ago

I tried several times but couldn't reproduce the issue described above. When I observed it, I checked the update time over several update cycles. It was almost exactly 5 seconds, so I think it has to do with the new timeout.

rumengb commented 3 years ago

There are several 5 s timeouts, so I do not know which one it is. Can you provide a trace log of the INDIGO server?


rumengb commented 3 years ago

I looked at the backtrace of Thread 8:

Thread 8 (Thread 0xffffac9e6140 (LWP 23949)):

0 __libc_write (nbytes=43, buf=0xffffac9c4f18, fd=-1399042280) at ../sysdeps/unix/sysv/linux/write.c:26

1 __libc_write (fd=fd@entry=21, buf=buf@entry=0xffffac9c4f18, nbytes=nbytes@entry=43) at ../sysdeps/unix/sysv/linux/write.c:24

2 0x0000ffffb020cb38 in indigo_write (handle=handle@entry=21, buffer=buffer@entry=0xffffac9c4f18 "20.4429\n", length=length@entry=43) at indigo_io.c:437

3 0x0000ffffb020cc4c in indigo_printf (handle=handle@entry=21, format=format@entry=0xffffb02a6700 "%s\n") at indigo_io.c:448

What bothers me is that in my socket_timeout branch there is no write() at line 437. There I have:

remains -= bytes_written;

The write() is at line 430:

long bytes_written = write(handle, buffer, remains);

By the way, it is the same on "master". Are you using a modified version of INDIGO?
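To make the two quoted lines easier to place, here is a minimal sketch of the kind of write loop they come from. It is an illustration only, not the actual indigo_write() source; `write_all` and the `timed_out` out-parameter are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical helper: write all `length` bytes of `buffer` to `fd`,
 * coping with partial writes. Returns true on success. On failure it
 * returns false and sets *timed_out to true if write() failed with
 * EAGAIN/EWOULDBLOCK (e.g. an expired send timeout). */
bool write_all(int fd, const char *buffer, long length, bool *timed_out) {
    assert(buffer != NULL && timed_out != NULL);
    long remains = length;
    *timed_out = false;
    while (remains > 0) {
        long bytes_written = write(fd, buffer, remains);
        if (bytes_written < 0) {
            if (errno == EINTR)
                continue; /* interrupted by a signal: retry */
            *timed_out = (errno == EAGAIN || errno == EWOULDBLOCK);
            return false;
        }
        if (bytes_written == 0)
            return false; /* no progress: give up instead of spinning */
        buffer += bytes_written;
        remains -= bytes_written;
    }
    return true;
}
```

Reporting the timeout separately lets the caller decide whether to drop the client rather than retry on the next update.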


kkretzschmar commented 3 years ago

This is indeed strange ... I didn't change the sources, and I also see "remains -= bytes_written" at line 437 of indigo_io.c. What I did:

  • checked out the socket_timeout branch and built incrementally on top of the master build artifacts (see git log below)
  • started build/bin/indigo_server build/drivers/indigo_mount_simulator

Possible causes:

  • I have an INDIGO installation that I installed with apt-get last Saturday
  • Is it possible that the server loads some shared libraries from that installation? I checked with strace but didn't find anything suspicious
  • Maybe the deadlock_detector gets its backtrace from the wrong library? The deadlock_detector is definitely from the Saturday installation

------- git log --------------

Author: Rumen G.Bogdanovski rumen@skyarchive.org
Date: Wed Oct 28 12:31:30 2020 +0200

remove recv/read timout

commit 4a1fd99f8a1335b5f347189254294749937ac0bf
Author: Rumen G.Bogdanovski rumen@skyarchive.org
Date: Tue Oct 27 10:10:32 2020 +0200

add socket timeout

commit 26c5b0d8ad777bd53aa594edf16c1dfad99f1073
Merge: 38573c1f 7ee01c33
Author: Peter Polakovic peter.polakovic@gmail.com
Date: Mon Oct 26 15:43:03 2020 +0100

Merge branch 'master' of https://github.com/indigo-astronomy/indigo

commit 38573c1f8522e543a411acefd581c0596f0d378f
Author: Peter Polakovic peter.polakovic@gmail.com
Date: Mon Oct 26 15:42:59 2020 +0100

dome_baader/dome_nexdome: missing includes added

commit 5f6a6fd3430fbb1a5f2f2a2f64fb9ddaa104ffbf
Author: Peter Polakovic peter.polakovic@gmail.com
Date: Mon Oct 26 15:42:32 2020 +0100

ccd_simulator: CCD_IMAGE_PROPERTY state fixed

commit 1f6352a216410f76769af576a403ddedfb6ac837
Author: Peter Polakovic peter.polakovic@gmail.com
Date: Mon Oct 26 15:41:59 2020 +0100

rumengb commented 3 years ago

Would you please remove the apt-installed version and try to reproduce with a 10 s timeout?

Also, can you enable the trace debug level? This may help pinpoint the problem. It will produce a lot of output, so please redirect it to a file.

To be honest, I do not really understand the problem. Can you explain it a bit more? How does it affect the operation?

rumengb commented 3 years ago

Or, even better, try a 3-second timeout. It may happen more often.


rumengb commented 3 years ago

Klaus, I have been trying to reproduce the issue you described, but I was unable to. I added error logging on write and read, so the next time it happens we will know for sure whether it was the write timeout. Please use the latest "socket_timeout" branch. I will not merge it to master until I get all green.

Rumen


kkretzschmar commented 3 years ago

Hi Rumen, I could reproduce the problem. Here is the trace output:

socket 20 - active WLAN connection
socket 21 - interrupted LAN connection

Hope that helps.

Thanks, Klaus

8:01:50.371094 indigo_server: 'Mount Simulator'.'MOUNT_LST_TIME' NUMBER ro Ok 2.0 0 {
18:01:50.371179 indigo_server: 'TIME' = 19.7262
18:01:50.371246 indigo_server: }
18:01:50.371321 indigo_server: 20 ←
18:01:50.371516 indigo_server: 20 ← 19.7262
18:01:50.371623 indigo_server: 20 ←
18:01:50.371715 indigo_server: 21 ←
18:01:55.490826 indigo_server: indigo_write(): Resource temporarily unavailable
18:01:55.490988 indigo_server: 21 ← 19.7262
18:02:00.610825 indigo_server: indigo_write(): Resource temporarily unavailable
18:02:00.610982 indigo_server: 21 ←
18:02:05.730832 indigo_server: indigo_write(): Resource temporarily unavailable
18:02:05.730990 indigo_server: INDIGO Bus: property update
18:02:05.731069 indigo_server: 'Mount Simulator'.'MOUNT_EQUATORIAL_COORDINATES' NUMBER rw Ok 2.0 0 {
18:02:05.731154 indigo_server: 'RA' = 13.7262
18:02:05.731229 indigo_server: 'DEC' = 90
18:02:05.731295 indigo_server: }
18:02:05.731370 indigo_server: 20 ←
18:02:05.731566 indigo_server: 20 ← 13.7262
18:02:05.731673 indigo_server: 20 ← 90
18:02:05.731797 indigo_server: 20 ←
18:02:05.731926 indigo_server: 21 ←
18:02:10.850852 indigo_server: indigo_write(): Resource temporarily unavailable
18:02:10.851029 indigo_server: 21 ← 13.7262
18:02:15.970856 indigo_server: indigo_write(): Resource temporarily unavailable
18:02:15.971032 indigo_server: 21 ← 90
18:02:21.090818 indigo_server: indigo_write(): Resource temporarily unavailable

rumengb commented 3 years ago

Good, so it is related: for some reason the connection is not terminated. And what about the normal case? Does it terminate the connection?

kkretzschmar commented 3 years ago

When I have a single connection and I temporarily interrupt it (by unplugging the LAN cable), the server keeps sending updates. What exactly do you mean by the normal case?

kkretzschmar commented 3 years ago

I think I know what the problem is and I think it has to do with my client implementation. I'll check it next week.

rumengb commented 3 years ago

What surprises me is that you observe this behaviour only from time to time and not always. On the server we just call send and we do not care whether it succeeds. I thought we handled such cases, but we do not :( So what you see is pretty normal. Unfortunately the fix seems kind of scary at this point. I know how to fix it, but it will take time and more testing... I will try to find a nice and elegant solution for that massive oversight on our side... :(
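To illustrate what "caring whether send succeeds" could look like, here is a minimal sketch. It is not INDIGO's code: `client_conn`, `checked_write` and `forward_or_drop` are hypothetical names made up for this example. The idea is only that a failed write is reported to the caller, which then closes the socket and stops writing to that client.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef MSG_NOSIGNAL
#define MSG_NOSIGNAL 0 /* where missing, SIGPIPE has to be ignored separately */
#endif

/* Hypothetical per-client record, not an INDIGO type. */
typedef struct {
	int fd; /* socket of the remote client, -1 once dropped */
} client_conn;

/* Send the whole buffer; return false as soon as send() reports an error. */
static bool checked_write(int fd, const char *buf, size_t len) {
	assert(buf != NULL);
	while (len > 0) {
		ssize_t sent = send(fd, buf, len, MSG_NOSIGNAL);
		if (sent < 0) {
			if (errno == EINTR)
				continue; /* interrupted by a signal, try again */
			return false; /* any other error: report it to the caller */
		}
		buf += sent;
		len -= (size_t)sent;
	}
	return true;
}

/* Forward one message; on failure close the socket and mark the client dropped. */
static bool forward_or_drop(client_conn *c, const char *msg, size_t len) {
	assert(c != NULL);
	if (c->fd < 0)
		return false; /* already dropped, nothing to write to */
	if (checked_write(c->fd, msg, len))
		return true;
	close(c->fd);
	c->fd = -1;
	return false;
}
```

With something like this, a client whose connection has failed is skipped on later updates instead of being written to again and again.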

rumengb commented 3 years ago

Klaus, can you try the master branch? After yesterday's release we merged it and added handling for write timeouts...

Rumen
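The thread does not show how the write timeout was implemented on master. As a rough sketch of the general technique only, one way is to wait with `poll()` until the socket is writable and give up after a deadline. `wait_writable` and `send_with_timeout` are hypothetical names, and the timeout value is an assumption chosen by the caller.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

#ifndef MSG_NOSIGNAL
#define MSG_NOSIGNAL 0 /* where missing, SIGPIPE has to be ignored separately */
#endif

/* Wait up to timeout_ms for fd to accept more data.
 * Returns false on timeout, poll error, or a hung-up/broken socket.
 * Note: a signal restarts the wait with the full timeout. */
static bool wait_writable(int fd, int timeout_ms) {
	struct pollfd pfd = { .fd = fd, .events = POLLOUT, .revents = 0 };
	int rc;
	do {
		rc = poll(&pfd, 1, timeout_ms);
	} while (rc < 0 && errno == EINTR);
	if (rc <= 0)
		return false;
	if (pfd.revents & (POLLERR | POLLHUP | POLLNVAL))
		return false;
	return (pfd.revents & POLLOUT) != 0;
}

/* Send len bytes, giving up if the socket stays unwritable for timeout_ms. */
static bool send_with_timeout(int fd, const char *buf, size_t len, int timeout_ms) {
	assert(buf != NULL);
	while (len > 0) {
		if (!wait_writable(fd, timeout_ms))
			return false; /* peer stalled or connection broken */
		ssize_t sent = send(fd, buf, len, MSG_DONTWAIT | MSG_NOSIGNAL);
		if (sent < 0) {
			if (errno == EINTR || errno == EAGAIN || errno == EWOULDBLOCK)
				continue; /* go back and wait for writability again */
			return false;
		}
		buf += sent;
		len -= (size_t)sent;
	}
	return true;
}
```

The point of bounding the wait is that a thread holding a lock while it writes cannot be blocked forever by one dead connection; the caller sees `false` and can detach that client.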

rumengb commented 3 years ago

Klaus, can you please test and give us feedback?

kkretzschmar commented 3 years ago

Hi Rumen, sorry for the late response. I tested it and it works perfectly. There are no more interruptions after switching network devices. Thank you for fixing this; it saved my investment in supporting multiple network interfaces. I'll close this issue.

Thank you! Klaus

rumengb commented 3 years ago

Great! I hope to see this feature in the PixInsight INDIGO module soon :)