indigo-astronomy / indigo

INDIGO is a system of standards and frameworks for multiplatform and distributed astronomy software development designed to scale with your needs.
http://www.indigo-astronomy.org

Alpaca agent causes INDIGO webpage crash #531

Closed: Paolo97Gll closed this issue 3 months ago

Paolo97Gll commented 5 months ago

After a recent update, I found that the main INDIGO webpage crashes in an infinite loop. Everything else works (and keeps working); only the webpage is affected.

[screenshot]

After running some tests, I found that this happens when the indigo_agent_alpaca driver is loaded.

I used the following command (the full log of a test session is attached: indigo.log):

/usr/bin/indigo_server \
    --enable-trace \
    --disable-bonjour \
    indigo_ccd_asi \
    indigo_ccd_atik \
    indigo_wheel_sx \
    indigo_agent_alpaca

With this command, the server webpage starts crashing. If I remove the last line (indigo_agent_alpaca) and run the command again, the main webpage loads as expected.

This happens on both a Raspberry Pi 4 and an x86 server running Ubuntu 22.04 Server, with INDIGO 2.0-278 as well as 2.0-280.

rumengb commented 4 months ago

Hi, are you running an unmodified version of INDIGO? I looked at the log, and I do not see the ASI120 attached anywhere, yet the Alpaca agent is exporting it. These are the attached devices:

11:57:34.204379 indigo_server: B <- Attach device 'SX Filter Wheel #010104'
11:57:34.285132 indigo_server: B <- Attach device 'Alpaca Agent'
11:57:34.736082 indigo_server: B <- Attach device 'Atik 16200'
11:57:34.969525 indigo_server: B <- Attach device 'Server'

and yet the Alpaca agent sees the ASI120 along with some garbage devices:

11:57:34.350962 indigo_server: B <+ Define 'Alpaca Agent'.'AGENT_ALPACA_DEVICES' TEXT rw Ok 2.0 0  { // Device mapping
11:57:34.351038 indigo_server: B <+   '0' = 'Atik 16200' // Device #0
11:57:34.351113 indigo_server: B <+   '1' = 'SX Filter Wheel #010104' // Device #1
11:57:34.351188 indigo_server: B <+   '2' = 'ZWO ASI120MM-S' // Device #2
11:57:34.351262 indigo_server: B <+   '3' = 'ZWO ASI120MM-S (guider)' // Device #3
11:57:34.351336 indigo_server: B <+   '4' = '�l>��' // Device #4
11:57:34.351410 indigo_server: B <+   '5' = '�l>�� (guider)' // Device #5
11:57:34.351482 indigo_server: B <- }

This may be a bug in the Alpaca agent, but I do not see how it would make up the ASI120 if the ASI driver does not export it... @polakovic, any ideas?
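The discrepancy pointed out above (devices in the Alpaca mapping that were never attached in this session) can also be checked mechanically. The sketch below is hypothetical helper code, not part of INDIGO; it assumes the trace-log line formats shown in the excerpts above (`Attach device '...'` and `'N' = '...' // Device #N`) and standard `sed`, `sort` and `comm`.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of INDIGO) for an indigo_server --enable-trace log.
# LC_ALL=C makes sed/sort/comm work byte by byte, so garbage bytes in device
# names neither break the regexes nor change the sort order between commands.

# Names from lines like: ... B <- Attach device 'Atik 16200'
attached_devices() {
  LC_ALL=C sed -n "s|.*Attach device '\(.*\)'\$|\1|p" "$1" | LC_ALL=C sort -u
}

# Names from AGENT_ALPACA_DEVICES items like: ... '0' = 'Atik 16200' // Device #0
alpaca_mapped_devices() {
  LC_ALL=C sed -n "s|.*'[0-9]*' = '\(.*\)' // Device #[0-9]*\$|\1|p" "$1" | LC_ALL=C sort -u
}

# Devices the Alpaca agent maps that were never attached during this session.
mapped_but_not_attached() {
  LC_ALL=C comm -13 <(attached_devices "$1") <(alpaca_mapped_devices "$1")
}
```

Usage: `mapped_but_not_attached indigo.log`. A device listed here is not necessarily an error, since (as discussed later in the thread) the agent also keeps devices that are currently absent.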

Paolo97Gll commented 3 months ago

Hi, sorry for the late reply; it's been a hard month at work.

Yes, I run a vanilla server version (installed with apt), and the problem persists with INDIGO 2.0-286. I have a bunch of custom drivers, but as you can see, those drivers were not loaded when I ran the test above.

I think that Alpaca also exports an ASI120 because I normally also use a remote server with an ASI camera attached (which is exported by Alpaca). When I ran the test, the remote server was disconnected to simplify debugging. If you want, I can also run a full test with the remote server. As for the garbage devices, I have no idea… maybe cleaning the Alpaca agent config could help?

rumengb commented 3 months ago

I cannot reproduce the issue. Maybe it is related to some of the drivers in combination with the devices in use; at this point I do not know. We can probably find the cause if you repeat the test with valgrind (to check where exactly it crashes) and the '--' switch (to prevent indigo_server from restarting after the crash). It is probably also a good idea to turn on debug logging with '-vv':

valgrind /usr/bin/indigo_server \
    -vv \
    -- \
    --enable-trace \
    --disable-bonjour \
    indigo_ccd_asi \
    indigo_ccd_atik \
    indigo_wheel_sx \
    indigo_agent_alpaca > indigo_valgrind.log 2>&1

Please note that everything executed through valgrind will be very slow, and some operations may take a lot longer than usual. Please be patient!
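Once indigo_valgrind.log exists, the interesting parts are usually the first memory error and the final termination message. A minimal sketch for pulling those out, assuming Memcheck's (valgrind's default tool) standard message wording; the function name is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: list Memcheck error headlines and the termination reason,
# each prefixed with its line number, so the stack trace that follows each one
# can be read in the full log.
# -a forces text mode: the log may contain invalid UTF-8 device names, which
# would otherwise make GNU grep treat the file as binary.
valgrind_highlights() {
  grep -a -n -E '^==[0-9]+== (Invalid (read|write|free)|Conditional jump or move depends on uninitialised|Use of uninitialised value|Process terminating with default action)' "$1"
}
```

Usage: `valgrind_highlights indigo_valgrind.log`, then open the log at the first reported line number.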

Paolo97Gll commented 3 months ago

Today I had time to look more deeply into the problem. I was curious about the strange garbage devices you found and mentioned in your previous comment. These are the Alpaca config files on the master server:

[screenshot]

and on the remote server:

[screenshot]

I don't know how they appeared in these files; it looks like an encoding error or something similar. Looking at a hex dump of the files and trying lots of different encodings gave no results. I think the error was caused by something else (an encoding or write error, a USB or driver error, ...). Furthermore, the devices on the master and the corresponding devices on the remote are inconsistent (two are missing).

However, deleting the garbage entries from the config file by hand solved the problem! I think the file parser was crashing somewhere.
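The garbage entries are easy to spot mechanically because their bytes (for example 0x98, and 0xab 0xaa, in the hex dump below) are not valid UTF-8. A minimal sketch of that check, assuming GNU `iconv` is installed (it exits with an error on invalid input); the function name is hypothetical, and this is not how INDIGO itself parses the file:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "LINE: TEXT" for every line of a file that is not
# valid UTF-8, e.g. garbled <oneText> entries in an INDIGO .config file.
find_invalid_utf8_lines() {
  local n=0 line
  while IFS= read -r line || [ -n "$line" ]; do
    n=$((n + 1))
    # Converting UTF-8 to UTF-8 is a no-op on valid input and fails on invalid bytes.
    if ! printf '%s' "$line" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1; then
      printf '%d: %s\n' "$n" "$line"
    fi
  done < "$1"
}
```

Usage: `find_invalid_utf8_lines ~/.indigo/Alpaca_Agent.config`, assuming the default `~/.indigo` config directory.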

This is the hex dump of the master file; maybe it can help find the root cause of the crash in INDIGO and make it more resilient:

00000000  3c 6e 65 77 54 65 78 74  56 65 63 74 6f 72 20 64  |<newTextVector d|
00000010  65 76 69 63 65 3d 27 41  6c 70 61 63 61 20 41 67  |evice='Alpaca Ag|
00000020  65 6e 74 27 20 6e 61 6d  65 3d 27 41 47 45 4e 54  |ent' name='AGENT|
00000030  5f 41 4c 50 41 43 41 5f  44 45 56 49 43 45 53 27  |_ALPACA_DEVICES'|
00000040  3e 0a 3c 6f 6e 65 54 65  78 74 20 6e 61 6d 65 3d  |>.<oneText name=|
00000050  27 30 27 3e 50 72 69 6d  61 4c 75 63 65 4c 61 62  |'0'>PrimaLuceLab|
00000060  20 45 53 41 54 54 4f 20  46 6f 63 75 73 65 72 3c  | ESATTO Focuser<|
00000070  2f 6f 6e 65 54 65 78 74  3e 0a 3c 6f 6e 65 54 65  |/oneText>.<oneTe|
00000080  78 74 20 6e 61 6d 65 3d  27 31 27 3e 4d 6f 75 6e  |xt name='1'>Moun|
00000090  74 20 53 79 6e 53 63 61  6e 3c 2f 6f 6e 65 54 65  |t SynScan</oneTe|
000000a0  78 74 3e 0a 3c 6f 6e 65  54 65 78 74 20 6e 61 6d  |xt>.<oneText nam|
000000b0  65 3d 27 32 27 3e 4d 6f  75 6e 74 20 53 79 6e 53  |e='2'>Mount SynS|
000000c0  63 61 6e 20 28 67 75 69  64 65 72 29 3c 2f 6f 6e  |can (guider)</on|
000000d0  65 54 65 78 74 3e 0a 3c  6f 6e 65 54 65 78 74 20  |eText>.<oneText |
000000e0  6e 61 6d 65 3d 27 33 27  3e 41 74 69 6b 20 31 36  |name='3'>Atik 16|
000000f0  32 30 30 20 40 20 73 6f  64 6f 6d 61 3c 2f 6f 6e  |200 @ sodoma</on|
00000100  65 54 65 78 74 3e 0a 3c  6f 6e 65 54 65 78 74 20  |eText>.<oneText |
00000110  6e 61 6d 65 3d 27 34 27  3e 53 58 20 46 69 6c 74  |name='4'>SX Filt|
00000120  65 72 20 57 68 65 65 6c  20 23 30 31 30 31 30 34  |er Wheel #010104|
00000130  20 40 20 73 6f 64 6f 6d  61 3c 2f 6f 6e 65 54 65  | @ sodoma</oneTe|
00000140  78 74 3e 0a 3c 6f 6e 65  54 65 78 74 20 6e 61 6d  |xt>.<oneText nam|
00000150  65 3d 27 35 27 3e 5a 57  4f 20 41 53 49 31 32 30  |e='5'>ZWO ASI120|
00000160  4d 4d 2d 53 20 40 20 73  6f 64 6f 6d 61 3c 2f 6f  |MM-S @ sodoma</o|
00000170  6e 65 54 65 78 74 3e 0a  3c 6f 6e 65 54 65 78 74  |neText>.<oneText|
00000180  20 6e 61 6d 65 3d 27 36  27 3e 5a 57 4f 20 41 53  | name='6'>ZWO AS|
00000190  49 31 32 30 4d 4d 2d 53  20 28 67 75 69 64 65 72  |I120MM-S (guider|
000001a0  29 20 40 20 73 6f 64 6f  6d 61 3c 2f 6f 6e 65 54  |) @ sodoma</oneT|
000001b0  65 78 74 3e 0a 3c 6f 6e  65 54 65 78 74 20 6e 61  |ext>.<oneText na|
000001c0  6d 65 3d 27 37 27 3e 98  6c 26 67 74 3b 07 ab aa  |me='7'>.l&gt;...|
000001d0  20 40 20 73 6f 64 6f 6d  61 3c 2f 6f 6e 65 54 65  | @ sodoma</oneTe|
000001e0  78 74 3e 0a 3c 6f 6e 65  54 65 78 74 20 6e 61 6d  |xt>.<oneText nam|
000001f0  65 3d 27 38 27 3e 98 6c  26 67 74 3b 07 ab aa 20  |e='8'>.l&gt;... |
00000200  28 67 75 69 64 65 72 29  20 40 20 73 6f 64 6f 6d  |(guider) @ sodom|
00000210  61 3c 2f 6f 6e 65 54 65  78 74 3e 0a 3c 6f 6e 65  |a</oneText>.<one|
00000220  54 65 78 74 20 6e 61 6d  65 3d 27 39 27 3e 98 7c  |Text name='9'>.||
00000230  e2 dc aa aa 20 40 20 73  6f 64 6f 6d 61 3c 2f 6f  |.... @ sodoma</o|
00000240  6e 65 54 65 78 74 3e 0a  3c 6f 6e 65 54 65 78 74  |neText>.<oneText|
00000250  20 6e 61 6d 65 3d 27 31  30 27 3e 98 7c e2 dc aa  | name='10'>.|...|
00000260  aa 20 28 67 75 69 64 65  72 29 20 40 20 73 6f 64  |. (guider) @ sod|
00000270  6f 6d 61 3c 2f 6f 6e 65  54 65 78 74 3e 0a 3c 2f  |oma</oneText>.</|
00000280  6e 65 77 54 65 78 74 56  65 63 74 6f 72 3e 0a     |newTextVector>.|
0000028f

Do you also want me to test with Valgrind and the broken file?

rumengb commented 3 months ago

If you still have the broken files, can you send them to me?

(Reply sent by email to Paolo Galli's comment of Mon, Jun 10, 2024, 9:25 PM.)

Paolo97Gll commented 3 months ago

Yes!

I had to append a fake .log extension because GitHub does not allow uploading files with custom extensions.

If you have any problems, let me know; I can also send you a zip file with both config files.

polakovic commented 3 months ago

The only problem I found is garbage in the saved configuration :( Please try removing both Alpaca_Agent.config files, and we'll see if it happens again.

The "(guider)" suffix in the garbled device names suggests that a broken camera driver may be causing this problem, so either Atik or ASI.
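When removing the Alpaca_Agent.config files, it may be worth moving them aside instead of deleting them, so the broken copies can still be shared for analysis (as requested earlier in this thread). A minimal sketch, assuming INDIGO's default config directory `~/.indigo` and that indigo_server is stopped first; the function name is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: move the Alpaca agent's saved config aside with a
# timestamp suffix instead of deleting it. Takes the config directory as an
# optional argument; the ~/.indigo default is an assumption about the setup.
backup_alpaca_config() {
  local dir=${1:-$HOME/.indigo}
  local cfg="$dir/Alpaca_Agent.config"
  if [ -f "$cfg" ]; then
    mv -- "$cfg" "$cfg.bak.$(date +%Y%m%d-%H%M%S)"
    echo "moved $cfg aside"
  else
    echo "no $cfg found" >&2
  fi
}
```

Run it on both the master and the remote server, since each keeps its own copy.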

Paolo97Gll commented 3 months ago

Yes, I did that yesterday and it worked; maybe you missed it in my reply:

However, deleting by hand the garbage entries from the config file solved the problem! I think that the file parser was crashing somewhere.

I sent you the hex dump and the files because I think they could be useful for making the INDIGO parser more resilient to this type of error, since we don't know where that garbage comes from.

rumengb commented 3 months ago

You said you have some custom drivers. These garbage names may come from an unstable or work-in-progress guide camera or mount driver (both expose a device with the "guider" suffix). What are the custom drivers you mentioned above? Maybe they are causing it?! The thing is that ASCOM/Alpaca relies on a relatively unique device index/ID that does not change when devices come and go. If this index changes, the Alpaca client has to rediscover and re-index all the devices. This is why we store all devices in this config: it ensures that old devices, even if they are not present at the moment, keep working without having to be rediscovered every time. So if you remove the config, you will need to remove the old devices from the Alpaca client and rediscover them.
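A minimal sketch of the stable-numbering idea described above, assuming a simple name-to-index table. It is illustrative only, not the Alpaca agent's actual code, and `load_mapping` and `assign_index` are hypothetical names: saved names keep their index even while the device is absent, and unseen names get the next free index.

```shell
#!/usr/bin/env bash
# Hypothetical sketch (bash >= 4): stable Alpaca-style device numbering.
declare -A INDEX_OF=()   # device name -> device number
NEXT_INDEX=0             # next unused device number

# Load the saved mapping; arguments are device names in index order 0, 1, 2, ...
load_mapping() {
  local name
  for name in "$@"; do
    INDEX_OF[$name]=$NEXT_INDEX
    NEXT_INDEX=$((NEXT_INDEX + 1))
  done
}

# Set REPLY to the device's number, appending the device if it was never seen.
# Existing numbers are never reused or shifted, even if the device is absent now.
assign_index() {
  local name=$1
  if [[ -z ${INDEX_OF[$name]+set} ]]; then
    INDEX_OF[$name]=$NEXT_INDEX
    NEXT_INDEX=$((NEXT_INDEX + 1))
  fi
  REPLY=${INDEX_OF[$name]}
}
```

`assign_index` stores its result in `REPLY` instead of printing it because a `$(...)` command substitution runs in a subshell, where a newly appended entry would be lost. Deleting the saved table resets `NEXT_INDEX`, which is why clients then have to rediscover their devices.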

Paolo97Gll commented 3 months ago

I don't think the problem is in the custom drivers (an SQM via REST API, a GPS simulator, and a PrimaLuceLab ESATTO via REST API). I've used them for 2 or 3 years without problems, and only on the master server. The garbage devices originate on the remote server, which only uses official drivers.

rumengb commented 3 months ago

Well, what bothers me is that we do not know how these configs are created. And it is certainly the garbage names that cause the web UI to reconnect constantly. INDIGO itself properly escapes the symbols (see &gt; in the device names). But I don't like not knowing the root cause.
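The `&gt;` in the hex dump (bytes 98 6c 26 67 74 3b) shows the escaping at work: the raw name contained a `>` byte, which was written out as `&gt;`. Escaping only rewrites the XML special characters, so the surrounding garbage bytes pass through untouched. A minimal sketch of that behaviour, using a generic XML escaper (not INDIGO's own code; the function name is hypothetical):

```shell
#!/usr/bin/env bash
# Hypothetical helper: escape the five XML special characters read from stdin.
# '&' is replaced first so the '&' of the entities added later is not escaped again.
# LC_ALL=C makes sed work byte by byte, so invalid UTF-8 passes through unchanged.
xml_escape() {
  LC_ALL=C sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g' \
               -e "s/'/\&apos;/g" -e 's/"/\&quot;/g'
}
```

Feeding it the bytes `98 6c 3e 07 ab aa` yields `98 6c 26 67 74 3b 07 ab aa`, the same sequence as entry '7' in the dump, so a correctly escaped file can still carry names that are not valid UTF-8.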

Paolo97Gll commented 3 months ago

Here in northern Italy we've had bad weather since the end of April, so I haven't used the telescope for almost six weeks. Furthermore, I don't use the webpage much. Something certainly happened in the last two months.

The only reasons I can think of now are:

rumengb commented 3 months ago

Since we cannot reproduce it, I will close the issue. If you find a way to reproduce it reliably, please reopen the issue.

Paolo97Gll commented 3 months ago

Sure!

Paolo97Gll commented 3 weeks ago

Hi, the problem happened again.

[screenshot]

Alpaca_Agent.zip

I don't know when this happened, since the weather has not been good here in recent weeks. I can try to find any related logs.

rumengb commented 2 weeks ago

I cannot reproduce it... :( I tried hard... Is there any pattern or anything else that could narrow down the search?