matthijskooijman / arduino-lmic

:warning: This library is deprecated, see the README for alternatives.
704 stars 651 forks source link

Debugging: can't get LMIC_PRINTF_TO working on SAMD21 #270

Open JeroenvIS opened 4 years ago

JeroenvIS commented 4 years ago

Base question/issue: I can't get LMIC_PRINTF_TO working on SAMD21. Tried with Arduino 1.8.6 and 1.8.12 on Windows, both with current LMIC. When I uncomment #define LMIC_PRINTF_TO Serial and set LMIC_DEBUG_LEVEL to 2, compilation of the sketch fails with the error error: 'cookie_io_functions_t' does not name a type

I'm lost how to fix this, it's way more low-level than I'm used to. Any pointers?

Background / reason for the debug: I'm running into an issue with the "Sensebox MCU", which is a SAMD21 based board. Node can join TTN successfully (OTAA) and usually transmits a couple of packets just fine (eg with a TX_INTERVAL of 60 seconds, rescheduling the os_job 60 secs after TX_COMPLETE as in common examples).

However, after a while, I don't see any data being received at TTN. The loop() is still running (observed by blinking LED) and with serial debug output for the sketch itself, I can see that the send_job has prepared the data just fine and emitted the "Packet queued" message. I never see a TX_COMPLETE event though. If I add some code to reschedule the os_send job if no TX_COMPLETE has been seen withing 15 secs after queueing the packet (only after join of course), it appears that LMIC.opmode & OP_TXRXPEND yields true.

I haven't had issues like these on AVR, but I see this pattern on 4 out of 5 Sensebox MCU platforms and I believe that a working LMIC_PRINTF_TO might help me dig deeper.

matthijskooijman commented 4 years ago

Hmm, I have no problems compiling the ttn-otaa example for the Arduino Zero with #define LMIC_PRINTF_TO Serial.

Using board 'arduino_zero_edbg' from platform in folder: /home/matthijs/.arduino15/packages/arduino/hardware/samd/1.6.17
Using core 'arduino' from platform in folder: /home/matthijs/.arduino15/packages/arduino/hardware/samd/1.6.17

Are you using the latest version (1.5.0+arduino3)? It was recently fixed to support non-AVR printf as well (though your error suggests you already have the fix, or maybe you're usign the old non-avr-printf (or something like that) branch? Note that the IDE library manager seems to mis-sort the versions and tends to install an older version.

What board and core are you using exactly? This is printed by the IDE on (verbose) compilation as I pasted above.

(as for the underlying issue - I have no suggestions from the top of my head, other than indeed enabling debug output :-p)

JeroenvIS commented 4 years ago

The board that I'm using is the "SenseBox MCU" from SenseBox , with this board support package.

Yes, I'm using LMIC version 1.5.0+arduino3. There is an older LMIC library included with the SenseBox packages, so I emptied that directory and copied the files for 1.5.0+arduino3 to that location. Probably not the cleanest approach, but (also looking at the verbose output) I'm 100% sure that it picks up the latest LMIC code this way.

Could it be that the BSP needs some kind of patch/update to make this work?

(as for the underlying issue: I've never seen this before, neither with AVR nor with ESP32 platforms. However, a faculty member who is involved in this project e-mailed me that he also observed this issue on ESP32+RFM95 several times. Anecdotical, but I would like to investigate further).

JeroenvIS commented 4 years ago

Using board 'sb' from platform in folder: C:\Users\ingenj\AppData\Local\Arduino15\packages\sensebox\hardware\samd\1.3.1

Using core 'arduino' from platform in folder: C:\Users\ingenj\AppData\Local\Arduino15\packages\arduino\hardware\samd\1.8.5

matthijskooijman commented 4 years ago

No time to reproduce right now, but looking at the install instructions for this board, it looks like it just uses the default Aduino SAMD core with some added boards in its boards.txt (confirmed by the lack of a cores directory in their repo). Also, just realized, confirmed by the output in your last comment which indeed says it uses the Arduino SAMD core.

I think these printf cookie functions are supplied by gcc/libc, so that should just work.

I do notice that I have an older Arduino SAMD version (1.6.17, whereas you're using 1.8.5). I'll try upgrading later, to see if that works (or you could try downgrading just the Arduino SAMD core, see if that fixes it)?

JeroenvIS commented 4 years ago

I do notice that I have an older Arduino SAMD version (1.6.17, whereas you're using 1.8.5). I'll try upgrading later, to see if that works (or you could try downgrading just the Arduino SAMD core, see if that fixes it)?

That's a good pointer, thanks! Yes, it does compile correctly when I downgrade Arduino SAMD support to 1.6.17. Will try to bisect where the breakage is introduced.

In the mean time I compiled with the 1.6.17 so I can run a node with extensive debugging.

matthijskooijman commented 4 years ago

That's a good pointer, thanks! Yes, it does compile correctly when I downgrade Arduino SAMD support to 1.6.17. Will try to bisect where the breakage is introduced.

Nice, thanks for testing. I can imagine two underlying causes then:

I'll explain the latter a bit more. In the hal.cpp file: https://github.com/matthijskooijman/arduino-lmic/blob/54bc51df00de18d7dd236fafac6b48e2597957f1/src/hal/hal.cpp#L11-L17

We include stdio.h with _GNU_SOURCE defined to enable some GNU-specific extension. However, we include Arduino.h (and some other files) first, so that means if Arduino.h (or the other files) would (indirectly) include stdio.h, that file would be included without _GNU_SOURCE defined, so it would not expose the cookie functions we need (and the later include in hal.cpp would end up being a no-op because stdio.h was previously included).

To test this, you could try moving these three lines to above the Arduino.h include:

https://github.com/matthijskooijman/arduino-lmic/blob/54bc51df00de18d7dd236fafac6b48e2597957f1/src/hal/hal.cpp#L15-L17

That should ensure that stdio.h is always included properly, regardless of what other headers do. If this is indeed the problem, that should be fixed by making this change (and even if it does not, I think I will make this change since it should prevent this problem from being introduced later).

JeroenvIS commented 4 years ago

Great! I'll try that in a bit. First update: without any other code changes, Arduino SAMD version 1.6.21 (last 1.6) works fine too; 1.8.3 and 1.8.5 don't. However, with your proposed fix, 1.8.5 works fine. So sounds like a good solution!

And from the debug output to narrow down the TX issues: everything is OK when "Packet queued" is followed by "irq: dio: 0x0 flags: 0x8", whereas the cases where "Packet queued" is never followed by a "EV_TXCOMPLETE" event (and instead my workaround code has to kick in) are missing that irq.

OK:

26378997: Updating info for TX at 26377875, airtime will be 3856. Setting available time for band 0 to 30233875
26379221: TXMODE, freq=867300000, len=23, SF=7, BW=125, CR=4/5, IH=0
Packet queued
26382967: irq: dio: 0x0 flags: 0x8
26383030: Scheduled job 0x20001470, cb 0x370d ASAP

Not OK:

25160109: Updating info for TX at 25158684, airtime will be 3856. Setting available time for band 1 to 25544284
25160349: TXMODE, freq=868300000, len=23, SF=7, BW=125, CR=4/5, IH=0
Packet queued
No TX_COMPLETE event received after scheduling transmission, rescheduling
26125011: engineUpdate, opmode=0x800
matthijskooijman commented 4 years ago

And from the debug output to narrow down the TX issues: everything is OK when "Packet queued" is followed by "irq: dio: 0x0 flags: 0x8", whereas the cases where "Packet queued" is never followed by a "EV_TXCOMPLETE" event

Sounds like somehow the IRQ is never fired (not that there is no actually interrupt line involved, the code just polls the DIO "interrupt" lines using digitalRead()). Could it be that you stop calling os_runloop_once() in the broken case? If not, maybe add some debug printing to the GPIO polling (or, if have it available, use a scope or logic analyzer to see if the DIO pin actually becomes active). From the debug output dio: 0x0, I suspect you should be looking at DIO0, though I'm not entirely sure.

JeroenvIS commented 4 years ago

The code is still calling os_runloop_once() in the loop, that's not different from the non-broken case. There is no complex code in the loop (just blinking a LED twice a second, by driving an output based on int(millis() / 500) % 2. Just to make sure I'll remove that bit and see if I can still reproduce then. I have a small logic analyzer, so if more debugging doesn't help, I'll consider monitoring DIO0 and possibly a few other signals to correlate.

JeroenvIS commented 4 years ago

@matthijskooijman I committed your suggested change in my fork (https://github.com/JeroenvIS/arduino-lmic/commit/42f4eb01a1b4895faf74d045484aaad25058f016); it compiles cleanly both with and without #define LMIC_PRINTF_TO Serial, even with Arduino core 1.8.5.

So I think this is a perfect fix and can be merged or otherwise adopted in your branch. When that's done, I'll urge Sensebox to update the version that they are shipping with their package.

JeroenvIS commented 4 years ago

There is no complex code in the loop (just blinking a LED twice a second, by driving an output based on int(millis() / 500) % 2. Just to make sure I'll remove that bit and see if I can still reproduce then.

FWIW: this was indeed the issue, introducing a certain probability that EV_TXCOMPLETE didn't come after queueing a packet. If I remove this line from the loop, LMIC os_runloop_once gets the 'interrupt' every time.

Can't explain from a hardware and/or timing perspective yet, but I guess that's not too relevant for this PR anyway.

Commifreak commented 3 years ago

@matthijskooijman I committed your suggested change in my fork (JeroenvIS@42f4eb0); it compiles cleanly both with and without #define LMIC_PRINTF_TO Serial, even with Arduino core 1.8.5.

So I think this is a perfect fix and can be merged or otherwise adopted in your branch. When that's done, I'll urge Sensebox to update the version that they are shipping with their package.

This fix worked! I had to add these #define _GNU_SOURCE 1 // For fopencookie since they're not there in my hal.cpp?

When will this fix being merged into https://github.com/mcci-catena/arduino-lmic ?

EDIT: Maybe never, since it is not a wanted fix: https://github.com/mcci-catena/arduino-lmic/issues/549