basic-station stuck at Station device: /dev/lora (PPS capture disabled)

SCordibella commented 1 year ago

Hi all, I am trying to get an nFuse picocell gateway working over serial (https://www.n-fuse.co/devices/LoRaWAN-Concentrator-Card-mini-PCIe.html). I can communicate with the concentrator, but it does not seem to start properly, even after waiting a long time (minutes). Here is the full log of a Basics Station startup: basic-station.log

After the last line of that log, [RAL:INFO] Station device: /dev/lora (PPS capture disabled), I would expect to see something like [RAL:INFO] Concentrator started (15s491ms).

Any help will be appreciated, Stefano.

beitler commented 1 year ago

Hi Stefano,

I'd love to help you get unstuck on this issue. However, this is a vendor-specific question, so I can only speculate about what is going on here. I suggest you reach out to the gateway vendor directly.

You are running an smtcpico build of Basics Station. This means that communication with the radio board goes through a UART interface to an MCU, which bridges it to the SX1308 concentrator over SPI. When Station blocks like this, it is most probably because the HAL is waiting for a response on the UART interface. At this point in the startup process, the HAL expects a response to lgw_start, which kicks off a configuration and calibration process for the front-end. So the question is why lgw_start is not returning.
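One way to confirm that lgw_start is the call that never returns is to run it under a time budget, so that a hang shows up as an explicit timeout instead of a silent stall. A minimal Python sketch: call_with_timeout is a hypothetical helper, and calling lgw_start from Python would require a binding to the C HAL (for example via ctypes) that is not shown here.

```python
import threading
from typing import Any, Callable


def call_with_timeout(fn: Callable[[], Any], timeout_s: float) -> Any:
    """Run fn() in a daemon thread; raise TimeoutError if it has not
    returned within timeout_s seconds.

    A stuck fn() cannot be cancelled from Python: its daemon thread keeps
    running in the background, but it no longer blocks the caller or
    interpreter exit.
    """
    outcome: dict = {}

    def worker() -> None:
        try:
            outcome["value"] = fn()
        except BaseException as exc:  # re-raised in the caller below
            outcome["error"] = exc

    t = threading.Thread(target=worker, daemon=True)
    t.start()
    t.join(timeout_s)
    if t.is_alive():
        raise TimeoutError(f"call did not return within {timeout_s} s")
    if "error" in outcome:
        raise outcome["error"]
    return outcome.get("value")
```

With a hypothetical binding `hal`, usage would look like `call_with_timeout(hal.lgw_start, 30.0)`; the 30 s budget is an arbitrary assumption and should be set well above the normal startup time (about 15 s in the expected log line).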

SCordibella commented 1 year ago

Thank you, @beitler, for the hint. I will ask the vendor, even though they suggest using the picoGW packet forwarder.

While looking for a solution, I found many cases in which the line [RAL:INFO] Station device: /dev/lora (PPS capture disabled) is followed by errors about lgw_start, but I haven't checked the code. I updated the HAL library to v0.2.3, applying the same patch from the deps folder, but nothing changed. If I reduce the wait time in the reset script, I get the following log: basic-station-loop.log

Finally, Basics Station runs on the same hardware that already worked with a different BSP: we updated the kernel from 4.14.126 to 6.1.22 and Yocto from jethro to dunfell.

SCordibella commented 1 year ago

Looking deeper into the code, I see that Basics Station stops when it tries to load the firmware; more precisely, load_firmware never returns from the read that follows a firmware write.
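The pattern described here (a write, then a read that never returns) can be made visible with a bounded read. A minimal Python sketch, assuming a request/answer exchange with a known answer length; read_exact and transact are hypothetical helpers, not functions from the picoGW HAL, whose load_firmware is written in C and may frame its messages differently. Separate input and output descriptors are used only so the sketch can be exercised with a pipe; on a serial device such as /dev/lora both would be the same descriptor.

```python
import os
import select
import time


def read_exact(fd: int, n: int, timeout_s: float) -> bytes:
    """Read exactly n bytes from fd, or raise TimeoutError once timeout_s
    seconds have passed without the full answer arriving."""
    deadline = time.monotonic() + timeout_s
    buf = bytearray()
    while len(buf) < n:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError(f"got {len(buf)} of {n} bytes before timeout")
        ready, _, _ = select.select([fd], [], [], remaining)
        if not ready:
            continue  # loop back and re-check the deadline
        chunk = os.read(fd, n - len(buf))
        if not chunk:
            raise EOFError(f"EOF after {len(buf)} of {n} bytes")
        buf += chunk
    return bytes(buf)


def transact(fd_out: int, fd_in: int, request: bytes, answer_len: int,
             timeout_s: float = 1.0) -> bytes:
    """Hypothetical request/answer step: write the whole request, then
    wait (bounded) for a fixed-size answer instead of blocking forever."""
    view = memoryview(request)
    while view:
        written = os.write(fd_out, view)
        view = view[written:]
    return read_exact(fd_in, answer_len, timeout_s)
```

The error message reports how many bytes did arrive, which distinguishes "no answer at all" from "answer cut short".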

So I tested the communication using the picoGW_hal communication stress-test utility (util_com_stress), and this is the result:

root@boxio-00142D62FCF9:~/picoGW_hal-master/util_com_stress# ./util_com_stress -d /dev/lora -t 4
INFO: Starting LoRa concentrator SPI stress-test number 4
Cycle 0 > did a 6-bytes R/W on a data buffer with no error
Cycle 1 > did a 7-bytes R/W on a data buffer with no error
Cycle 2 > did a 8-bytes R/W on a data buffer with no error
Cycle 3 > did a 9-bytes R/W on a data buffer with no error
Cycle 4 > did a 10-bytes R/W on a data buffer with no error
Cycle 5 > did a 11-bytes R/W on a data buffer with no error
Cycle 6 > did a 12-bytes R/W on a data buffer with no error
Cycle 7 > did a 13-bytes R/W on a data buffer with no error
Cycle 8 > did a 14-bytes R/W on a data buffer with no error
Cycle 9 > did a 15-bytes R/W on a data buffer with no error
Cycle 10 > did a 16-bytes R/W on a data buffer with no error
Cycle 11 > did a 17-bytes R/W on a data buffer with no error
Cycle 12 > did a 18-bytes R/W on a data buffer with no error
Cycle 13 > did a 19-bytes R/W on a data buffer with no error
Cycle 14 > did a 20-bytes R/W on a data buffer with no error
Cycle 15 > did a 21-bytes R/W on a data buffer with no error
Cycle 16 > did a 22-bytes R/W on a data buffer with no error
Cycle 17 > did a 23-bytes R/W on a data buffer with no error
Cycle 18 > did a 24-bytes R/W on a data buffer with no error
Cycle 19 > did a 25-bytes R/W on a data buffer with no error
Cycle 20 > did a 26-bytes R/W on a data buffer with no error
Cycle 21 > did a 27-bytes R/W on a data buffer with no error
Cycle 22 > did a 28-bytes R/W on a data buffer with no error
^CCycle 23 > error during the buffer comparison
Written values:
 5A  8A  3F  62  80  29  44  DE  7C  A5  89  4E  57  59  D3  51 
 AD  AC  86  95  80  EC  17  E4  85  F1  8C  0C  66 
Read values:
 8B  82  4A  79  9E  CB  F1  7E  D2  D3  D7  7E  D2  5F  7F  79 
 78  C4  B2  2F  FE  88  7F  0B  DD  6A  17  A7  AE 
^CRe-read values:
 8B  82  4A  79  9E  CB  F1  7E  D2  D3  D7  7E  D2  5F  7F  79 
 78  C4  B2  2F  FE  88  7F  0B  DD  6A  17  A7  AE

It stops working once the transfer exceeds 28 bytes: cycle 22 (28 bytes) passes, while cycle 23 (29 bytes) fails the buffer comparison.
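The output above follows a simple pattern: cycle N writes N + 6 random bytes, reads them back and compares. A minimal Python sketch of that pattern, useful for locating the size threshold on a given setup; it is not the util_com_stress source, and `transfer` is a hypothetical callable standing in for the real write/read path to the concentrator.

```python
import os
from typing import Callable, Optional, Tuple

# Hypothetical: sends a buffer and returns what was read back.
Transfer = Callable[[bytes], bytes]


def first_mismatch(written: bytes, read: bytes) -> Optional[int]:
    """Index of the first differing byte, or None if the buffers match."""
    for i, (a, b) in enumerate(zip(written, read)):
        if a != b:
            return i
    if len(written) != len(read):
        return min(len(written), len(read))
    return None


def grow_until_failure(transfer: Transfer, start_len: int = 6,
                       max_cycles: int = 100) -> Optional[Tuple[int, int, int]]:
    """Cycle k writes (start_len + k) random bytes and compares the echo.

    Returns (cycle, length, first_bad_index) for the first failing cycle,
    or None if every cycle passes.
    """
    for cycle in range(max_cycles):
        length = start_len + cycle
        data = os.urandom(length)
        bad = first_mismatch(data, transfer(data))
        if bad is not None:
            return cycle, length, bad
    return None
```

In the log above, the very first byte of cycle 23 already differs (0x5A written, 0x8B read), so for that cycle this helper would report index 0.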

Have you ever seen a similar situation?

Best regards, Stefano.

beitler commented 1 year ago

I can't say much more than that this definitely points to a communication issue between the host and the radio chip. This is highly hardware-specific and should be investigated by the manufacturer. Since you can trigger the error condition with a HAL tool, nothing Basics Station-specific is causing it.

Since you have made major updates to the base system (kernel, drivers, etc.), this error condition could be the consequence of a change in one of these components that is incompatible with how the HAL drives the communication. I'm sorry I can't be of more help with this issue.
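Because the failure appeared after the kernel moved from 4.14.126 to 6.1.22, one thing worth comparing between the old and new BSP is how the tty behind /dev/lora ends up configured. A minimal sketch using Python's standard termios and tty modules, assuming /dev/lora is a tty that should be in raw mode; configure_raw and describe are hypothetical helpers, and the 115200 baud rate and 1 s read timeout are placeholders, not the picoGW HAL's actual settings (check the HAL's serial setup code for the real values).

```python
import termios
import tty


def configure_raw(fd: int, baud: int = termios.B115200,
                  vmin: int = 0, vtime_ds: int = 10) -> None:
    """Put a tty into raw (binary-safe) mode with a given speed and read
    timeout. Defaults are placeholders, not picoGW HAL values."""
    tty.setraw(fd)  # no line editing, echo, or signal characters
    attrs = termios.tcgetattr(fd)
    # Make sure no byte values are translated or swallowed as flow control.
    attrs[0] &= ~(termios.INLCR | termios.IGNCR | termios.ICRNL
                  | termios.IXON | termios.IXOFF)
    attrs[1] &= ~termios.OPOST
    attrs[4] = baud  # input speed
    attrs[5] = baud  # output speed
    # With the defaults (VMIN=0, VTIME=10), read() returns after at most
    # 1 s even if no byte arrives, instead of blocking forever.
    attrs[6][termios.VMIN] = vmin
    attrs[6][termios.VTIME] = vtime_ds
    termios.tcsetattr(fd, termios.TCSANOW, attrs)


def describe(fd: int) -> dict:
    """Read back the settings most relevant to a binary UART link.
    This only reads attributes; it does not change the port."""
    iflag, oflag, _cflag, lflag, ispeed, ospeed, cc = termios.tcgetattr(fd)
    return {
        "canonical": bool(lflag & termios.ICANON),
        "echo": bool(lflag & termios.ECHO),
        "input_translation": bool(
            iflag & (termios.INLCR | termios.IGNCR | termios.ICRNL)),
        "sw_flow_control": bool(iflag & (termios.IXON | termios.IXOFF)),
        "output_postprocessing": bool(oflag & termios.OPOST),
        "ispeed": ispeed,
        "ospeed": ospeed,
        "vmin": cc[termios.VMIN],
        "vtime_ds": cc[termios.VTIME],
    }
```

Since describe() only reads attributes, it can be run against /dev/lora on both BSPs and the results compared without disturbing the port.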

SCordibella commented 1 year ago

I agree with you, @beitler. I will test the serial communication, then ask the hardware supplier to rule out a low-level issue, and then I will ask Semtech.

lorabasics/basicstation issue #184