mciupak / nrf24

nrf24l01 linux device driver
GNU General Public License v3.0
24 stars 13 forks source link

Armbian kernel 4.14 rc7 banana pi m2 #1

Closed sergey-suloev closed 6 years ago

sergey-suloev commented 6 years ago

Hello, I am testing your driver on my bananapi m2 Armbian kernel 4.14. It seems to load fine and I can see 5 devices in /dev/*. But dmesg shows a strange error, see below. My question: do you have any raspberry-specific code in your driver ? Any ideas why does this error happen ?

dmesg output

[ 7.943861] nrf24: loading out-of-tree module taints kernel. [ 7.948638] nrf24 nrf0: nrf24_create_device: device created: major(245), minor(0) [ 7.948844] nrf24 nrf0: nrf24_create_device: device created: major(245), minor(1) [ 7.949005] nrf24 nrf0: nrf24_create_device: device created: major(245), minor(2) [ 7.949189] nrf24 nrf0: nrf24_create_device: device created: major(245), minor(3) [ 7.949345] nrf24 nrf0: nrf24_create_device: device created: major(245), minor(4) [ 7.949565] nrf24 nrf0: nrf24_create_device: device created: major(245), minor(5) [ 7.956710] Unable to handle kernel NULL pointer dereference at virtual address 0000004d [ 7.956763] nrf24 nrf0: nrf24_tx_thread: waiting for new messages [ 7.966041] pgd = c0004000 [ 7.968749] [0000004d] *pgd=00000000 [ 7.972345] Internal error: Oops: 5 [#1] SMP THUMB2 [ 7.977220] Modules linked in: sun6i_drc nrf24(O) sun4i_codec snd_soc_core snd_pcm_dmaengine snd_pcm snd_timer sun4i_ts snd soundcore sun4i_gpadc sun4i_ss sun4i_tcon uio_pdrv_genirq cpufreq_dt uio thermal_sys bonding brcmfmac brcmutil cfg80211 rfkill ip_tables x_tables i2c_sun6i_p2wi pwrseq_simple realtek [ 8.004281] CPU: 3 PID: 379 Comm: nrf0_rx_thread Tainted: G O 4.14.0-rc7-sunxi #3 [ 8.012889] Hardware name: Allwinner sun6i (A31) Family [ 8.018112] task: ee3da700 task.stack: ed89e000 [ 8.022642] PC is at __dev_printk+0x10/0x60 [ 8.026822] LR is at dev_printk+0x21/0x2c [ 8.030825] pc : [] lr : [] psr: 20010033 [ 8.037082] sp : ed89fee0 ip : ed89e03c fp : 00000000 [ 8.042299] r10: ef401860 r9 : ef3f1860 r8 : 00000000 [ 8.047515] r7 : ed8f0218 r6 : ed8f0200 r5 : bfa3d438 r4 : ed89ff00 [ 8.054032] r3 : ed89ff1c r2 : ed89ff00 r1 : 00000001 r0 : bfa3d438 [ 8.060551] Flags: nzCv IRQs on FIQs on Mode SVC_32 ISA Thumb Segment none [ 8.067849] Control: 50c5387d Table: 6e21806a DAC: 00000051 [ 8.073587] Process nrf0_rx_thread (pid: 379, stack limit = 0xed89e210) [ 8.080190] Stack: (0xed89fee0 to 0xed8a0000) [ 8.084546] fee0: ed8e0000 00600011 00000006 bfa3d654 ed89fefc c0545e29 ee2fbca8 ed89ff1c [ 8.092715] ff00: bfa3d654 ed89fefc ed8e0000 ed8e0000 00000000 bfa3a93b bfa3d654 00000000 [ 8.100883] ff20: 00000000 bfa3d668 00000000 00000003 00000000 c0149d0f ef0f0000 00000000 [ 8.109050] ff40: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 [ 8.117218] ff60: ed89ff78 ee357400 ee3573c0 00000000 ed89e000 ed8e0000 bfa3a73d ee2fbca8 [ 8.125386] ff80: ee35741c c012fa2d ffffffff ee3573c0 c012f931 00000000 00000000 00000000 [ 8.133553] ffa0: 00000000 00000000 00000000 c0106589 00000000 00000000 00000000 00000000 [ 8.141721] ffc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 [ 8.149889] ffe0: 00000000 00000000 00000000 00000000 00000013 00000000 6fffd861 6fffdc61 [ 8.158070] [] (__dev_printk) from [] (dev_printk+0x21/0x2c) [ 8.165479] [] (dev_printk) from [] (nrf24_rx_thread+0x1fe/0x244 [nrf24]) [ 8.174026] [] (nrf24_rx_thread [nrf24]) from [] (kthread+0xfd/0x104) [ 8.182201] [] (kthread) from [] (ret_from_fork+0x11/0x28) [ 8.189419] Code: eb04 4605 4614 b1e9 (6ccb) 7840 [ 8.194301] ---[ end trace e13b3e22e6f9c21d ]---


My overlay:

/dts-v1/; /plugin/;

/ { compatible = "allwinner,sun6i-a31s";

fragment@0 {
            target = <&pio>;
            __overlay__ {
                    nrf1_pins: nrf1_pins {
                            pins = "PE7", "PE6";    /* pins 38,36 on 40-pin GPIO connector */
                            function = "gpio_in", "gpio_out";
                    };
            };
    };

fragment@1 {
    target = <&spi1>;
    __overlay__ {
        #address-cells = <1>;
        #size-cells = <0>;
        status = "okay";

        nrf1: nrf1@0 {
            compatible = "nordic,nrf24";
            reg = <0>; /* CS0 */
            pinctrl-names = "default";
            pinctrl-0 = <&nrf1_pins>;
            #interrupt-cells = <2>;
            interrupt-parent = <&pio>;
            interrupts = <4 7 2>;       /* PE7, falling edge */
            irq-gpio = <&pio 4 7 0>;
            ce-gpio = <&pio 4 6 0>;     /* PE6 */
            spi-max-frequency = <5000000>;
            status = "okay";
        };
    };
};
  };

No errors happen during echo'ing text into the devices.

mciupak commented 6 years ago

Hello Sergey,

thanks for the ticket. I did not add any raspberry pi specific code (except of using raspberry pi kernel).

First dev_printk occurrence from nrf24_rx_thread is called in nrf24_read_reg (called via nrf24_is_rx_fifo_empty)

Could you please check if spi pointer is NULL there? If it is not NULL, check what is in spi->dev (as for example the name of device is taken from there).

Thanks, Marcin

P.S. I am sorry for late answer, but I did not receive any email notification about that.

sergey-suloev commented 6 years ago

Hello Marcin, thanks for the reply. I will check what you said and report. It seems like the memory is totally corrupted in those receive/transmit threads. I am just curious why you didn't use UART strategy instead of those 2 threads..

I tried a couple more drivers similar to yours but all was unsuccessful. So I am still interested in making it work.

mciupak commented 6 years ago

So, maybe the problem is in the layer below i.e. spi driver?

What do you mean by "the memory is totally corrupted in those receive/transmit threads" ?

I am sure there is a lot of bugs there, so any hints are welcome!

Marcin

sergey-suloev commented 6 years ago

It may be too... Those sunxi devices are not well supported. "Memory corrupted" meaning that many pointers there are pointing to some strange data. I will give you more info later. Didn't you think about using UART driver approach? I.e. implement UART driver based on RF stuff. I tried also this project https://github.com/melomaa/nrf24l01 , it is not working either on my hardware, but at least it is not crashing ...

sergey-suloev commented 6 years ago

Hello Marcin I have some news here. I found an issue in rx_thread function. https://github.com/mciupak/nrf24/blob/a9f800a16653769e48c9fda3ab8a160b818a08e3/nrf24_if.c#L1083 As soon as N_NRF24_PIPES == 6 the condition in this line is not working correctly. Documentation says that "pipe" value can only be 0-5 but I am getting the "pipe" == 6 on my hardware and condition therefore allows the code to step below. As a result device->pipes[pipe] will access uninitialized memory.

It seems like condition should look like the following:

if (pipe > NRF24_PIPE5) { ... }

But when I change it the way above then my board will hang at startup.

mciupak commented 6 years ago

Hello

this "pipe" in this context is a value of filed RX_P_NO of STATUS register.

Documentation says: 000-101: Data Pipe Number 110: Not Used 111: RX FIFO Empty

Not sure why you are getting pipe == 6 which is "Not Used". I have not observed such behavior.

This condition: https://github.com/mciupak/nrf24/blob/a9f800a16653769e48c9fda3ab8a160b818a08e3/nrf24_if.c#L1083 is to prohibit reading of data when actually there is not data to be read, meaning RX FIFO Empty is set in STATUS register. However it does not protect against 110: Not used.

Your change is correct, but I would write it like following: if (pipe >= N_NRF24_PIPES) {

But when I change it the way above then my board will hang at startup.

What do you mean by that? What are the symptoms?

BTW, did you solve the issue with crashing of nrf24_rx_thread?

Marcin

sergey-suloev commented 6 years ago

Yes, there is no crash anymore because Linux doesn't boot completely with that change.

sergey-suloev commented 6 years ago

Console says "can't start journal service".. seems like the loop in rx_thread is executed intensively without waiting because NRF chip says that FIFO is not empty but "pipe" == 6 and as soon as I changed the condition there is no crash anymore and thread is executing the loop over and over. I have no idea why NRF is returning 6 for pipe... That may mean a wrong configuration or the chip itself is broken. Here is my hardware https://www.aliexpress.com/item/Free-Shipping-NRF24L01-PA-LNA-Wireless-Module-with-Antenna-1000-Meters-Long-Distance-FZ0410-We-are/32834661886.html?spm=a2g0s.9042311.0.0.w9m9ec

mciupak commented 6 years ago

Comment out starting of rx and tx thread.

If your system boot up, try to read STATUS register via: cat /sys/class/nrf0/status (or sth like that)

You will see if SPI communication with your module works. There should be output on the console and in dmesg.

Paste output here please.

Marcin

sergey-suloev commented 6 years ago

root@aapi-bzdvwv:~# cat /sys/class/nrf24/nrf0/status STATUS = 0x1C

dmesg says:

[ 100.133599] nrf24 spi32766.0: nrf24_print_status: CONFIG = 0xBC [ 100.133688] nrf24 spi32766.0: nrf24_print_status: EN_AA = 0x54 [ 100.133747] nrf24 spi32766.0: nrf24_print_status: EN_RXADDR = 0x7E [ 100.133801] nrf24 spi32766.0: nrf24_print_status: SETUP_AW = 0x06 [ 100.133852] nrf24 spi32766.0: nrf24_print_status: SETUP_RETR = 0xFD [ 100.133904] nrf24 spi32766.0: nrf24_print_status: RF_CH = 0x04 [ 100.133955] nrf24 spi32766.0: nrf24_print_status: RF_SETUP = 0x28 [ 100.134007] nrf24 spi32766.0: nrf24_print_status: STATUS = 0x1C [ 100.134058] nrf24 spi32766.0: nrf24_print_status: OBSERVE_TX = 0x00 [ 100.134109] nrf24 spi32766.0: nrf24_print_status: CD = 0x00 [ 100.134160] nrf24 spi32766.0: nrf24_print_status: FIFO_STATUS = 0x22 [ 100.134212] nrf24 spi32766.0: nrf24_print_status: DYNPD = 0x54 [ 100.134263] nrf24 spi32766.0: nrf24_print_status: FEATURE = 0x08

sergey-suloev commented 6 years ago

Here interesting thing happens

root@aapi-bzdvwv:~# cat /sys/class/nrf24/nrf0/available_address_width 3 4 5

sergey-suloev commented 6 years ago

I have one more question. Documentation says that NRF should respond with status byte first when you read any register. But in your code I can't see that you are skipping the 1st byte. Why ?

mciupak commented 6 years ago

Reading address width. All good here. 3 4 5 means that you can configure module to use 3, 4 or 5 bytes long addresses.

Regarding STATUS register and 1st byte. Registers are read using spi_w8r8 function which first writes 8 bits and then reads 8 bits while STATUS is being clocked out while first 8 bit are written. That is why it is skipped automatically.

Now, the most important, regarding values of STATUS and other registers. It looks that all your registers values read from module are shifted left by 1 bit. Eg.:

That is the source of your problems. Not sure how to help you now...

sergey-suloev commented 6 years ago

you are saying that all the registers shifted by 1 bit ?? how is that possible ? why do you think this may happen ? is there a way to understand if this is a SPI driver issue or hardware issue ?

mciupak commented 6 years ago

Hard to say why it is like that. Try lowering your SPI frequency first to 1 MHz or even lower like 100 kHz and try then.

If if will not help, check spi mode config. I believe it should be SPI_MODE_0, but you could try different ones. There could be also some driver specific config missing. Really hard to guess.

sergey-suloev commented 6 years ago

screenshot from 2017-11-15 22-10-04

Here is STATUS request. Seems like hardware part is fine?

mciupak commented 6 years ago

Hmm, why clk is clocked out only 8 times? Is is write and then read or just read?

sergey-suloev commented 6 years ago

This is only reading STATUS register. There is another byte following. Looks like master is sending 0x7 and then is staying at low level. Slave is responding with 0xE and again with 0xE. Both values are STATUS register. screenshot from 2017-11-15 22-47-00

sergey-suloev commented 6 years ago

Here is the whole picture for reading STATUS register screenshot from 2017-11-15 22-50-46

mciupak commented 6 years ago

Yes, hardware part looks fine. 0x0E id "correct" value of STATUS register. It means that RX Fifo is empty.

And you confirm that your status is not 0x0E while displaying it in driver after that read you showed?

sergey-suloev commented 6 years ago

The SPI driver gives me 0x1C. So the problem should be in sun6i SPI driver code.

mciupak commented 6 years ago

One more thing. Have you tried lowering SPI frequency? Screenshots you attached are form logic state analyzer and it does not show how plots actually looks like.

Give it a try with 1 MHz or lower.

Marcin

sergey-suloev commented 6 years ago

I tried it, no difference. I did debugging in spi-sun6i.c and found something that looks too bad: readb function here http://elixir.free-electrons.com/linux/v4.13.12/source/drivers/spi/spi-sun6i.c#L155 returns already broken value... That seems like hardware issue.. but logic analyzer shows a good picture. I am really confused.

sergey-suloev commented 6 years ago

I tried another device, OrangePi PC (Alwinner H3 Soc) and got with the same problem with SPI... that's so weird!

sergey-suloev commented 6 years ago

Ok, I have tested communication using spi-gpio driver (bitbang/software spi emulation) and found that it is working fine. [ 240.738204] nrf24 spi32766.0: nrf24_print_status: CONFIG = 0x0F [ 240.738252] nrf24 spi32766.0: nrf24_print_status: EN_AA = 0x3F [ 240.738296] nrf24 spi32766.0: nrf24_print_status: EN_RXADDR = 0x3F [ 240.738339] nrf24 spi32766.0: nrf24_print_status: SETUP_AW = 0x03 [ 240.738382] nrf24 spi32766.0: nrf24_print_status: SETUP_RETR = 0xFF [ 240.738426] nrf24 spi32766.0: nrf24_print_status: RF_CH = 0x02 [ 240.738468] nrf24 spi32766.0: nrf24_print_status: RF_SETUP = 0x07 [ 240.738511] nrf24 spi32766.0: nrf24_print_status: STATUS = 0x0E [ 240.738554] nrf24 spi32766.0: nrf24_print_status: OBSERVE_TX = 0x00 [ 240.738597] nrf24 spi32766.0: nrf24_print_status: CD = 0x00 [ 240.738640] nrf24 spi32766.0: nrf24_print_status: FIFO_STATUS = 0x11 [ 240.738683] nrf24 spi32766.0: nrf24_print_status: DYNPD = 0x3F [ 240.738727] nrf24 spi32766.0: nrf24_print_status: FEATURE = 0x04

sergey-suloev commented 6 years ago

I am closing the issue as the problem described seems to be hardware-originated. Thanks for assistance.