imfatant / test


Unhandled fault: external abort on non-linefetch (0x1818) #15

Closed hagfelsh closed 5 years ago

hagfelsh commented 5 years ago

First, I must say: your guide is fantastic and thank you so much for keeping it easy to read and follow. Outstanding work! Oh, and thank you so much for continually returning to this page to post a fresh run. Thank you for taking the time, I'm a new BB user and I'm very grateful.

I'm by no means new to Linux, however, so this puzzle has me scratching my head.

I've run through the guide three times, each ending in failure, but each failing in a different way.

The latest, which prompted me to seek your help, is this:

beaglebone systemd[1]: Starting ArduPlane Service...
beaglebone systemd[1]: Started ArduPlane Service.
beaglebone aphw[16011]: /bin/echo: write error: Operation not permitted
beaglebone aphw[16011]: /usr/bin/ardupilot/aphw: line 6: /sys/class/gpio/gpio80/direction: No such file or directory
beaglebone aphw[16011]: /usr/bin/ardupilot/aphw: line 7: /sys/class/gpio/gpio80/value: No such file or directory
beaglebone kernel: [  501.600266] Unhandled fault: external abort on non-linefetch (0x1818) at 0xb6f4f000
beaglebone kernel: [  501.600276] pgd = (ptrval)
beaglebone kernel: [  501.600280] [b6f4f000] *pgd=9c595831, *pte=4a324343, *ppte=4a324833
beaglebone arduplane[8975]: RCOutputAioPRU.cpp:SIGBUS error generated
beaglebone systemd[1]: arduplane.service: Main process exited, code=exited, status=1/FAILURE
beaglebone systemd[1]: arduplane.service: Unit entered failed state.
beaglebone systemd[1]: arduplane.service: Failed with result 'exit-code'.

This particular instance of arduplane was compiled on the board itself using the bone-debian-9.8-console-armhf-2019-03-03-1gb image. (Previous to that I tried your latest posting but found trouble that I also find here.)

The aphw lines exit with 0 if I run them by hand with sudo or as root. If I run the script via sudo or as root, I get the same messaging seen in syslog above.

However I try to run ardupilot, I get the external abort message seen above. This run of APM was actually compiled on the bbone... took two hours! Previous compilations were: 1st @ Ubuntu 18.04, 2nd @ Ubuntu 16.04.

I've followed every step in the guide; even wrote a script to automate it! I suspect I'm running into something I'm overlooking since your builds are succeeding. I'm happy to offer whatever you'd find useful if you can spare the time to help. I've made an image of the SD card that I can share however you'd like, if that appeals to you.

Thank you for your time!

imfatant commented 5 years ago

Hi,

The aphw file is failing to execute successfully. The ArduPlane service file attempts to execute aphw before it executes ArduPlane itself. If aphw fails, it won't bother to execute ArduPlane.

It's odd that aphw should fail if you've faithfully followed the guide. We have, however, experienced some anomalous behaviour over the last year, partly because the real-time kernel keeps getting broken (bloody annoying). Anyway, try the following and let me know if ArduPlane starts (i.e. the red LED begins to flash):

sudo systemctl disable arduplane.service
sudo systemctl stop arduplane.service
sudo /usr/bin/ardupilot/arduplane

-- Imf

hagfelsh commented 5 years ago

Alrighty! Same problem but this time I have stuff to look at.

Here are some environmental details:

  Operating System: Debian GNU/Linux 9 (stretch)
            Kernel: Linux 4.19.29-bone-rt-r29
      Architecture: arm
debian@beaglebone:~$ cat /usr/bin/ardupilot/aphw
#!/bin/bash
# aphw
# ArduPilot hardware configuration.

/bin/echo 80 >/sys/class/gpio/export
/bin/echo out >/sys/class/gpio/gpio80/direction
/bin/echo 1 >/sys/class/gpio/gpio80/value
/bin/echo pruecapin_pu >/sys/devices/platform/ocp/ocp:P8_15_pinmux/state

Running each line of aphw via sudo offered this:

debian@beaglebone:~$ sudo /bin/echo 80 >/sys/class/gpio/export
/bin/echo: write error: Operation not permitted

So I ran each line by hand as root:

debian@beaglebone:~$ sudo -s
root@beaglebone:/home/debian# /bin/echo 80 >/sys/class/gpio/export
root@beaglebone:/home/debian# /bin/echo out >/sys/class/gpio/gpio80/direction
root@beaglebone:/home/debian# /bin/echo 1 >/sys/class/gpio/gpio80/value
root@beaglebone:/home/debian# /bin/echo pruecapin_pu >/sys/devices/platform/ocp/ocp:P8_15_pinmux/state
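The by-hand GPIO steps above can be sketched as a helper that skips the export when the pin is already exported and fails loudly if the gpioN directory never appears. This is only an illustration, not part of the guide: the function name setup_gpio_out and the GPIO_BASE override are hypothetical, and GPIO_BASE exists purely so the logic can be tried against a scratch directory instead of the real sysfs tree.

```shell
#!/bin/bash
# Hypothetical helper (not from the guide): configure a sysfs GPIO as an output.
# GPIO_BASE is an assumed override for off-target testing; it defaults to the
# real sysfs location used by aphw.
setup_gpio_out() {
    local pin="$1" value="$2" base="${GPIO_BASE:-/sys/class/gpio}"
    local dir="$base/gpio$pin"

    # Only export when the pin is not already exported.
    if [ ! -d "$dir" ]; then
        echo "$pin" >"$base/export" || return 1
    fi

    # On real hardware the kernel creates gpioN/ after a successful export.
    if [ ! -d "$dir" ]; then
        echo "gpio$pin not present after export" >&2
        return 1
    fi

    echo out >"$dir/direction" && echo "$value" >"$dir/value"
}

# Usage on the board, as root:
#   setup_gpio_out 80 1
```

Unlike the plain sequence of echo lines, this returns a non-zero status as soon as any step fails, so a caller can tell that the pin was not configured.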

Here is what's in /sys/class/gpio now:

root@beaglebone:/sys/class/gpio# ll
total 0
--w--w---- 1 root gpio 4.0K Mar 27 02:48 export
lrwxrwxrwx 1 root root    0 Mar 27 02:59 gpio105 -> ../../devices/platform/ocp/481ae000.gpio/gpiochip3/gpio/gpio105
lrwxrwxrwx 1 root root    0 Mar 27 02:59 gpio5 -> ../../devices/platform/ocp/44e07000.gpio/gpiochip0/gpio/gpio5
lrwxrwxrwx 1 root root    0 Mar 27 02:59 gpio65 -> ../../devices/platform/ocp/481ac000.gpio/gpiochip2/gpio/gpio65
lrwxrwxrwx 1 root root    0 Mar 27 02:48 gpio80 -> ../../devices/platform/ocp/481ac000.gpio/gpiochip2/gpio/gpio80
lrwxrwxrwx 1 root gpio    0 Mar 26 03:46 gpiochip0 -> ../../devices/platform/ocp/44e07000.gpio/gpio/gpiochip0
lrwxrwxrwx 1 root gpio    0 Mar 26 03:46 gpiochip32 -> ../../devices/platform/ocp/4804c000.gpio/gpio/gpiochip32
lrwxrwxrwx 1 root gpio    0 Mar 26 03:46 gpiochip64 -> ../../devices/platform/ocp/481ac000.gpio/gpio/gpiochip64
lrwxrwxrwx 1 root gpio    0 Mar 26 03:46 gpiochip96 -> ../../devices/platform/ocp/481ae000.gpio/gpio/gpiochip96
--w--w---- 1 root gpio 4.0K Mar 26 03:46 unexport

A sysfs dump of gpio80 seems to match what we asked of it: gpio-sysfsutil.txt

  Class Device = "gpio80"
  Class Device path = "/sys/devices/platform/ocp/481ac000.gpio/gpiochip2/gpio/gpio80"
    active_low          = "0"
    direction           = "out"
    edge                = "none"
    label               = "sysfs"
    uevent              =
    value               = "1"

    Device = "gpiochip2"
    Device path = "/sys/devices/platform/ocp/481ac000.gpio/gpiochip2"
      dev                 = "254:2"
      uevent              = "MAJOR=254
MINOR=2
DEVNAME=gpiochip2
OF_NAME=gpio
OF_FULLNAME=/ocp/gpio@481ac000
OF_COMPATIBLE_0=ti,omap4-gpio
OF_COMPATIBLE_N=1"

Next, I ran arduplane through strace. The whole file is attached (arduplane.strace.txt), but here are the highlights. stdout has this for us again:

RCOutputAioPRU.cpp:SIGBUS error generated

This was left in dmesg:

[b6fc7000] *pgd=9ad9e831, *pte=4a324343, *ppte=4a324833
Unhandled fault: external abort on non-linefetch (0x1818) at 0xb6ef400
pgd = a8d42a6c
[b6ef4000] *pgd=9b288831, *pte=4a324343, *ppte=4a324833

It seems it's just as you say--the gpio is locked by another process or it doesn't exist.

7382  02:49:54.704123 open("/sys/devices/ocp.3/pwm_test_P8_36.12/period", O_WRONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
7382  02:49:54.704662 open("/sys/devices/ocp.3/pwm_test_P8_36.12/duty", O_WRONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
7382  02:49:54.704976 open("/sys/devices/ocp.3/pwm_test_P8_36.12/run", O_WRONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
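For a long strace log, the failed opens can be pulled out mechanically rather than by eye. A minimal sketch, assuming the log has the line format shown above; the function name failed_opens is made up for illustration:

```shell
#!/bin/bash
# Hypothetical helper: print the path of every open()/openat() call in an
# strace log that failed with ENOENT, one path per line.
failed_opens() {
    local log="$1"
    # Capture the first double-quoted argument of open*(...) on ENOENT lines.
    sed -n 's/.*open[a-z]*([^"]*"\([^"]*\)".*= -1 ENOENT.*/\1/p' "$log"
}

# Usage:
#   failed_opens arduplane.strace.txt | sort | uniq -c
```

Counting the unique paths makes it easier to see which missing files are probed repeatedly and which only once.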

It's worth noting that I have not connected anything to the board yet--no receivers or servos or the like.

hagfelsh commented 5 years ago

I wonder if gpio80 doesn't physically exist? Could it be a different name? Is there a datasheet hidden out there somewhere that identifies the names of the ports as they would be enumerated by debian?

hagfelsh commented 5 years ago

Looks like this might be the bb blue pinout table: https://github.com/beagleboard/beaglebone-blue/blob/master/BeagleBone_Blue_Pin_Table.csv

imfatant commented 5 years ago

There are a couple of things to try, and I'll be updating the guide accordingly tonight or tomorrow. Here's a preview:

1) In /boot/uEnv.txt, ensure that the line

uboot_overlay_pru=/lib/firmware/AM335X-PRU-RPROC-4-14-TI-00A0.dtbo

is commented out, thus:

#uboot_overlay_pru=/lib/firmware/AM335X-PRU-RPROC-4-14-TI-00A0.dtbo

Next, uncomment the line

#uboot_overlay_pru=/lib/firmware/AM335X-PRU-UIO-00A0.dtbo

so that it reads

uboot_overlay_pru=/lib/firmware/AM335X-PRU-UIO-00A0.dtbo

Reboot. Then, type

lsmod | grep pru

Hopefully, you'll see:

uio_pruss 16384 0
uio 20480 2 uio_pruss,uio_pdrv_genirq

2) There seems to be an issue with the onboard eMMC, perhaps because people are trying to burn too large an image onto it (although the image you create by STRICTLY following the guide should fit). Nevertheless, for now, you can force the BBBlue always to boot from the SD card, but we're going to do this by intentionally corrupting the boot sector of the eMMC. BACK UP ANY IMPORTANT DATA ON THE eMMC BEFORE DOING THIS:

sudo dd if=/dev/zero of=/dev/mmcblk1 bs=1M count=10
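The uEnv.txt part of step 1 can be checked with a couple of greps before rebooting. A minimal sketch, assuming the overlay file names quoted above; the function name check_pru_overlay is hypothetical, and it only inspects the text of the file, not whether the overlay actually loads:

```shell
#!/bin/bash
# Hypothetical check: succeed only if uEnv.txt selects the PRU UIO overlay
# and no uncommented line still selects the RPROC overlay.
check_pru_overlay() {
    local file="${1:-/boot/uEnv.txt}"

    # An active (uncommented) setting starts with the key itself.
    if grep -q '^uboot_overlay_pru=.*PRU-RPROC' "$file"; then
        echo "RPROC overlay is still active in $file" >&2
        return 1
    fi
    if ! grep -q '^uboot_overlay_pru=/lib/firmware/AM335X-PRU-UIO-00A0\.dtbo' "$file"; then
        echo "UIO overlay is not enabled in $file" >&2
        return 1
    fi
    echo "PRU UIO overlay selected"
}

# Usage:
#   check_pru_overlay            # checks /boot/uEnv.txt
#   check_pru_overlay ./uEnv.txt # checks a copy
```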

Hope these steps solve your problem.

-- Imf

hagfelsh commented 5 years ago

I caught your updates and nuked and paved my sd card. I followed every line with no departures or side trips. Still, my reward is:

beaglebone systemd[1]: Starting ArduPlane Service...
beaglebone systemd[1]: Started ArduPlane Service.
beaglebone kernel: [  119.797852] Unhandled fault: external abort on non-linefetch (0x1818) at 0xb6f93000
beaglebone kernel: [  119.797861] pgd = (ptrval)
beaglebone kernel: [  119.797866] [b6f93000] *pgd=9c6c3831, *pte=4a324343, *ppte=4a324833
beaglebone arduplane[2175]: RCOutputAioPRU.cpp:SIGBUS error gernerated
beaglebone systemd[1]: arduplane.service: Main process exited, code=exited, status=1/FAILURE
beaglebone systemd[1]: arduplane.service: Unit entered failed state.
beaglebone systemd[1]: arduplane.service: Failed with result 'exit-code'.
beaglebone systemd[1]: arduplane.service: Service hold-off time over, scheduling restart.
beaglebone systemd[1]: Stopped ArduPlane Service.

Once again, it seems that the aphw script is struggling to do its work.

Let's look at permissions in /usr/bin/ardupilot; +x on UGO:

debian@beaglebone:/usr/bin/ardupilot$ ls -l
total 1236
-rwxr-xr-x 1 root root     256 Mar 30 03:12 aphw
-rwxr-xr-x 1 root root 1259140 Jan  4  2018 arduplane

The md5sum of arduplane, taken from http://bbbmini.org/download/blue/ArduPlane/arduplane-3_8_3: db615d369b86a37109b42802f6f1a93d

Contents of aphw:

debian@beaglebone:/usr/bin/ardupilot$ cat aphw
#!/bin/bash
#aphw
#ArduPilot hardware configuration.

/bin/echo 80 >/sys/class/gpio/export
/bin/echo out >/sys/class/gpio/gpio80/direction
/bin/echo 1 >/sys/class/gpio/gpio80/value
/bin/echo pruecapin_pu >/sys/devices/platform/ocp/ocp:P8_15_pinmux/state

And a comment-less dump of /boot/uEnv.txt:

uname_r=4.19.29-bone-rt-r29
dtb=am335x-boneblue.dtb

enable_uboot_overlays=1
dtb_overlay=/lib/firmware/BB-I2C1-00A0.dtbo
uboot_overlay_pru=/lib/firmware/AM335X-PRU-UIO-00A0.dtbo
enable_uboot_cape_universal=1

cmdline=coherent_pool=1M net.ifnames=0 quiet

Here's /lib/systemd/system/arduplane.service:

[Unit]
Description=ArduPlane Service
After=networking.service
StartLimitIntervalSec=0
Conflicts=arducopter.service ardurover.service antennatracker.service

[Service]
EnvironmentFile=/etc/default/ardupilot
ExecStartPre=/usr/bin/ardupilot/aphw
ExecStart=/usr/bin/ardupilot/arduplane $TELEM1 $TELEM2 $GPS

Restart=on-failure
RestartSec=1

[Install]
WantedBy=multi-user.target

The contents of /etc/default/ardupilot:

debian@beaglebone:/usr/bin/ardupilot$ cat /etc/default/ardupilot
TELEM1="-C /dev/ttyO1"
TELEM2="-A udp:172.20.30.18:14550"
GPS="-B /dev/ttyS2"

I know that this guide is without warranty and I'll bet you have better things to do than run a support group on GitHub, so if you've had enough of this chore, I'll totally understand. With that in mind, do you know of any leads I can chase down to try this from another direction? Your guide is by far the most straightforward... I'm baffled at how this isn't working. Or perhaps you might be interested in uploading an image of one of your successful installations? I can provide an sftp link privately, if so.

imfatant commented 5 years ago

It's OK. These things are sent to try us ;). What do you see when you type lsmod | grep pru?

hagfelsh commented 5 years ago

Thanks, you're certainly right about this being a trial of some sort! lsmod | grep pru returns nothing. What is the module called? Perhaps I can insmod or modprobe it (or whatever debian uses). I'd like to think the key offender is that aphw is denied execution. Since that's a prerequisite for the application, it stands to reason that's problem 0.

However.... If I run the lines by hand on the shell, they work, but apm still fails. Kinda muddies up the clarity on aphw a bit. The stack trace I posted from the previous iteration also agrees that it can't find or access GPIOs it needs... right? Here's that again for reference:

open("/sys/devices/ocp.3/pwm_test_P8_36.12/period", O_WRONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/sys/devices/ocp.3/pwm_test_P8_36.12/duty", O_WRONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/sys/devices/ocp.3/pwm_test_P8_36.12/run", O_WRONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)

Does the board need to be populated with a minimum set of peripherals? The app appears to be testing for access to a PWM-based IO, which would likely be a receiver, I suppose. My current hardware config is bare--only the bone and nothing else. I have stuff, but I've just not added it yet.

imfatant commented 5 years ago

That you see nothing when lsmod'ing for the pru drivers is the problem.
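The lsmod check can be scripted so it is easy to repeat after every reboot. A minimal sketch, assuming the driver shows up under the name uio_pruss as in the expected output quoted earlier in the thread; the function name pru_uio_loaded is made up, and it reads /proc/modules, which is the same data lsmod formats:

```shell
#!/bin/bash
# Hypothetical check: succeed if the uio_pruss module appears in a module list.
# The optional argument lets the check run against a saved copy of the list.
pru_uio_loaded() {
    local modules="${1:-/proc/modules}"
    grep -q '^uio_pruss ' "$modules"
}

# Usage on the board:
#   pru_uio_loaded && echo "uio_pruss is loaded" || echo "uio_pruss is missing"
```

Loading the module by hand (for example with sudo modprobe uio_pruss) may still not be enough if the device tree overlay was not applied at boot, so treat a missing module as a symptom to investigate rather than something to paper over.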

A couple of posts up, I just realised that you said you'd nuked the SD card. Ah! No, you need to nuke the eMMC (which seems to cause a conflict with the pru drivers), not the SD card. Please nuke the eMMC as described and install/run off the SD card.

Let me know what happens. Fingers crossed.

hagfelsh commented 5 years ago

Alrighty!

I deleted the partition of /dev/mmcblk1 and rebooted. Still no success.

debian@beaglebone:~$ lsblk --fs
NAME         FSTYPE LABEL  UUID                                 MOUNTPOINT
mmcblk0
`-mmcblk0p1  ext4   rootfs b61d0e37-238c-402c-a29b-ac490c78aa0e /
mmcblk1
mmcblk1boot0
mmcblk1boot1

The internal storage wasn't mounted, so I'm not sure how it could interfere with my runtime root partition, which is backed by the SD card. lsmod | grep -i pru still doesn't return any modules, either. Where does that module load come from? In Fedora-based distributions, I'd find hardware identified by the PCI subsystem and matched to a module by its PCI ID. Here we have neither a server nor RHEL, so I'm not sure where to look.

What do you think of this article?
http://www.ofitselfso.com/BeagleNotes/Enabling_the_UIO_Drivers_on_the_Beaglebone_Black.php

I noticed tonight that this version of the kernel is crashing the wireless driver... that's neat?

Here's systemctl status for ardupilot--note that aphw is exiting successfully, but journalctl still shows that aphw is encountering the same problems it always has.

debian@beaglebone:~$ systemctl status arduplane
● arduplane.service - ArduPlane Service
   Loaded: loaded (/lib/systemd/system/arduplane.service; enabled; vendor preset: enabled)
   Active: activating (auto-restart) (Result: exit-code) since Mon 2019-04-01 03:16:15 UTC; 99ms ago
  Process: 1727 ExecStart=/usr/bin/ardupilot/arduplane $TELEM1 $TELEM2 $GPS (code=exited, status=1/FAILURE)
  Process: 1717 ExecStartPre=/usr/bin/ardupilot/aphw (code=exited, status=0/SUCCESS)
 Main PID: 1727 (code=exited, status=1/FAILURE)

Apr 01 03:16:15 beaglebone systemd[1]: arduplane.service: Unit entered failed state.
Apr 01 03:16:15 beaglebone systemd[1]: arduplane.service: Failed with result 'exit-code'.
Apr 01 03:30:43 beaglebone systemd[1]: Starting ArduPlane Service...
Apr 01 03:30:43 beaglebone aphw[991]: /bin/echo: write error: Operation not permitted
Apr 01 03:30:43 beaglebone aphw[991]: /usr/bin/ardupilot/aphw: line 6: /sys/class/gpio/gpio80/direction: No such file or directory
Apr 01 03:30:43 beaglebone aphw[991]: /usr/bin/ardupilot/aphw: line 7: /sys/class/gpio/gpio80/value: No such file or directory
Apr 01 03:30:43 beaglebone systemd[1]: Started ArduPlane Service.
Apr 01 03:30:43 beaglebone kernel: Unhandled fault: external abort on non-linefetch (0x1818) at 0xb6f1a000
Apr 01 03:30:43 beaglebone kernel: pgd = (ptrval)
Apr 01 03:30:43 beaglebone kernel: [b6f1a000] *pgd=9b232831, *pte=4a324343, *ppte=4a324833
Apr 01 03:30:43 beaglebone arduplane[999]: RCOutputAioPRU.cpp:SIGBUS error gernerated
Apr 01 03:30:43 beaglebone systemd[1]: arduplane.service: Main process exited, code=exited, status=1/FAILURE
Apr 01 03:30:43 beaglebone systemd[1]: arduplane.service: Unit entered failed state.
Apr 01 03:30:43 beaglebone systemd[1]: arduplane.service: Failed with result 'exit-code'.
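One possible explanation for aphw reporting status=0/SUCCESS while the journal still shows its errors: a bash script's exit status is that of its last command, so if the final write succeeds, earlier failures are logged but not reflected in the status. That is general bash behaviour rather than something confirmed for this board. A minimal sketch of the difference that set -e makes, using a scratch directory in place of sysfs (both function names are hypothetical):

```shell
#!/bin/bash
# Both functions run the same two steps in a child bash: a write that fails
# (the parent directory does not exist) followed by a write that succeeds.

# Without set -e the script carries on and exits with the last command's status.
lenient_script() {
    bash -c 'echo out >"$1/missing/direction"; echo ok >/dev/null' bash "$1"
}

# With set -e the script stops at the first failing command and exits non-zero.
strict_script() {
    bash -c 'set -e; echo out >"$1/missing/direction"; echo ok >/dev/null' bash "$1"
}

# Usage:
#   d=$(mktemp -d)
#   lenient_script "$d"; echo "lenient exit status: $?"
#   strict_script "$d";  echo "strict exit status: $?"
```

Adding set -e near the top of aphw would make systemd see the failure, at the cost of ArduPlane not being started at all when the GPIO setup fails.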

imfatant commented 5 years ago

I read the article you linked to, and yes, we have uboot_overlay_pru=/lib/firmware/AM335X-PRU-UIO-00A0.dtbo in uEnv.txt already. This should (and does) work for me. One thing I could do is send you the bash script I made to automate the install, but I will need to tidy it first.

hagfelsh commented 5 years ago

It's certainly worth a shot. Consider it under NDA.

Could there be a difference in hardware revisions we're overlooking?

I'm going to dd my image and upload it to one of my servers. I'll figure out how to PM you the link in case you'd like to see this thing yourself.

imfatant commented 5 years ago

OK, I've tidied the setup script. Send me an email at imfatant@gmail.com so I know where to send it :)

nickanstey commented 5 years ago

I have precisely the same problem as hagfelsh. Have been struggling for two weeks now. Managed to get rid of aphw errors by putting the /bin/cat commands in the service definition. I assume it works, but the SIGBUS error persists and there are no uio entries from lsmod. Could I have the setup script to see if it makes a difference? Have copied this in e-mail to you. Hope you don't mind. Nick Anstey

nickanstey commented 5 years ago

I have had precisely the same problem with exactly the same revisions. After 2 weeks I seem to have stumbled on the solution. Try this:

sudo /opt/scripts/tools/developers/update_bootloader.sh

No errors and a flashing red LED. Courtesy poushtickcoder.wordpress.com

imfatant commented 5 years ago

Great stuff! Thanks for letting us know. I'll update the guide.

nickanstey commented 5 years ago

Back to square one. It worked once and only once. So what's going on with the PRUs? Could it be a timing problem at boot or maybe a power issue?

nickanstey commented 5 years ago

Actually the problem is not the same and I did do something. I set the drone type to "hexa" in QGroundControl. I note that although Ardupilot fails, the PRU overlay is loaded, as evidenced by lsmod. Ardupilot actually fails with a SEGV fault now: "arducopter.service: Main process exited, code=killed, status=11/SEGV"

The previous error: kernel: Unhandled fault: external abort on non-linefetch (0x1818) has gone. Looks more like an application code issue.

nickanstey commented 5 years ago

Looks like the bootloader was an issue after all. The Ardupilot binary I'm using I built on Ubuntu under Win 10 WSL. Is there a good prebuilt binary?

nickanstey commented 5 years ago

Downloaded bbbmini precompiled arducopter-3.5.4 and it seems to be a runner even after multiple reboots. QGroundControl seems to communicate fine, so by all means include:

sudo /opt/scripts/tools/developers/update_bootloader.sh

in your setup instructions. The author of the article frames his fix as a resolution to the PRU UIO problem. Thanks for your help.

nickanstey commented 5 years ago

Oh, and by the way, I think you're better off with an arducopter.service file that looks something like this:

[Unit]
Description=ArduCopter Service
After=networking.service
StartLimitIntervalSec=0
Conflicts=arduplane.service ardurover.service antennatracker.service

[Service]
EnvironmentFile=/etc/default/ardupilot
# ExecStartPre=/usr/bin/ardupilot/aphw
# Use the following three lines instead of aphw
ExecStartPre=/bin/echo 80 >/sys/class/gpio/export
ExecStartPre=/bin/echo out >/sys/class/gpio/gpio80/direction
ExecStartPre=/bin/echo 1 >/sys/class/gpio/gpio80/value
ExecStart=/usr/bin/ardupilot/arducopter $TELEM1 $TELEM2 $GPS

Restart=on-failure
RestartSec=1

[Install]
WantedBy=multi-user.target

aphw didn't work for me although I haven't checked it since I got Ardupilot working. Cheers!
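A caveat on putting the echo lines directly into ExecStartPre=: per the systemd.service documentation, command lines in a unit file are not passed through a shell, so redirection with > is not supported there. Under that reading, each /bin/echo would receive the ">..." text as an ordinary argument, print it, and exit successfully without writing to sysfs, which would also explain the errors disappearing. The sketch below imitates both cases against a scratch file; the function names are hypothetical and nothing here touches real GPIOs.

```shell
#!/bin/bash
# What an exec without a shell amounts to: ">file" is just another argument,
# so echo prints it and no file is written.
run_without_shell() {
    /bin/echo 80 ">$1"
}

# Wrapping the command in a shell makes the redirection take effect.
run_with_shell() {
    /bin/sh -c 'echo 80 >"$1"' sh "$1"
}

# Usage:
#   f=$(mktemp -u)
#   run_without_shell "$f"   # prints: 80 >/tmp/..., creates nothing
#   run_with_shell "$f"      # writes 80 into the file
```

In a unit file the equivalent would be a line of the form ExecStartPre=/bin/sh -c '/bin/echo 80 >/sys/class/gpio/export', though whether that is preferable to fixing aphw itself is a judgment call.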

nickanstey commented 5 years ago

Another comment I made seems to have got lost. I used a prebuilt binary from BBBMini (3.5.4) and everything is peachy.

hagfelsh commented 5 years ago

For me this appears to have been solved by not only zapping the partition on the internal storage, but zeroing the first 2 MB of the blockdev as well. I can't wrap my head around how that's the case when the GPT was zeroed by sgdisk, but there it is. Thank you so much for persisting with me on this!
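Since zeroing the start of a block device is destructive, it may be worth confirming first that the running root filesystem is not on the device about to be wiped. A minimal sketch, assuming findmnt from util-linux is available and that /dev/mmcblk1 is the eMMC as in the commands earlier in the thread; the function name safe_to_wipe is hypothetical:

```shell
#!/bin/bash
# Hypothetical guard: refuse to wipe a device that backs the root filesystem.
# The second argument (the root source device) defaults to what findmnt reports
# for "/", and can be passed explicitly for testing.
safe_to_wipe() {
    local target="$1" root_src="${2:-$(findmnt -n -o SOURCE /)}"
    case "$root_src" in
        "$target"*)
            echo "refusing: / is on $root_src" >&2
            return 1
            ;;
        *)
            return 0
            ;;
    esac
}

# Usage, with the dd command from earlier in the thread:
#   safe_to_wipe /dev/mmcblk1 && sudo dd if=/dev/zero of=/dev/mmcblk1 bs=1M count=10
```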

I seem to have evaded Nick's adventure with the bootloader somehow. But thank you for posting your findings; I'll bet I'll need them whenever I update ardupilot!