pyavitz / debian-image-builder

Debian image builder for single board computers

NanoPi R5S (general discussion) #56

Closed: phlax closed this issue 1 year ago

phlax commented 1 year ago

I have been trying to build an image for the R5S using your build system.

Everything works great with the exception of NVMe.

Depending on which kernel I use, the error messages differ; some related ones seem to be:

pcieport 0000:00:00.0: of_irq_parse_pci: failed with rc=-22
nvme 0002:01:00.0: of_irq_parse_pci: failed with rc=134
nvme 0002:01:00.0: Unable to change power state from D3cold to D0, device inaccessible

I know the drive and its fitting are OK (I can see the drive with the FriendlyWrt kernel), but every mainline kernel I have tried to build ends with similar issues.

I have tried 5.19, 6.0, 6.1 and various recent commits from the kernel tree, with both the v2022.07 and v2022.10 versions of U-Boot.

phlax commented 1 year ago

Relatedly, there was some discussion about this (before it was shut down for being off-topic) here: https://github.com/armbian/build/pull/4247#issuecomment-1289388201

There are various statements that both this system and an Armbian build should work, but I've yet to see either an image or a build recipe that does.

pyavitz commented 1 year ago

This appears to be a hit-or-miss issue. For some people it works and for others it doesn't. Without proper data I can't be sure. My guess is that it's either power related or the drive type itself is the problem.

Some people are using DietPi, which I believe uses vendor U-Boot / Linux, so that's always an option. At one point I did attempt to use the vendor kernel with the current version of U-Boot, but I was told it didn't boot properly.

phlax commented 1 year ago

> ...uses vendor U-Boot / Linux, so that's always an option.

It was partly trying to escape the vendor build that led me here.

Mostly because I wanted some control over the kernel, but also because with the vendor kernel the performance of the 2.5Gb ports seems really poor; upgrading the kernel seems to improve it (significantly, if not very satisfactorily).

pyavitz commented 1 year ago

> It was partly trying to escape the vendor build that led me here.

Understandable.

Unfortunately, I don't have the unit, so I'm unable to run tests or do any debugging. You can leave the issue open and see if anyone has any input or suggestions on the subject. If I should come across a solution, I'll be sure to drop it here.

phlax commented 1 year ago

It seems to work with pcie_aspm=off.

Not sure if that is desirable for everyone with this board, but if it's helpful to add somewhere, I would be happy to open a PR.
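To confirm after a reboot that the parameter actually reached the kernel, one option is to read `/proc/cmdline`, which holds the command line the running kernel was booted with. A minimal Python sketch follows; the helper names `kernel_params` and `aspm_disabled` are hypothetical and not part of this repo, and quoted values containing spaces are not handled.

```python
from __future__ import annotations

from pathlib import Path


def kernel_params(cmdline: str) -> dict[str, str | None]:
    """Split a kernel command line into {name: value}.

    Bare flags such as `rw` map to None. Quoted values containing spaces
    are not handled; this is only a quick sketch.
    """
    params: dict[str, str | None] = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None  # a later duplicate overwrites an earlier one
    return params


def aspm_disabled(cmdline: str) -> bool:
    """True if the command line passes pcie_aspm=off."""
    return kernel_params(cmdline).get("pcie_aspm") == "off"


if __name__ == "__main__":
    # /proc/cmdline only exists on Linux, so check before reading it.
    proc = Path("/proc/cmdline")
    if proc.exists():
        print("pcie_aspm=off active:", aspm_disabled(proc.read_text()))
```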

pyavitz commented 1 year ago

Of course, please do.

There is a command line section in the boards file: https://github.com/pyavitz/debian-image-builder/blob/feature/lib/boards/nanopir5s#L43

Put any additions in the EXTRA variable.
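If the parameter is added by a script rather than by hand, it helps to make the edit idempotent so that re-running it does not duplicate the parameter. The sketch below assumes EXTRA holds a plain space-separated list of kernel parameters; the exact syntax of the boards file has not been checked here, and `add_kernel_param` is a hypothetical helper.

```python
def add_kernel_param(extra: str, param: str) -> str:
    """Return `extra` with `param` added, replacing any existing parameter
    of the same name so that re-running the edit does not duplicate it.

    Assumes `extra` is a space-separated list of kernel parameters.
    """
    name = param.split("=", 1)[0]
    kept = [t for t in extra.split() if t.split("=", 1)[0] != name]
    kept.append(param)
    return " ".join(kept)
```

For example, `add_kernel_param("quiet pcie_aspm=force", "pcie_aspm=off")` gives `"quiet pcie_aspm=off"`. Replacing rather than appending avoids leaving two conflicting values for the same parameter on the command line.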

electrified commented 1 year ago

Hi @phlax! Off-topic: how are you measuring the Ethernet performance?

I am seeing the opposite: with the FriendlyElec image rk3568-sd-friendlycore-focal-5.10-arm64-20221206.img.gz I can recreate their iperf3 scores (~2.3 Gbps with iperf3 -c remote -t 60 -i 5 -P4), whereas with this image I get ~1.8 Gbps.

I was putting this down to the official image using the out-of-tree Realtek driver, and was going to try that driver on this image.
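When comparing images like this, it can help to capture iperf3 results as JSON (`-J`) instead of reading the text output, and to run each direction separately (`-R` makes the server send and the client receive). A minimal sketch; `remote` is a placeholder host, and the field names `end.sum_sent.bits_per_second` / `end.sum_received.bits_per_second` are those iperf3 reports for TCP tests.

```python
from __future__ import annotations

import json
import subprocess


def run_iperf3(server: str, seconds: int = 60, streams: int = 4,
               reverse: bool = False) -> dict:
    """Run the iperf3 client in JSON mode (-J) and return the parsed report."""
    cmd = ["iperf3", "-c", server, "-t", str(seconds), "-i", "5",
           "-P", str(streams), "-J"]
    if reverse:
        cmd.append("-R")  # server sends, client receives
    out = subprocess.run(cmd, check=True, capture_output=True, text=True).stdout
    return json.loads(out)


def summary_gbps(report: dict) -> tuple[float, float]:
    """Return (sent, received) totals in Gbit/s from a TCP iperf3 JSON report."""
    end = report["end"]
    return (end["sum_sent"]["bits_per_second"] / 1e9,
            end["sum_received"]["bits_per_second"] / 1e9)
```

Usage might look like `summary_gbps(run_iperf3("remote"))` for one direction and `summary_gbps(run_iperf3("remote", reverse=True))` for the other.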

phlax commented 1 year ago

Here are the results I got with the FriendlyWrt image (21.02 booted from SD): https://www.friendlyelec.com/Forum/viewtopic.php?f=77&t=3905&sid=056efd465124dca4febf4bb967307cdf

I didn't test the Ubuntu image, but I did test Debian.

Looking over the results (and comparing them to what I have now with an image built from this repo), you might be right with respect to iperf.

I'm seeing ~2.3 Gbps into the R5S and ~1.8 Gbps out of it.

On the other hand, I was seeing really poor performance using scp to/from a file on tmpfs. I think encryption/decryption is the bottleneck, so it may be unrelated to network performance, but in that case I have seen a ~50% speed-up with scp, both up and down, since moving off the vendor kernel.

I have come to the conclusion that these devices are a bit of a gimmick: they just don't seem to have the power for one 2.5GbE port, never mind two.

What might work better, which I am about to test, is passthrough, i.e. only routing through the R5S. If that works, all is not lost with these, I think.

My hope had been to use it as a TFTP server and/or Squid caching proxy (for Docker Hub, apt, etc.), hence my interest in how fast it can get data from disk onto the network. I'm not sure how well the R5S will perform for that, but I'm still testing it.

> I was putting this down to the official image using the out-of-tree Realtek driver, and was going to try that driver on this image.

Hmm, it would be good to know if this helps at all.

electrified commented 1 year ago

> I have come to the conclusion that these devices are a bit of a gimmick: they just don't seem to have the power for one 2.5GbE port, never mind two.

Agreed. I bought mine and before it shipped the R6S was released 😅. The RK3588 in that is a lot more capable, but it doesn't have the M.2 slot, so it's no use for your purpose.

> Hmm, it would be good to know if this helps at all.

If I can get it working I'll report back here.

phlax commented 1 year ago

Thanks @pyavitz

> Agreed. I bought mine and before it shipped the R6S was released

Yep, same.

@electrified I have tested this further with my own kernel build (roughly current mainline HEAD with the ~6.1 patches).

I'm seeing a pretty reliable ~2.3 Gbps in both directions, although it isn't sustained, particularly with egress traffic.

Repeated or prolonged testing (more than ~30 s) seems to starve something, and egress performance can drop to ~1.8 Gbps or lower.

There are errors that tend to occur, although not at exactly the time the speed drops, like:

perf: interrupt took too long (4916 > 4915), lowering kernel.perf_event_max_sample_rate to 40000

Repeated testing can push the speeds down to ~1.4 Gbps, but then something seems to give and the speed returns to ~2.3 Gbps.

Something similar can happen in the other direction, but it seems to recover much faster.
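One way to see exactly when egress throughput sags during a long run is to look at the per-interval figures in an iperf3 `-J` report rather than only the final average. The sketch below flags intervals whose aggregate rate falls below a fraction of the best interval; it reads `intervals[].sum.start` and `intervals[].sum.bits_per_second` from the report, and the 0.8 threshold and the name `find_dips` are arbitrary, hypothetical choices.

```python
from __future__ import annotations


def find_dips(report: dict, fraction: float = 0.8) -> list[tuple[float, float]]:
    """Return (start_seconds, gbps) for each iperf3 interval whose aggregate
    throughput falls below `fraction` of the best interval in the run."""
    rates = [(iv["sum"]["start"], iv["sum"]["bits_per_second"] / 1e9)
             for iv in report["intervals"]]
    if not rates:
        return []
    peak = max(gbps for _, gbps in rates)
    return [(start, gbps) for start, gbps in rates if gbps < fraction * peak]
```

With `-i 5`, each interval covers five seconds, so a dip after ~30 s would show up as intervals starting at 30 or later.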
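To check whether these warnings line up with the throughput drops, their timestamps can be pulled out of `dmesg` output, which by default prefixes each line with seconds since boot (as in the `[ 4109.106649]` lines later in this thread). A minimal sketch; `perf_throttle_events` is a hypothetical helper, and since iperf3 interval times are relative to the start of the test, correlating the two still needs the boot-relative time at which the test started.

```python
import re

# Matches lines such as:
# [  123.456789] perf: interrupt took too long (4916 > 4915), lowering kernel.perf_event_max_sample_rate to 40000
PERF_RE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+perf: interrupt took too long "
    r"\((?P<took>\d+) > (?P<limit>\d+)\), lowering "
    r"kernel\.perf_event_max_sample_rate to (?P<rate>\d+)"
)


def perf_throttle_events(dmesg_text: str) -> list:
    """Return (timestamp, took, limit, new_rate) for each perf throttle warning."""
    events = []
    for line in dmesg_text.splitlines():
        m = PERF_RE.match(line)
        if m:
            events.append((float(m["ts"]), int(m["took"]),
                           int(m["limit"]), int(m["rate"])))
    return events
```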

phlax commented 1 year ago

Re routing iperf between subnets: pretty much the same picture, just a little slower.

Not sure it's apropos of much.

Not sure I would want to use this for routing: connecting my two end machines directly through a switch gives actual bidirectional full-duplex speed, reliably.

electrified commented 1 year ago

I experimented with the Realtek r8125-9.010.01 driver. It builds on kernel 6.1.0 with a one-line change.

Unfortunately I didn't see any massive changes in performance so will continue to use the in-tree driver.

I'm now seeing results in line with what you are seeing - I had some firewall rules that were impacting performance.

jwischka commented 1 year ago

Has anyone gotten a wireless PCIe card to work on these? I've tried both an MT7921 and an RTL8822, and no joy either way. It sort of looks like it can see the MediaTek card (it complains about not being able to find firmware), but when you copy the firmware in, it goes AWOL. Any ideas?

phlax commented 1 year ago

> Has anyone gotten a wireless PCIe card to work on these?

Not a PCIe card, but I have managed to get an 8814au USB Wi-Fi dongle ~working. It does the same thing as when connecting a 2.5GbE dongle, though: it works until you pull data across the interface, then crashes.

This seems to be the same story with the ODROID-M1, so I'm thinking this chipset is just not sufficiently powered for its peripherals.

I'm struggling to think of any remaining uses for this box (other than as a slow two-port router).

jwischka commented 1 year ago

I managed to get the MT7921 working on the 6.1 kernel, but data rates were slow. An MT7612 USB adapter crashes promptly, even though the drivers and firmware are there. An MT7610 USB adapter works, but again slow data rates.

It's weird because it feels like it's right on the edge of working, but falls just short. Isn't there official support for the chipset in the 6.x kernels?

phlax commented 1 year ago

Not sure. There is certainly activity, but I think most of the patches are still required to build (for the DTBs in particular).

I built 6.2-rc1 and it didn't help.

phlax commented 1 year ago

> An MT7612 USB adapter crashes promptly, even though the drivers and firmware are there.

I think it's a buggy host controller. By adding the fix here https://github.com/phlax/linux/commit/ee411a64f21a1b9ac5b878f646aca851e878f1b6 I managed to fix similar problems on the ODROID-M1; the patch makes a 2.5GbE adapter work (although no faster than the onboard 1G port).

I have just tried again with the 8814au dongle. With some combination of reinserting the driver/USB, rebooting and swearing, I managed to get the interface working more reliably and to reach speeds of ~300 Mbps (before it fell over 8/).

I'm going to test it with the xHCI hack and see if that helps; I suspect it's related. At one point I was consistently seeing 300+ Mbps in one direction, but anything involving egress from Wi-Fi on the box was limited to 20 Mbps; like that it was reliable and didn't fall over. Rebooting got speeds up to ~300 Mbps both ways, before it did fall over.

jwischka commented 1 year ago

Sounds promising. Any chance to integrate this into a pull request for this repo?

phlax commented 1 year ago

Sure, let me test it first. Either way, I think it's worth adding for the M1.

jwischka commented 1 year ago

Sounds good. I do think you're right about the root cause, though: it's a pretty reliable crash on the USB bus, whereas I've now gotten the PCIe card up and running, and it seems pretty solid.

phlax commented 1 year ago

Great, the good news is that with the patch it stops failing.

It still shows similar behaviour, where performance comes and goes in waves, but it doesn't seem to fall over, at least in my limited testing.

It seems to have a speed ceiling of ~460 Mbps (I'm guessing from communicating with the controller). It sustains that for a few cycles and then craps out, but at least now it recovers without dropping too much.

Almost certainly this will also allow this unit to work with an external 2.5 Gbps dongle, but I don't expect it to perform any better than the onboard Ethernet. I'll test this tomorrow.

I'll raise a PR now with the NanoPi patch.

phlax commented 1 year ago

PR #60 sent over the newly working Wi-Fi interface 8)

jwischka commented 1 year ago

Lovely. For the moment I'd be happy with no kernel panics!

electrified commented 1 year ago

Thanks for your work on this.

Off-topic: I wondered if the FriendlyElec-supplied 5.10.110 kernel, with its out-of-tree Rockchip drivers, performs any better than mainline, and attempted to get my USB Wi-Fi device working under it. (Only as an experiment; no desire to run it long term!)

Unfortunately I had no joy building the Wi-Fi driver on their debian-bullseye image. (FriendlyElec's rtl8822 driver is missing the USB VID/PID for my device.)

Building against their supplied kernel headers, on inserting the module I get:

88x2bu: disagrees about version of symbol module_layout

This suggests their header package doesn't actually match the supplied kernel?

Forcing the load, not too surprisingly, I still get errors:

[ 4109.106649] 88x2bu: Unknown symbol _mcount (err -2)
[ 4109.106889] 88x2bu: Unknown symbol __stack_chk_guard (err -2)

If I ever get a working driver on that kernel, I will try some comparisons.
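A quick first sanity check for a header/kernel mismatch is to compare the kernel release recorded in the module's `vermagic` string (shown by `modinfo -F vermagic`) with the running kernel's release (`uname -r`, available in Python as `platform.release()`). This is only a sketch and a partial check: the "disagrees about version of symbol module_layout" error comes from symbol CRC (module versioning) checks, which this comparison does not reproduce, so matching releases do not prove the headers match. The helper names are hypothetical.

```python
import platform
import subprocess


def module_vermagic(ko_path: str) -> str:
    """Return the vermagic string of a built module, via `modinfo -F vermagic`."""
    result = subprocess.run(["modinfo", "-F", "vermagic", ko_path],
                            check=True, capture_output=True, text=True)
    return result.stdout.strip()


def release_matches(vermagic: str, release: str) -> bool:
    """True if the release at the start of `vermagic` equals `release`."""
    return bool(vermagic) and vermagic.split()[0] == release


# Example (on the board itself):
# release_matches(module_vermagic("88x2bu.ko"), platform.release())
```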

phlax commented 1 year ago

> I wondered if the FriendlyElec-supplied 5.10.110 kernel, with its out-of-tree Rockchip drivers, performs any better

IIRC it suffers from the same xHCI problems with USB dongles. I didn't see great performance out of the onboard 2.5 Gbps ports either (although they were stable).

Also slightly off-topic: my vague plan had been to replace my R5S as a router with an N2+, as that had performed better in most of the network tests I had run. With my Wi-Fi dongle it is more stable than the R5S, but performance is actually significantly lower.

It would be really good to know whether the R6S fares any better for 2.5G networking and xHCI devices.