openwrt / luci

LuCI - OpenWrt Configuration Interface
Apache License 2.0
6.38k stars 2.53k forks source link

MTD device 'factory' is NAND, please consider using UBI block devices instead. #6092

Closed dhewg closed 1 year ago

dhewg commented 2 years ago

dmesg is spammed with the line from topic. It looks like a 10s interval with 3 retries.

It only happens when the web gui is opened in a browser. It doesn't seem to matter which page. When closing the web gui it stops spamming.

hnyman commented 2 years ago

Which device? Which OpenWrt and LuCI version?

dhewg commented 2 years ago

Sorry: OpenWrt/luci master as of yesterday (r21248). Netgear WAC124

$ cat /proc/mtd 
dev:    size   erasesize  name
mtd0: 00100000 00020000 "u-boot"
mtd1: 00100000 00020000 "SC PART_MAP"
mtd2: 00400000 00020000 "kernel"
mtd3: 02800000 00020000 "ubi"
mtd4: 01800000 00020000 "reserved0"
mtd5: 00200000 00020000 "factory"
mtd6: 03800000 00020000 "reserved1"
$ dmesg |grep -i mtd
[    1.931910] 7 fixed-partitions partitions found on MTD device mt7621-nand
[    1.945923] Creating 7 MTD partitions on "mt7621-nand":
[    3.630813] UBI: auto-attach mtd3
[    3.637519] ubi0: attaching mtd3
[    4.474737] ubi0: attached mtd3 (name "ubi", size 40 MiB)
hnyman commented 2 years ago

That "every 10 seconds" makes me to think about some status item refresh on the front page, but your comment that any page causes it, makes me wonder. Could it be something related to themes? (Something in header/footer?)

Any special/extra/nonstandard luci apps? Theme?

Has the message always shown, or did it start yesterday? (From which date was your previous build? Any changes in between?)

Ps. I do not see that with my R7800 and RT3200 builds from Friday, so it is either something quite new or something related to your build/packages.

dhewg commented 2 years ago

I've never seen it before, but yeah, it started yesterday with a new image.

I've bricked it just before that new image (messed up a netifd patch with a null pointer access, d'oh), so I recovered the device with nmrpflash, which flashed the vendor firmware. That might have been a newer version that repartitioned the nand - I'm not sure, but it looks the same to me. /proc/device-tree has the same nand layout as the commited dts, so it doesn't seem to be any bootloader update with some dts fixup either. Beats me...

It seems to happen on every page that dynamically refreshes its data or something? Like "Status/Overview" or "Network/Interfaces". It happens once when navigating to Network/Diagnostics, but then without the 10s interval. But "Status/System Log" doesn't give those log lines, not even when navigating there.

No fancy theme or anything out of the ordinary:

$ grep \^CONFIG_PACKAGE_luci- .config
CONFIG_PACKAGE_luci-ssl-openssl=y
CONFIG_PACKAGE_luci-base=y
CONFIG_PACKAGE_luci-compat=y
CONFIG_PACKAGE_luci-lua-runtime=y
CONFIG_PACKAGE_luci-mod-admin-full=y
CONFIG_PACKAGE_luci-mod-network=y
CONFIG_PACKAGE_luci-mod-status=y
CONFIG_PACKAGE_luci-mod-system=y
CONFIG_PACKAGE_luci-app-firewall=y
CONFIG_PACKAGE_luci-app-ledtrig-switch=y
CONFIG_PACKAGE_luci-app-ledtrig-usbport=y
CONFIG_PACKAGE_luci-app-opkg=y
CONFIG_PACKAGE_luci-app-uhttpd=y
CONFIG_PACKAGE_luci-theme-bootstrap=y
CONFIG_PACKAGE_luci-proto-ipv6=y
CONFIG_PACKAGE_luci-proto-ppp=y
CONFIG_PACKAGE_luci-lib-base=y
CONFIG_PACKAGE_luci-lib-ip=y
CONFIG_PACKAGE_luci-lib-jsonc=y
CONFIG_PACKAGE_luci-lib-nixio=y
hnyman commented 2 years ago

The string MTD device 'factory' is NAND, please consider using UBI block devices instead comes from upstream Linux,

It comes from these commits in 5.15 (which are not in 5.10):

The latter commit is signed off by @bmork who might have some guess why it surfaces now.

The curious part is that based on git time stamps those commits have been in Linux 5.15 since June, so some apparently recent change in OpenWrt/LuCI is now triggering that new message for you.

I just compiled a new image for RT3200 and I do not see that error. (21272-b97e5ac785 / LuCI Master git-22.314.85334-0141732)

happen on every page that dynamically refreshes its data or something? Like "Status/Overview" or "Network/Interfaces". It happens once when navigating to Network/Diagnostics, but then without the 10s interval. But "Status/System Log" doesn't give those log lines, not even when navigating there.

the normal refresh is 5 seconds on most pages (like "Status/Overview" and "Network/Interfaces"), so also that 10 seconds (instad of 5) looks strange. (unless you have modified that refresh time)

dhewg commented 2 years ago

Yeah, I was just looking at those upstream commits too. It only happens on mtdblock access.

It all looks harmless, but it raises the question what luci does every X seconds on mtdblock devices and if that is really necessary?

I just rebooted the device. Without luci access those warnings do not occur. I may have just not noticed those before...

Example. Rebooted, logged in and just let the status page open and sit there:

Sun Nov 13 08:20:35 2022 daemon.err uhttpd[1737]: [info] luci: accepted login on / for root from 192.168.0.45                                                   
Sun Nov 13 08:20:37 2022 kern.warn kernel: [  970.953615] mtdblock: MTD device 'factory' is NAND, please consider using UBI block devices instead.              
Sun Nov 13 08:20:38 2022 kern.warn kernel: [  971.654482] mtdblock: MTD device 'factory' is NAND, please consider using UBI block devices instead.              
Sun Nov 13 08:20:39 2022 kern.warn kernel: [  972.358716] mtdblock: MTD device 'factory' is NAND, please consider using UBI block devices instead.              
Sun Nov 13 08:20:40 2022 kern.warn kernel: [  973.073497] mtdblock: MTD device 'factory' is NAND, please consider using UBI block devices instead.              
Sun Nov 13 08:20:40 2022 kern.warn kernel: [  973.774744] mtdblock: MTD device 'factory' is NAND, please consider using UBI block devices instead.              
Sun Nov 13 08:20:41 2022 kern.warn kernel: [  974.477342] mtdblock: MTD device 'factory' is NAND, please consider using UBI block devices instead.              
Sun Nov 13 08:20:43 2022 kern.warn kernel: [  976.633284] mtdblock: MTD device 'factory' is NAND, please consider using UBI block devices instead.              
Sun Nov 13 08:20:44 2022 kern.warn kernel: [  977.333678] mtdblock: MTD device 'factory' is NAND, please consider using UBI block devices instead.
Sun Nov 13 08:20:45 2022 kern.warn kernel: [  978.035717] mtdblock: MTD device 'factory' is NAND, please consider using UBI block devices instead.
Sun Nov 13 08:20:45 2022 kern.warn kernel: [  978.748227] mtdblock: MTD device 'factory' is NAND, please consider using UBI block devices instead.
Sun Nov 13 08:20:46 2022 kern.warn kernel: [  979.447379] mtdblock: MTD device 'factory' is NAND, please consider using UBI block devices instead.
Sun Nov 13 08:20:47 2022 kern.warn kernel: [  980.150164] mtdblock: MTD device 'factory' is NAND, please consider using UBI block devices instead.
Sun Nov 13 08:20:53 2022 kern.warn kernel: [  986.643582] mtdblock: MTD device 'factory' is NAND, please consider using UBI block devices instead.
Sun Nov 13 08:20:54 2022 kern.warn kernel: [  987.344008] mtdblock: MTD device 'factory' is NAND, please consider using UBI block devices instead.
Sun Nov 13 08:20:55 2022 kern.warn kernel: [  988.081289] mtdblock: MTD device 'factory' is NAND, please consider using UBI block devices instead.
Sun Nov 13 08:20:55 2022 kern.warn kernel: [  988.830938] mtdblock: MTD device 'factory' is NAND, please consider using UBI block devices instead.
Sun Nov 13 08:20:56 2022 kern.warn kernel: [  989.529383] mtdblock: MTD device 'factory' is NAND, please consider using UBI block devices instead.
Sun Nov 13 08:20:57 2022 kern.warn kernel: [  990.233643] mtdblock: MTD device 'factory' is NAND, please consider using UBI block devices instead.
Sun Nov 13 08:21:03 2022 kern.warn kernel: [  996.634412] mtdblock: MTD device 'factory' is NAND, please consider using UBI block devices instead.
Sun Nov 13 08:21:04 2022 kern.warn kernel: [  997.336462] mtdblock: MTD device 'factory' is NAND, please consider using UBI block devices instead.
Sun Nov 13 08:21:05 2022 kern.warn kernel: [  998.038479] mtdblock: MTD device 'factory' is NAND, please consider using UBI block devices instead.
Sun Nov 13 08:21:05 2022 kern.warn kernel: [  998.751119] mtdblock: MTD device 'factory' is NAND, please consider using UBI block devices instead.
Sun Nov 13 08:21:06 2022 kern.warn kernel: [  999.452387] mtdblock: MTD device 'factory' is NAND, please consider using UBI block devices instead.
Sun Nov 13 08:21:07 2022 kern.warn kernel: [ 1000.155659] mtdblock: MTD device 'factory' is NAND, please consider using UBI block devices instead.
Sun Nov 13 08:21:13 2022 kern.warn kernel: [ 1006.655502] mtdblock: MTD device 'factory' is NAND, please consider using UBI block devices instead.
Sun Nov 13 08:21:14 2022 kern.warn kernel: [ 1007.357233] mtdblock: MTD device 'factory' is NAND, please consider using UBI block devices instead.
Sun Nov 13 08:21:15 2022 kern.warn kernel: [ 1008.063797] mtdblock: MTD device 'factory' is NAND, please consider using UBI block devices instead.
Sun Nov 13 08:21:15 2022 kern.warn kernel: [ 1008.782076] mtdblock: MTD device 'factory' is NAND, please consider using UBI block devices instead.
Sun Nov 13 08:21:16 2022 kern.warn kernel: [ 1009.486327] mtdblock: MTD device 'factory' is NAND, please consider using UBI block devices instead.
Sun Nov 13 08:21:17 2022 kern.warn kernel: [ 1010.192164] mtdblock: MTD device 'factory' is NAND, please consider using UBI block devices instead.

Now its 6 retries? It was 3 before but I haven't changed anything, just rebooted...

hnyman commented 2 years ago

First guess is something like "disk space" or "mount points" status item, but it is hard to figure out why that would then surface also on the "Network/Interfaces" page like you said.

Next guess is something related to header status items like "running initramfs", "no password set". But again, why that would surface for your router? and now?

Your device is ramips mt7621, right? Just thinking if something has recently changed regarding handling the partitions there.

(Ps. Funny, while looking for mt7621 commits, I actually spotted a bug from yesterday in https://github.com/openwrt/openwrt/pull/10397 )

dhewg commented 2 years ago

Also, it's just the "factory" partition that gets logged about, not any of the other non-ubi ones. And the dts contains references to that like nvmem-cells = <&macaddr_factory_4>; Maybe all that get triggered because of something something macs in luci?

hnyman commented 2 years ago

Maybe all that get triggered because of something something macs in luci?

LuCI pages in general have no interest on MACs.

It seems to happen on every page that dynamically refreshes its data or something? Like "Status/Overview" or "Network/Interfaces". It happens once when navigating to Network/Diagnostics, but then without the 10s interval. But "Status/System Log" doesn't give those log lines, not even when navigating there.

Based on that, my two cents are on either "running initramfs" or "no password set" status evaluation on header on every page. But the claim that "Status/System Log doesn't give those log lines" again contradicts that, as also those pages have the same header, AFAIK, and it should materialize also there. Can you try to spot differences between it occuring on various pages?

Similarly, I have no idea why you see it and I do not. (mtd partition name "factory" should not cause that, or should it?) And why it surfaces now? (you would likely have spotted that earlier. So prominent...)

dhewg commented 2 years ago

I can reproduce without browser via ubus call luci-rpc getWirelessDevices

dhewg commented 2 years ago

which contains: ubus call network.wireless status, fine, no warning some iwinfo stuff, and running iwinfo from a shell yields those warnings too

dhewg commented 2 years ago

And we have a winner: https://git.openwrt.org/?p=project/iwinfo.git;a=blob;f=iwinfo_utils.c;h=c15b30517235ed0b34997922141a2b3467352cf7;hb=HEAD#l175

It's trying to identify the hardware. There's a usb wifi adapter attached. And iwinfo is trying to get details about it from mtd. Which doesn't make any sense for usb devices heh

Unplugging the usb device makes the warnings go away

hnyman commented 2 years ago

Nice find.

But then the baffling part is, why that error materialises on the LuCI pages quite unrelated to WiFi stuff.

bmork commented 2 years ago

Yes, that looks like the trigger. Obviously a false positive with that O_RDONLY, but I don't know what we can do about it.

Wrt the patch with my name on it, that was a compromise to reduce the log spam from "extremely annoying" to "tolerable". It was the best I managed to get in.

Personally I'd rather remove the warning as it will always have false positives. But it was hard enough to convince them to drop the repeated warning for every partition whether accessed through the mtdblock driver or not. There was some underlying assumption that it is wrong to have that driver at all on a system with NAND flash. Which completely fails to see the OpenWrt, or any other distro, use case.

So the warning will still trigger a false positive on totally reasonable accesses like for example:

root@OpenWrt:/# hexdump -n 20 -C /dev/mtdblock1
[235410.976616] mtdblock: MTD device 'Config' is NAND, please consider using UBI block devices instead.
00000000  b3 80 dc d8 62 6f 6f 74  63 6d 64 3d 63 70 2e 62  |....bootcmd=cp.b|
00000010  20 30 78 42                                       | 0xB|
00000014

and this is what you hit in that iwinfo code. This is expected, unfortunately.

I'll support any attempt to get the warning completely dropped from mainline, but I'm not going to take that discussion myself again.

Alternatively, we could patch the warning out of OpenWrt. But I'm not sure it's a good idea to add more kernel maintenance work for something like this? Which is why I took the mainline route with my initial attempt to silence it.

Don't know how feasible it to just change the triggering code in this case? It looks like it might just as well use "/dev/mtd%dro" instead of "/dev/mtdblock%d", an as such drop the dependency on the mtdblock driver too, but I really don't know anything about these flash things and the finer details involved...

dhewg commented 2 years ago

Here's a patchset for iwinfo that improves the situation with usb wifi adapters: https://patchwork.ozlabs.org/project/openwrt/list/?series=327961

It doesn't fix the mtd access, but just because I don't have any device where that would be used, so I cannot test such changes.

But with that, it skips the ftd and mtd lookups for usb devices, hence fixing warning spam, the rather huge delay and on top allows to identify usb devices via devices.txt. </yak>

jow- commented 1 year ago

Should be fixed by https://github.com/openwrt/openwrt/commit/4a4d0bf78ddbbf17508891c5c837e5eb00420b5c