ophub / amlogic-s9xxx-armbian

Support for Armbian in Amlogic, Rockchip and Allwinner boxes. Support a311d, s922x, s905x3, s905x2, s912, s905d, s905x, s905w, s905, s905l, rk3588, rk3568, rk3399, rk3328, h6, etc.
GNU General Public License v2.0
6.13k stars 1.97k forks

FastRhino (电犀牛) R66S sometimes fails to detect the second RTL8125 NIC when using this firmware #1061

Closed jysqice closed 1 year ago

jysqice commented 1 year ago

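FastRhino (电犀牛) R66S, 2 GB version, Linux armbian 6.1.10-flippy-81+

The problem is as stated in the title. When it occurs, neither dmesg nor lspci shows any information for the NIC at 0002:21:00.0. After several thousand reboot tests, I found that it tends to occur when the CPU temperature is below 40 °C and never occurs when the CPU is above 50 °C. The problem does not occur with the "factory firmware".

My guess: the CPU's thermal control mechanism makes it run too fast at low temperatures, so the PCIe device scan finishes before the PCIe device has responded, which causes this problem. If this guess is right, then either increasing the PCIe scan delay or adjusting the CPU thermal control mechanism should fix it.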

ophub commented 1 year ago

Which NIC do you mean by the second one: the port labeled LAN on the case, or the one labeled WAN?

jysqice commented 1 year ago

WAN

ophub commented 1 year ago

I'll test it.

jysqice commented 1 year ago

To reproduce the problem, you can point a fan at the device.

jysqice commented 1 year ago

2023-02-13 10:07:52 armbian booted with 1 RTL8125 and cpu temp is 37222 (error)
2023-02-13 10:08:46 armbian booted with 2 RTL8125 and cpu temp is 38888
2023-02-13 10:09:34 armbian booted with 1 RTL8125 and cpu temp is 38333 (error)
2023-02-13 10:10:19 armbian booted with 1 RTL8125 and cpu temp is 36666 (error)
2023-02-13 10:11:11 armbian booted with 2 RTL8125 and cpu temp is 38333
2023-02-13 10:12:40 armbian booted with 2 RTL8125 and cpu temp is 38888
2023-02-13 10:14:15 armbian booted with 2 RTL8125 and cpu temp is 39444
2023-02-13 10:15:00 armbian booted with 1 RTL8125 and cpu temp is 36111 (error)
2023-02-13 10:15:48 armbian booted with 1 RTL8125 and cpu temp is 38333 (error)
2023-02-13 10:16:43 armbian booted with 2 RTL8125 and cpu temp is 38333
2023-02-13 10:17:38 armbian booted with 2 RTL8125 and cpu temp is 39444
2023-02-13 10:18:23 armbian booted with 1 RTL8125 and cpu temp is 35000 (error)
2023-02-13 10:19:14 armbian booted with 1 RTL8125 and cpu temp is 38888 (error)
2023-02-13 10:20:01 armbian booted with 1 RTL8125 and cpu temp is 36666 (error)
2023-02-13 10:20:48 armbian booted with 1 RTL8125 and cpu temp is 37222 (error)
2023-02-13 10:21:38 armbian booted with 2 RTL8125 and cpu temp is 38333
2023-02-13 10:22:28 armbian booted with 2 RTL8125 and cpu temp is 39444
2023-02-13 10:24:00 armbian booted with 2 RTL8125 and cpu temp is 38333
2023-02-13 10:24:47 armbian booted with 2 RTL8125 and cpu temp is 39444
2023-02-13 10:25:42 armbian booted with 2 RTL8125 and cpu temp is 38888
2023-02-13 10:26:30 armbian booted with 1 RTL8125 and cpu temp is 38888 (error)
2023-02-13 10:28:05 armbian booted with 2 RTL8125 and cpu temp is 40000
2023-02-13 10:28:54 armbian booted with 2 RTL8125 and cpu temp is 37222
2023-02-13 10:29:48 armbian booted with 2 RTL8125 and cpu temp is 39444
2023-02-13 10:30:33 armbian booted with 2 RTL8125 and cpu temp is 36666
2023-02-13 10:31:23 armbian booted with 2 RTL8125 and cpu temp is 36666
2023-02-13 10:32:08 armbian booted with 1 RTL8125 and cpu temp is 36111 (error)
2023-02-13 10:32:43 armbian booted with 2 RTL8125 and cpu temp is 36666
2023-02-13 10:33:38 armbian booted with 1 RTL8125 and cpu temp is 40625 (error)
2023-02-13 10:34:22 armbian booted with 1 RTL8125 and cpu temp is 37222 (error)
2023-02-13 10:35:11 armbian booted with 2 RTL8125 and cpu temp is 34375
2023-02-13 10:36:06 armbian booted with 2 RTL8125 and cpu temp is 39444
2023-02-13 10:36:54 armbian booted with 2 RTL8125 and cpu temp is 40000
2023-02-13 10:37:41 armbian booted with 2 RTL8125 and cpu temp is 38333
2023-02-13 10:39:10 armbian booted with 2 RTL8125 and cpu temp is 36666
2023-02-13 10:39:59 armbian booted with 2 RTL8125 and cpu temp is 38333
2023-02-13 10:40:47 armbian booted with 1 RTL8125 and cpu temp is 36666 (error)
2023-02-13 10:41:37 armbian booted with 1 RTL8125 and cpu temp is 40000 (error)
2023-02-13 10:42:25 armbian booted with 2 RTL8125 and cpu temp is 35000
2023-02-13 10:43:14 armbian booted with 2 RTL8125 and cpu temp is 36111
2023-02-13 10:44:01 armbian booted with 1 RTL8125 and cpu temp is 36666 (error)
2023-02-13 10:44:49 armbian booted with 1 RTL8125 and cpu temp is 36666 (error)
2023-02-13 10:45:42 armbian booted with 2 RTL8125 and cpu temp is 33125
2023-02-13 10:46:33 armbian booted with 2 RTL8125 and cpu temp is 35000
2023-02-13 10:47:26 armbian booted with 2 RTL8125 and cpu temp is 38888
2023-02-13 10:48:14 armbian booted with 1 RTL8125 and cpu temp is 36666 (error)
2023-02-13 10:49:03 armbian booted with 2 RTL8125 and cpu temp is 37777
2023-02-13 10:49:59 armbian booted with 2 RTL8125 and cpu temp is 38333
2023-02-13 10:51:32 armbian booted with 2 RTL8125 and cpu temp is 36666
2023-02-13 10:52:19 armbian booted with 1 RTL8125 and cpu temp is 36666 (error)
2023-02-13 10:53:09 armbian booted with 1 RTL8125 and cpu temp is 39444 (error)
2023-02-13 10:54:00 armbian booted with 2 RTL8125 and cpu temp is 40000
2023-02-13 10:54:47 armbian booted with 2 RTL8125 and cpu temp is 40000
2023-02-13 10:55:42 armbian booted with 1 RTL8125 and cpu temp is 39444 (error)
2023-02-13 10:56:34 armbian booted with 2 RTL8125 and cpu temp is 38333
2023-02-13 10:57:22 armbian booted with 2 RTL8125 and cpu temp is 38888
2023-02-13 10:58:14 armbian booted with 2 RTL8125 and cpu temp is 38888
2023-02-13 10:59:01 armbian booted with 2 RTL8125 and cpu temp is 39444
2023-02-13 11:00:30 armbian booted with 1 RTL8125 and cpu temp is 37222 (error)

This log was generated by my script.

jysqice commented 1 year ago

2023-02-13 11:15:23 armbian booted with 2 RTL8125 and cpu temp is 48333
2023-02-13 11:16:51 armbian booted with 2 RTL8125 and cpu temp is 45555
2023-02-13 11:17:41 armbian booted with 2 RTL8125 and cpu temp is 52500
2023-02-13 11:18:25 armbian booted with 2 RTL8125 and cpu temp is 52500
2023-02-13 11:19:15 armbian booted with 2 RTL8125 and cpu temp is 53125
2023-02-13 11:20:02 armbian booted with 2 RTL8125 and cpu temp is 53750
2023-02-13 11:23:03 armbian booted with 2 RTL8125 and cpu temp is 55555
2023-02-13 11:23:50 armbian booted with 2 RTL8125 and cpu temp is 53750
2023-02-13 11:24:37 armbian booted with 2 RTL8125 and cpu temp is 55555
2023-02-13 11:25:28 armbian booted with 2 RTL8125 and cpu temp is 59444
2023-02-13 11:27:01 armbian booted with 2 RTL8125 and cpu temp is 57777
2023-02-13 11:29:41 armbian booted with 2 RTL8125 and cpu temp is 46111
2023-02-13 11:30:32 armbian booted with 2 RTL8125 and cpu temp is 45555
2023-02-13 11:31:22 armbian booted with 2 RTL8125 and cpu temp is 41250

Without the fan it is basically fine.
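A log like this is easier to read as a failure count per temperature range. The following is only a sketch, assuming /root/boot.log holds one entry per line in the format shown above, with the NIC count as the 6th field and the temperature in millidegrees Celsius as the 12th. The function name `summarize_boot_log` and the 5 °C bucket width are illustrative choices, not part of the original test.

```bash
#!/bin/bash
# Hypothetical helper: tally boots and failed boots per 5-degree bucket.
# A boot counts as failed when fewer than two RTL8125 NICs were detected.
summarize_boot_log() {
    awk '
        $4 == "booted" {
            bucket = int($12 / 5000) * 5   # millidegrees -> lower bound of bucket
            total[bucket]++
            if ($6 != 2) failed[bucket]++
        }
        END {
            for (b in total)
                printf "%d-%d C: %d boots, %d failed\n", b, b + 4, total[b], failed[b] + 0
        }
    ' "$1" | sort -n
}
```

It could be called as `summarize_boot_log /root/boot.log` once the boot loop has been stopped.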

jysqice commented 1 year ago
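#!/bin/bash
# Boot-loop test: log the number of detected RTL8125 NICs and the CPU
# temperature, then reboot. Plugging in a USB drive stops the loop.
HOSTNAME=$(hostname)
DATE=$(date '+%Y-%m-%d %H:%M:%S')
NICNUM=$(lspci | grep -c RTL8125)
TEMP=$(cat /sys/class/thermal/thermal_zone0/temp)  # millidegrees Celsius
if [ ! -d /proc/scsi/usb-storage ]; then
    if [ "$NICNUM" = "2" ]; then
        echo "$DATE $HOSTNAME booted with $NICNUM RTL8125 and cpu temp is $TEMP" >> /root/boot.log
    else
        # Anything other than two NICs is logged as a failed boot.
        echo "$DATE $HOSTNAME booted with $NICNUM RTL8125 and cpu temp is $TEMP (error)" >> /root/boot.log
    fi
    reboot
fi
exit 0

This is the automatic test script, placed in rc.local. To stop the test, just plug in a USB drive.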

ophub commented 1 year ago

[image: Snip20230214_8]

33 °C. I'll try leaving a big fan blowing on it overnight.

jysqice commented 1 year ago

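The test requires repeated reboots. Once the system has booted, the number of NICs does not change: it neither decreases nor increases. When a NIC is missing, a rescan still doesn't help; when none is missing, both stay present no matter how cold it gets.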
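For reference, a PCI bus rescan of the kind mentioned here can be requested through sysfs. This is a minimal sketch, assuming the standard /sys/bus/pci/rescan interface and root privileges; the function name `pci_rescan` and its optional sysfs-root argument are illustrative. According to the report above, this does not bring the missing NIC back.

```bash
#!/bin/bash
# Hypothetical helper: ask the kernel to rescan all PCI buses.
# The optional argument overrides the sysfs root (useful for dry runs).
pci_rescan() {
    local sysfs_root="${1:-/sys}"
    echo 1 > "$sysfs_root/bus/pci/rescan"
}
```

After `pci_rescan`, running `lspci | grep -c RTL8125` again shows whether the count changed.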

kuaner commented 1 year ago

I'm not sure whether it's related to temperature, but I have this issue too: OpenWrt firmware, R68S, eth2 has disappeared and doesn't show up in ip link.

kuaner commented 1 year ago
[screenshot 2023-02-14 21 10 23] [screenshot 2023-02-14 21 10 58]
kuaner commented 1 year ago

I'm on the OpenWrt firmware and the CPU governor is schedutil, and I still have this problem: eth2 is gone.
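Both observations in this comment can be checked from a shell. A small sketch, assuming the usual sysfs layout (/sys/class/net/&lt;interface&gt; and the cpufreq scaling_governor file for cpu0); the helper names and the optional sysfs-root argument are illustrative.

```bash
#!/bin/bash
# Hypothetical helpers; the optional last argument overrides the sysfs root.

# Succeeds if the named network interface exists.
iface_present() {
    [ -e "${2:-/sys}/class/net/$1" ]
}

# Prints the active cpufreq governor of cpu0, e.g. schedutil.
cpu_governor() {
    cat "${1:-/sys}/devices/system/cpu/cpu0/cpufreq/scaling_governor"
}
```

For example: `iface_present eth2 || echo "eth2 missing (governor: $(cpu_governor))"`.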

jysqice commented 1 year ago

https://github.com/ayufan-rock64/linux-mainline-kernel/pull/18/commits/b5ce971dfbb3509821262d2587d986b8a192f6de A fix for what may be a similar problem.

kuaner commented 1 year ago
[screenshot 2023-02-14 21 25 31]

Rolling back to 79 works fine; both 80 and 81 have this problem.

ophub commented 1 year ago

[image: Snip20230215_1]

The R66S sat in front of a fan for 16 hours and stayed in the 30s (°C) the whole time; the WAN port network is working normally now.

jysqice commented 1 year ago

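Sorry, I didn't explain clearly. The key point of this problem is detection at boot: if both NICs are detected at boot, neither will drop later, and vice versa. So the test shouldn't be leaving the device on overnight, but rebooting it repeatedly overnight. I rebooted several thousand times in total before filing this issue.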

ophub commented 1 year ago

My god, go easy on the brute-force testing; the R66S must be dizzy from all the reboots.

jysqice commented 1 year ago

https://github.com/ayufan-rock64/linux-mainline-kernel/commit/b5ce971dfbb3509821262d2587d986b8a192f6de Could you try adding this delay setting?

ophub commented 1 year ago

https://github.com/unifreq/linux-6.1.y

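Check whether flippy's (f大) kernel source already includes it. If not, add it yourself and test that it compiles and works correctly. Once tested, submit a PR to him.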

ihipop commented 1 year ago

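> My god, go easy on the brute-force testing; the R66S must be dizzy from all the reboots.

@ophub

I have the same problem: occasionally a NIC is missing after a reboot (but as long as it was detected at boot, it never drops no matter how low the temperature). I tested with @jysqice's reboot script and found that at low temperature (actually not that low, just with a fan blowing on it) this problem occurs frequently.

I tested the official FastRhino (电犀牛) firmware and it does not have this problem.

After a NIC goes missing at boot, I tried a PCIe reset, which didn't help. During the reset, dmesg shows messages like this: [image] Whenever the problem occurs, it is always eth1 that is missing. Only a reboot fixes it. Very annoying.

But as long as the NIC shows up at power-on, it stays quite stable in use and doesn't drop, no matter how low the temperature.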

kuaner commented 1 year ago

Rebooting doesn't fix it for me either. Since I switched back to the 79 kernel yesterday, it has been working normally.

ihipop commented 1 year ago

> I'm on the OpenWrt firmware and the CPU governor is schedutil, and I still have this problem: eth2 is gone.

You have an R68S; I have an R66S.

kuaner commented 1 year ago

Yes, the NIC-drop problem exists on both. But I also have an R66S that hasn't hit this problem so far, running the latest kernel.

ophub commented 1 year ago

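flippy (f大) probably knows which patch added after 79 might be related to the NIC loss. I've reported the issue to him; let's wait for him to look into the cause.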

ihipop commented 1 year ago

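> Yes, the NIC-drop problem exists on both. But I also have an R66S that hasn't hit this problem so far, running the latest kernel.

When the machine is warm, a reboot or power-on basically never drops the NIC. The problem only shows up when the machine's temperature isn't high. Even then it doesn't always drop; the probability is just much higher. So blowing on it with a cooling kit increases the chance of hitting it.

A fan plus that reboot script is enough to detect it.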

ihipop commented 1 year ago

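I compared against the official FastRhino (电犀牛) firmware. It looks like their DTS also has settings that adjust the voltage at low temperature; I don't know whether that's related.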

ophub commented 1 year ago

Please post a link to the thermal control settings code you found.

ihipop commented 1 year ago

> Please post a link to the thermal control settings code you found.

rockchip4825-dts.tar.gz [image]

ophub commented 1 year ago

https://github.com/unifreq/linux-6.1.y/commit/1c9dcaf4c98622b869a33942f131093bac27a6d2

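Based on @kuaner's feedback, flippy (f大) has added the rockchip-snps-pcie3 NIC patch, which was added in 79 (6.0.y), to the 6.1.y source, and he has repackaged the 6.1.11 kernel. I've mirrored it to the kernel repository https://github.com/ophub/kernel/tree/main/pub/stable Could those of you above who have the problem please test it.

If your Armbian/OpenWrt system is already on the 6.1.11 kernel, first update to 6.1.10 and then update to 6.1.11, because a kernel cannot be updated to one with the same name.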

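On Armbian, first sync the latest scripts:

armbian-sync

Then update the kernel:

armbian-update -k 6.1.10

After the automatic reboot:

armbian-update -k 6.1.11

On OpenWrt, first update the Amlogic Service (宝盒) plugin, which syncs the latest scripts.

OpenWrt users who are already on 6.1.11 should manually upload the 6.1.10 kernel to the p4 partition and update manually. After rebooting, update back to 6.1.11.

If you're patient, you can also wait for the 6.1.12 release; flippy (f大) may build the 6.1.12 kernel today.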
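The Armbian sequence above depends on whether the running kernel already carries the target version. A minimal sketch of that decision, assuming `uname -r` returns a string such as 6.1.10-flippy-81+ as in the original report; the function name `plan_kernel_update` and its arguments are illustrative, and it only prints the commands rather than running them.

```bash
#!/bin/bash
# Hypothetical helper: print the Armbian commands needed to reach a target kernel.
#   $1 = running kernel (output of uname -r)
#   $2 = target version, e.g. 6.1.11
#   $3 = intermediate version to step through when $1 already equals $2
plan_kernel_update() {
    local current="$1" target="$2" fallback="$3"
    local current_ver="${current%%-*}"   # 6.1.11-flippy-81+ -> 6.1.11
    echo "armbian-sync"
    if [ "$current_ver" = "$target" ]; then
        # Same-name updates are not possible, so step down first (auto-reboots).
        echo "armbian-update -k $fallback"
    fi
    echo "armbian-update -k $target"
}
```

For example, `plan_kernel_update "$(uname -r)" 6.1.11 6.1.10` lists the steps for the current machine.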

jysqice commented 1 year ago

The problem is solved. It looks like the pcie3 firmware did the trick.

ophub commented 1 year ago

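6.1.12 has been released too; update to it and keep testing. This kernel also brings a pleasant surprise for Amlogic: an eMMC fix. As an LTS kernel, I hope 6.1 keeps getting more stable. 6.2 is coming soon as well; if flippy (f大) is too busy and forgets this patch then, let's remind him. It seems to be exactly the right remedy.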

ihipop commented 1 year ago

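> 6.1.12 has been released too; update to it and keep testing. This kernel also brings a pleasant surprise for Amlogic: an eMMC fix. As an LTS kernel, I hope 6.1 keeps getting more stable. 6.2 is coming soon as well; if flippy (f大) is too busy and forgets this patch then, let's remind him. It seems to be exactly the right remedy.

What problem did the Amlogic eMMC fix address?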

kuaner commented 1 year ago

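I've had a few firmware updates through the Amlogic Service (宝盒) plugin fail, after which the device had to be reflashed over a USB cable. Is that also related to this Amlogic eMMC fix?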

ophub commented 1 year ago

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/log/?h=linux-6.1.y

The log describes what was updated.

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-6.1.y&id=a01ad536becb5d4c001a7d50dc1ca9fa14ef75a8


ophub commented 1 year ago

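> I've had a few firmware updates through the Amlogic Service (宝盒) plugin fail, after which the device had to be reflashed over a USB cable. Is that also related to this Amlogic eMMC fix?

Was it an online download update, or a manual upload? If you upload the firmware via the web UI, upload the compressed archive; if you've already extracted it, upload via scp, because a file over 1 GB uploaded through the browser will end up incomplete.

Whether on Armbian or OpenWrt, the very first thing I do before every update is update the scripts.

On Armbian, armbian-sync updates the local system's scripts to the latest version. On OpenWrt, updating the Amlogic Service (宝盒) plugin updates all scripts to the latest version.

As long as the scripts are up to date, known problems have already been fixed. Over the past year I've done several hundred updates. For nearly all 4 kernel series, every week when new versions are released I run a round of updates on both Armbian and OpenWrt to check that the files published to the GitHub repository are complete, so that nobody downloads incomplete files; I verify each one individually. I also install the firmware from time to time, reflashing at least once every two weeks to check whether recent firmware has problems, and after installing OpenWrt I always follow up with an OpenWrt firmware update.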

kuaner commented 1 year ago

A manually uploaded compressed archive. I've seen someone in the Telegram group report the same thing.

kuaner commented 1 year ago

Maybe it was also because the NIC dropped after flashing; I'll test some more.

ophub commented 1 year ago

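Most OpenWrt failures are caused by mixed-up mount points, with /dev/loop2p2 failing to mount. The fix is to reboot and try again.

When p3/p2 isn't mounted, it's usually because you changed the mount points yourself; a few devices also have partition problems, which can be repaired manually. For example, if you're updating from p2 and it says /dev/loop2p2 failed to mount, the update needs to use p3 at that point, so either p3 isn't mounted or the partition has a problem.

The simple fix is to mount it manually first: mount /dev/mmcblk2p3 /mnt/mmcblk2p3. If it mounts, continue with the update. If it still fails at the mount step, format the partition: mkfs.btrfs /dev/mmcblk2p3 -f, then reboot and update the firmware again, and it should succeed.

For the manual mount repair or partition reformat above, confirm for yourself which partition applies: if your current system is on p2, work on p3; if you're on p3, work on p2. Use lsblk to see where your root directory / is mounted.

Of course, if the firmware you uploaded is incomplete, it will never succeed. That's why the online download update includes sha256sum verification; with a manual upload you need to confirm the file's integrity yourself.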
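The p2/p3 rule above can be written down as a small helper. This is only a sketch, assuming the two system partitions end in p2 and p3 as in the /dev/mmcblk2p3 example; the function name `inactive_partition` is illustrative.

```bash
#!/bin/bash
# Hypothetical helper: given the partition holding the running system,
# print the other system partition (the one to mount or reformat).
inactive_partition() {
    case "$1" in
        *p2) echo "${1%p2}p3" ;;
        *p3) echo "${1%p3}p2" ;;
        *)   return 1 ;;   # not one of the two system partitions
    esac
}
```

Pass it the device that lsblk shows mounted at /, for example `inactive_partition /dev/mmcblk2p2`, and double-check the result before running mkfs.btrfs on it.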
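For a manual upload, the integrity check can be done with sha256sum before updating. A minimal sketch, assuming you have the expected SHA-256 hash from wherever the firmware was published; the function name `verify_sha256` is illustrative.

```bash
#!/bin/bash
# Hypothetical helper: succeed only if the file's SHA-256 matches the expected hash.
#   $1 = path to the uploaded file
#   $2 = expected hash (hex string)
verify_sha256() {
    local file="$1" expected="$2" actual
    actual=$(sha256sum "$file" 2>/dev/null | awk '{print $1}')
    [ -n "$actual" ] && [ "$actual" = "$expected" ]
}
```

For example: `verify_sha256 firmware.img.gz "$EXPECTED_HASH" || echo "upload is incomplete or corrupted"`, where the file name and the variable are placeholders.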

kuaner commented 1 year ago

Confirmed after flashing the R68S: the problem of the missing 2.5G NIC is fixed.

ophub commented 1 year ago

OK, thanks for the feedback.

ihipop commented 1 year ago

This issue can be closed now.