thor2002ro / unraid_kernel

Kernel repository for UNRAID(unofficial)

Network Error #22

Closed. PebThePebble closed this issue 1 month ago

PebThePebble commented 7 months ago

I tried to update to 6.12.8 with your kernel so that my A380 works on Linux. It works perfectly fine on 6.12.6. I updated the Unraid OS, added your kernel for 6.12.8, and suddenly it kept refusing to connect to the network, to the point that I had to roll back to the 6.12.6 one just to get an IP assigned. The network works perfectly fine on vanilla Unraid 6.12.8, but then I don't have support for the Intel A380, which is why I need the kernel change. It's only once I add your latest 6.12.8 update that I can't get it working at all.

freehelpdesk commented 7 months ago

Same here. My system is unable to obtain an IP on the latest Release Candidate (expected).

thor2002ro commented 7 months ago

Careful, there are two 6.8rc5 releases, one for 6.12.6 and one for 6.12.8: https://github.com/thor2002ro/unraid_kernel/releases/tag/20240220 I don't have an Arc so I can't comment on that... but network works fine here on 6.12.8.

PebThePebble commented 7 months ago

Yes, I was using the correct version, the one released 2 days ago. The card showed up: I was able to run "ls /dev/dri" and see the card. It would just always refuse to connect to the network. I could still log in and use Unraid in the terminal, which is how I was able to check that the GPU showed up. But the second I remove your kernel and go back to vanilla Unraid, the card no longer works, as their Linux doesn't support Intel GPUs, but it then obtains a network connection and works at 100%.

PebThePebble commented 7 months ago

The second I put my unraid back to 6.12.6 and installed your kernel for that version it all worked perfectly fine again

confuzedplayer commented 7 months ago

https://github.com/thor2002ro/unraid_kernel/issues/21

Yeah, I have the same issues.

thor2002ro commented 7 months ago

I could try compiling 6.7 for 6.12.8 to see if it's fine... everything works fine here... maybe try to dump dmesg when the network doesn't work... "dmesg > /boot/dmesg.txt" should do the trick.
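
A slightly fuller capture than a bare dmesg dump could look like the sketch below. `collect_net_logs` is a hypothetical helper name, not something shipped with Unraid or this kernel package. It assumes `ip` from iproute2 is on the PATH and that the target directory (for example /boot, the USB stick) is writable.

```shell
#!/bin/sh
# Hypothetical helper: save the kernel log and link state for a bug report.
# Usage: collect_net_logs /boot
collect_net_logs() {
    out_dir=${1:-/boot}
    mkdir -p "$out_dir" || return 1

    # Kernel ring buffer; the file may hold an error message instead
    # if the caller is not allowed to read it.
    dmesg > "$out_dir/dmesg.txt" 2>&1 || true

    # Detailed link state (bond and bridge membership), if iproute2 is present.
    if command -v ip >/dev/null 2>&1; then
        ip -d link show > "$out_dir/ip-link.txt" 2>&1 || true
        ip addr show > "$out_dir/ip-addr.txt" 2>&1 || true
    fi
    return 0
}
```

Running `collect_net_logs /boot` from the local console while the network is down leaves the files on the flash drive, so they can be read from another machine.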

jaimbo commented 7 months ago

Also jumping in to say I had similar strange network issues with 6.12.8 - I run a 2 port mellanox NIC and run br1 on the second port for a few containers. For some reason using the 6.12.8 kernel release caused loads of issues with my containers on custom network on br1. Took ages for me to figure out it was the kernel causing it and everything worked fine after rolling back to the stock kernel!

confuzedplayer commented 7 months ago

> I could try compiling the 6.7 for 6.12.8 to see if its fine

if you could please, i would like to give that a try

freehelpdesk commented 7 months ago

Not sure if this is where the problem is happening,

image

thor2002ro commented 7 months ago

> > I could try compiling the 6.7 for 6.12.8 to see if its fine
>
> if you could please, i would like to give that a try

here is 6.7.5 https://github.com/thor2002ro/unraid_kernel/releases/tag/20240223

> Not sure if this is where the problem is happening,
>
> image

dump dmesg to usb boot "dmesg > /boot/dmesg.txt" should do the trick

kaaresgut commented 7 months ago

Just thought I'd throw my info in here too. I don't have network either.

Unraid 6.12.6 - last kernel to work was 6.7.0. Tried 6.7.2, 6.7.3 and a few of the 6.8b.

Unraid 6.12.8 - last kernel to work was 6.7.0. Tried 6.7.5 and a few of the 6.8.b. Haven't tried the last beta one compiled for unraid 6.12.8 since 6.7.5 didn't work.

Seems like the problem happened after 6.7.0, and upgrading to the new unraid did nothing to change things for me. Going back to kernel 6.7.0 fixes it on both versions of unraid.

I assume it is because I also use bridge mode. I use a Broadcom BCM57416 chipset though.

thor2002ro commented 7 months ago

I use bridge also, with a BCM5720. If you could post logs, maybe we could figure it out...

kaaresgut commented 7 months ago

> I use bridge also.... with BCM5720 if you would post logs maybe we could figure out....

dmsg 6.7.5 dmsg 6.7.0

Looks like br0 is left in disabled state on 6.7.5, but in 6.7.0 it enters forwarding state.
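
To compare two saved logs the way described above, a small filter can pull out the last state each bridge port reached. This is only a sketch: `bridge_port_states` is a hypothetical helper, and it assumes the kernel logs port transitions in the form `br0: port 1(bond0) entered forwarding state`; the port name `bond0` is an example, not taken from the attached logs.

```shell
#!/bin/sh
# Hypothetical helper: print the final state of every bridge port found
# in a saved dmesg file, one "bridge port state" triple per line.
# Usage: bridge_port_states dmesg.txt
bridge_port_states() {
    # Reduce matching lines to "bridge port state" ...
    sed -n 's/.*\(br[0-9][0-9]*\): port [0-9][0-9]*(\([^)]*\)) entered \([a-z]*\) state.*/\1 \2 \3/p' "$1" |
        # ... and keep only the last state seen for each bridge/port pair.
        awk '{ state[$1 " " $2] = $3 } END { for (k in state) print k, state[k] }' |
        sort
}
```

A port that ends in `disabled` on the failing kernel and in `forwarding` on the working one would match the observation above.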

confuzedplayer commented 7 months ago

> > > I could try compiling the 6.7 for 6.12.8 to see if its fine
> >
> > if you could please, i would like to give that a try
>
> here is 6.7.5 https://github.com/thor2002ro/unraid_kernel/releases/tag/20240223
>
> > Not sure if this is where the problem is happening, image
>
> dump dmesg to usb boot "dmesg > /boot/dmesg.txt" should do the trick

I got it to boot with this but was unable to get the Arc working.

thor2002ro commented 7 months ago

> > I use bridge also.... with BCM5720 if you would post logs maybe we could figure out....
>
> dmsg 6.7.5 dmsg 6.7.0
>
> Looks like br0 is left in disabled state on 6.7.5, but in 6.7.0 it enters forwarding state.

Curious: do you have both bonding and bridging enabled? Try using only bridging or only bonding... From the kernel's point of view everything looks fine... it's just a configuration issue somewhere...

Edit: yeah, tested it in an Unraid VM and enabling bonding breaks the network... I'm only using bridging, that's why it's fine here... Interesting... maybe the bonding interface changed some defaults...
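
For anyone unsure whether their box has both bonding and bridging active, a quick look at sysfs can tell. The sketch below is a hypothetical helper, `net_stack_summary`. It relies on two things: the bonding driver listing its bonds in `/sys/class/net/bonding_masters`, and bridge devices having a `bridge` subdirectory under `/sys/class/net/<name>/`. The optional argument is only there so the function can be tried against a fake directory tree.

```shell
#!/bin/sh
# Hypothetical helper: report configured bonds and bridges from sysfs.
# Usage: net_stack_summary [sysfs_root]   (sysfs_root defaults to /sys)
net_stack_summary() {
    sys=${1:-/sys}
    bonds=""
    bridges=""

    # The bonding driver exposes the names of all bonds in this file.
    if [ -r "$sys/class/net/bonding_masters" ]; then
        bonds=$(cat "$sys/class/net/bonding_masters")
    fi

    # A bridge device has a "bridge" subdirectory in its sysfs entry.
    for dev in "$sys"/class/net/*; do
        if [ -d "$dev/bridge" ]; then
            bridges="$bridges${bridges:+ }${dev##*/}"
        fi
    done

    echo "bonds: ${bonds:-none}"
    echo "bridges: ${bridges:-none}"
}
```

If both lines list a device, the system is in the bonding plus bridging setup that was found to fail here.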

freehelpdesk commented 7 months ago

One thing I did notice going from 6.7.0 to later versions is the compiler switch. I'm not sure if that could be screwing something up deep down in the source tree. Just a thought, maybe worth testing.

confuzedplayer commented 7 months ago

Sorry, would you be able to compile one for 6.12.8 using 6.7.0?

thor2002ro commented 7 months ago

> One thing i did notice from 6.7.0 to > 6.7.0 is the compiler switch, I'm not sure if that can be screwing something up deep down in the source tree. Just a thought maybe worth testing.

could be the compiler ... but not likely.... I will try some variations...

> Sorry, would u be able to compile one for 6.12.8 using 6.70?

I should be able to... since I have git saved...

confuzedplayer commented 7 months ago

> > One thing i did notice from 6.7.0 to > 6.7.0 is the compiler switch, I'm not sure if that can be screwing something up deep down in the source tree. Just a thought maybe worth testing.
>
> could be the compiler ... but not likely.... I will try some variations...
>
> > Sorry, would u be able to compile one for 6.12.8 using 6.70?
>
> I should be able to... since I have git saved...

Thank you

thor2002ro commented 7 months ago

Here are versions of 6.7, 6.7.5 and 6.8rc5 built with clang19: https://github.com/thor2002ro/unraid_kernel/releases/tag/20240224 Didn't have time to test them though...

confuzedplayer commented 7 months ago

Thank you, I'll test a little later.

kaaresgut commented 7 months ago

I tried the 6.7.5 clang version (dmesg); the compiler doesn't seem to be the reason. Still no network.

I then tried the 6.7.0 clang version (dmesg). It works.

About having both bonding and bridge... not sure why it's like that. I've never touched the network part of unraid. It worked upon my first install and I just never touched it.

kaaresgut commented 7 months ago

Was poking around the unraid forums and stumbled upon this. Someone was trying your 6.7.3 kernel and couldn't get network. He managed to fix it by running some commands on each boot. And then the more interesting bit in the comments:

> Unraid 6.12.8 has an earlier kernel (point) version, because later kernels have a modification which breaks bonding. For future Unraid versions we made a modification to support bonding on latest kernel versions.

I wonder what the modification could be.

thor2002ro commented 7 months ago

Had some time today and played with it a little. The issue seems to be in bond_up in /etc/rc.d/rc.inet1:

run ip link set $BONDIF master ${BONDNAME[$i]} up type bond_slave

doesn't work anymore. Changing it to

run ip link set $BONDIF master ${BONDNAME[$i]} type bond_slave
run ip link set $BONDIF up

gets everything working....
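
The two-step sequence can be tried outside rc.inet1 with a dry-run wrapper. This is a sketch under assumptions: the `run` function below is a simplified stand-in for the helper of the same name in rc.inet1 (the real one is not reproduced here), and `enslave_to_bond` plus the `DRY_RUN` variable are hypothetical names introduced for illustration.

```shell
#!/bin/sh
# Simplified stand-in for rc.inet1's "run" helper (assumption): with
# DRY_RUN=1 it only prints the command, otherwise it executes it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: attach an interface to a bond first, then bring
# the link up with a separate command instead of doing both in one call.
# Usage: enslave_to_bond eth0 bond0
enslave_to_bond() {
    slave=$1
    bond=$2
    run ip link set "$slave" master "$bond" type bond_slave &&
        run ip link set "$slave" up
}
```

With `DRY_RUN=1` set, `enslave_to_bond eth0 bond0` only prints the two `ip link` commands, which makes it safe to check the order before running it for real as root.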

thor2002ro commented 7 months ago

Made a patch package for Unraid fixing bonding...

Just make a packages dir in the root of the USB stick, then in the usb/config/go file add

upgradepkg --install-new /boot/packages/*.TGZ

before "Start the Management Utility".

unraid_fix-bond_6.12.8-2024.02.25-x86_64-thor.TGZ

PS: don't use the extra directory because you're lazy. It runs too early in the boot cycle and booting will freeze.

confuzedplayer commented 7 months ago

I was able to have bonding and bridging mode enabled and the Arc working on 6.7.0.

Thank you for the compiles.

kaaresgut commented 7 months ago

I installed the package and tried 6.8b5. Not working. Then I tried the command in the terminal instead and it gave me an unsupported extension error. Huh? I noticed that the package had the extension in all caps, so I changed it to lower case. After changing both the filename and the line in the go file from TGZ to tgz, it worked.

Now I'm booting and running 6.8.b5 just fine.
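
Putting the instructions and the lower-case fix together, the go file edit could be scripted roughly as below. This is a sketch, not the author's method: `add_pkg_install_line` is a hypothetical helper, and it assumes the go file (usually /boot/config/go) contains a "Start the Management Utility" comment line and that the package in /boot/packages has already been renamed to end in .tgz.

```shell
#!/bin/sh
# Hypothetical helper: add the package install line to a go file, placed
# before the "Start the Management Utility" comment. Does nothing if the
# line is already present.
# Usage: add_pkg_install_line /boot/config/go
add_pkg_install_line() {
    go_file=$1
    line='upgradepkg --install-new /boot/packages/*.tgz'

    # Already present: leave the file alone.
    grep -qF -- "$line" "$go_file" && return 0

    tmp=$(mktemp) || return 1
    # Print the new line right before the marker; append it at the end
    # if the marker is missing.
    if awk -v ins="$line" '
        /Start the Management Utility/ && !seen { print ins; seen = 1 }
        { print }
        END { if (!seen) print ins }
    ' "$go_file" > "$tmp"; then
        cat "$tmp" > "$go_file"
        status=$?
    else
        status=1
    fi
    rm -f "$tmp"
    return $status
}
```

Writing back with `cat` rather than `mv` keeps the original file in place, which avoids replacing it with a temp file from another filesystem. Keep a copy of the go file before trying this on a real flash drive.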

thor2002ro commented 7 months ago

glad it worked

thor2002ro commented 6 months ago

fixed this in 6.7.9 :) was an intentional breakage....

jaimbo commented 6 months ago

> fixed this in 6.7.9 :) was an intentional breakage....

Is the unraid_fix-bond_6.12.8-2024.02.25-x86_64-thor.TGZ file still required?

I just tried installing the 6.7.9 release and still having the same issue with my custom br1 docker network not being able to access containers on the host.

For more context, I run NginxProxyManager as a reverse proxy and I run this on port 80/443 on a second NIC port/br1. This allows me to leave the Unraid Web Interface on the primary NIC with default ports and use NPM on a second IP also with default ports.

When running your kernel, NPM can no longer proxy to other containers on the unraid host, which does work with the stock kernel. (I am able to access the web interface for NPM and proxies to other clients on my home network are also working)

I have tried disabling and re-enabling "Host access to custom networks" which doesn't get things working again.

Any ideas on why this isn't working?

kaaresgut commented 6 months ago

I disabled the patch when I installed 6.7.9 and I got an IP address. I'm using the clang build and not the latest gcc one if that matters.

thor2002ro commented 6 months ago

> > fixed this in 6.7.9 :) was an intentional breakage....
>
> Is the unraid_fix-bond_6.12.8-2024.02.25-x86_64-thor.TGZ file still required?
>
> I just tried installing the 6.7.9 release and still having the same issue with my custom br1 docker network not being able to access containers on the host.
>
> For more context, I run NginxProxyManager as a reverse proxy and I run this on port 80/443 on a second NIC port/br1. This allows me to leave the Unraid Web Interface on the primary NIC with default ports and use NPM on a second IP also with default ports.
>
> When running your kernel, NPM can no longer proxy to other containers on the unraid host, which does work with the stock kernel. (I am able to access the web interface for NPM and proxies to other clients on my home network are also working)
>
> I have tried disabling and re-enabling "Host access to custom networks" which doesn't get things working again.
>
> Any ideas on why this isn't working?

disable the earlier patch..... I fixed the breakage in the kernel

jaimbo commented 6 months ago

> disable the earlier patch..... I fixed the breakage in the kernel

I can confirm I don't have the patch installed and still have the above described issue...

blaze756 commented 6 months ago

@thor2002ro installed 6.7.9 and the network interfaces are working correctly.

jaimbo commented 6 months ago

Interestingly, this is also an issue with the Unraid 6.13.0-beta.1 release which is using 6.6.18

thor2002ro commented 6 months ago

the kernel change is from 6.7

jaimbo commented 6 months ago

Thinking about it, the issue in the 6.13 beta appears to be different as containers using the Custom docker network on br1 failed to start entirely.

It appears the custom docker network is completely missing, whereas with 6.12 and your kernel, the custom network was present but unable to communicate with the host network

thor2002ro commented 6 months ago

as far as I know the containers can NOT ping the underlying host interfaces as they are intentionally filtered by Linux for additional isolation....

jaimbo commented 6 months ago

This works on 6.12.8 with the stock kernel and <=6.12.7 with older versions of your kernel. There is a setting in the Docker settings to facilitate this:

image

nberlee commented 6 months ago

@thor2002ro I think you may revert your revert for 6.12.9:

this is the bond up in rc.inet1

      run ip link set $BONDIF up
      run ip link set $BONDIF master ${BONDNAME[$i]} down type bond_slave

thor2002ro commented 6 months ago

no still bad..... needs to be

run ip link set $BONDIF down
run ip link set $BONDIF master ${BONDNAME[$i]} type bond_slave
run ip link set $BONDIF up

or might work

run ip link set $BONDIF up
run ip link set $BONDIF master ${BONDNAME[$i]} type bond_slave

kaaresgut commented 6 months ago

> no still bad..... needs to be

I couldn't get an IP on 6.12.9 with the 6.8.2 kernel. I added your fix back to the go file and it works again.

thor2002ro commented 6 months ago

6.8.2 and 6.9rc1 work fine for me without the external fix, with bonding and bridging... no idea what you've got there... btw docker macvlan with bonding and bridging seems to be fixed in 6.9 at least...

thor2002ro commented 6 months ago

this should fix it for 6.12.9 https://github.com/thor2002ro/unraid_kernel/releases/tag/20240331

kaaresgut commented 5 months ago

> this should fix it for 6.12.9 https://github.com/thor2002ro/unraid_kernel/releases/tag/20240331

...and also works for 6.12.10.