microsoft / azurelinux

Linux OS for Azure 1P services and edge appliances
MIT License

Kernel SMB client does not automatically update the IP address of a CIFS mount after a DNS update #9096

Open boliu83 opened 5 months ago

boliu83 commented 5 months ago

**Describe the bug**
The kernel SMB client does not automatically update the IP address of a CIFS mount after a DNS update.

This is a known issue in some early versions of the Linux kernel and should have been fixed. However, I can still reproduce it on the latest AzureLinux running a 5.15-series kernel.

**To Reproduce**
Steps to reproduce the behavior:

1. Use an AKS agent node running AzureLinux:

   ```
   root [ /home/azureuser ]# cat /etc/os-release
   NAME="Common Base Linux Mariner"
   VERSION="2.0.20240403"
   ID=mariner
   VERSION_ID="2.0"
   PRETTY_NAME="CBL-Mariner/Linux"
   ANSI_COLOR="1;34"
   HOME_URL="https://aka.ms/cbl-mariner"
   BUG_REPORT_URL="https://aka.ms/cbl-mariner"
   SUPPORT_URL="https://aka.ms/cbl-mariner"
   root [ /home/azureuser ]# uname -r
   5.15.153.1-2.cm2
   root [ /home/azureuser ]# rpm -qa | grep keyutils
   keyutils-1.6.3-1.cm2.x86_64
   ```


2. Mount the Azure file share:

   ```
   sudo mount -t cifs //f19b3e40c80e54919ae4890.file.core.windows.net/pvc-9717cacf-4c54-4341-b336-f5398bd41b38 /mnt/pvc-9717cacf-4c54-4341-b336-f5398bd41b38 -o credentials=/etc/smbcredentials/f19b3e40c80e54919ae4890.cred,dir_mode=0777,file_mode=0777,serverino,nosharesock,actimeo=30
   ```

3. After the Azure file share is mounted, note the IP address used by the mount. It should match the IP address of the file share endpoint.
![image](https://github.com/microsoft/azurelinux/assets/7767774/1f458f3e-3afe-4f0d-8c48-ac7dada5cd26)

4. On the Azure storage account, initiate a geographic failover, which updates the file share's DNS record to point to the IP address of the secondary site.
![image](https://github.com/microsoft/azurelinux/assets/7767774/f0e1d3e4-e33e-4600-b51f-bf1444656faf)

5. See the error: the file share mount still uses the old IP address of the file share.
![image](https://github.com/microsoft/azurelinux/assets/7767774/d416983b-181e-4abb-b193-22424153a7cd)

   Trying to access the file share now fails with a "Host is down" error:

   ```
   root [ /home/azureuser ]# ls /mnt/pvc-9717cacf-4c54-4341-b336-f5398bd41b38
   ls: cannot access '/mnt/pvc-9717cacf-4c54-4341-b336-f5398bd41b38': Host is down
   ```


   Kernel errors logged in syslog:

   ```
   2024-05-14T13:56:27.490067+00:00 aks-azurelinux-13697370-vmss000006 kernel: CIFS: Status code returned 0xc000006d STATUS_LOGON_FAILURE
   2024-05-14T13:56:27.490233+00:00 aks-azurelinux-13697370-vmss000006 kernel: CIFS: VFS: \\f19b3e40c80e54919ae4890.file.core.windows.net Send error in SessSetup = -13
   ```
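The IP comparison above was done from screenshots. A minimal shell sketch of the same check, assuming iproute2 `ss` and glibc `getent` are installed, the mount talks SMB over TCP port 445, and only IPv4 matters. The script and its `stale_ips` function are hypothetical, and because it looks at every established port-445 connection it cannot tell several SMB mounts apart:

```sh
#!/bin/sh
# Hypothetical check: is an established SMB connection still pointing at an
# IP that DNS no longer returns for the file share host?
# Assumptions: iproute2 `ss`, glibc `getent`, SMB over TCP 445, IPv4 only.
# All established :445 connections are compared, not just one mount's.

# stale_ips DNS_IPS CONN_IPS
# Prints each IP in CONN_IPS (newline-separated) that is absent from DNS_IPS.
stale_ips() {
  printf '%s\n' "$2" | while read -r ip; do
    [ -n "$ip" ] || continue
    printf '%s\n' "$1" | grep -qxF "$ip" || printf '%s\n' "$ip"
  done
}

if [ "$#" -ge 1 ]; then
  host=$1   # e.g. <account>.file.core.windows.net

  # IPv4 addresses that DNS currently returns for the host (deduplicated)
  dns_ips=$(getent ahostsv4 "$host" | awk '{print $1}' | sort -u)

  # Peer addresses of established TCP connections to port 445.
  # With a single-state filter, ss omits the State column, so the peer is field 4.
  conn_ips=$(ss -Htn state established '( dport = :445 )' \
    | awk '{print $4}' | sed 's/:445$//' | sort -u)

  echo "DNS now returns:    $(echo $dns_ips)"
  echo "SMB connected to:   $(echo $conn_ips)"

  stale=$(stale_ips "$dns_ips" "$conn_ips")
  if [ -n "$stale" ]; then
    echo "Connected to IPs that DNS no longer returns: $(echo $stale)"
  fi
fi
```

Run it with the storage account's `file.core.windows.net` host name after the failover in step 4; in the situation described in this report, the last line would be expected to list the pre-failover address.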



**Expected behavior**
The SMB client should detect the DNS change and automatically update the mount to use the new IP address.
rlmenge commented 5 months ago

@christopherco @microsoft/cbl-mariner-kernel

mbelt commented 2 months ago

The article linked at the top is nonsense. It references kernel fixes from three years ago, then says we need AzureLinux 2.0 with kernel version 5.15.159 to pick them up. 🤨

There is a known SMB client regression that is present in 5.15.153. https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=v5.15.158&id=929ba00702cfba77fbb3fbc88d9a0f6e2af2f6b6

Could this regression explain the behavior seen here? If so, it's fixed in 5.15.158.

boliu83 commented 2 months ago

This is fixed in 5.15.160.1-1.cm2, which was released about a week ago. Thanks for the link to the regression; I later realized that was the cause as well.
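Per this thread, the regression was fixed upstream in 5.15.158, and the reporter confirmed the AzureLinux package 5.15.160.1-1.cm2 as fixed. A minimal sketch for checking a node's kernel release against that threshold, assuming GNU `sort -V` is available. The function names and the `FIX_VERSION` value are illustrative choices, and a version comparison alone cannot prove a given build carries the patch:

```sh
#!/bin/sh
# Hypothetical helper: is a kernel release at or above the upstream stable
# release cited in this thread for the SMB client regression fix?
# Assumption: 5.15.158 is the threshold; requires GNU `sort -V`.
FIX_VERSION="5.15.158"

# version_ge A B: succeed if version A >= version B
version_ge() {
  [ "$(printf '%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# kernel_has_smb_fix RELEASE: drop the package suffix, then compare,
# e.g. "5.15.153.1-2.cm2" -> "5.15.153.1"
kernel_has_smb_fix() {
  rel="${1%%-*}"
  version_ge "$rel" "$FIX_VERSION"
}

# Defaults to the running kernel when no release string is given
release="${1:-$(uname -r)}"
if kernel_has_smb_fix "$release"; then
  echo "$release: at or above $FIX_VERSION"
else
  echo "$release: below $FIX_VERSION; consider updating the kernel package"
fi
```

For the two releases in this thread, `5.15.153.1-2.cm2` falls below the threshold and `5.15.160.1-1.cm2` passes it.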