Closed: fabio-s-franco closed this issue 2 months ago
@fabio-s-franco if you have installed the AKS managed blob CSI driver (https://learn.microsoft.com/en-us/azure/aks/azure-blob-csi?tabs=NFS#enable-csi-driver-on-a-new-or-existing-aks-cluster), the log file is /var/log/blobfuse2.log on the VM.
@andyzhangx That's precisely what I am saying is not the case:
kubectl exec -it -n kube-system csi-blob-node-xxxxx -c blob -- bash
root@aks-syspool-xxxxxxxx-vmss000000:/# ls -la /var/log
total 20
drwxr-xr-x 1 root root 4096 Jan 19 02:42 .
drwxr-xr-x 1 root root 4096 Jul 3 2023 ..
drwxr-xr-x 2 root root 4096 Jan 19 02:42 apt
-rw-r--r-- 1 root root 3036 Jan 19 02:42 dpkg.log
That's all there is
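For what it's worth, one way to confirm whether the blob container even mounts the host's /var/log is to inspect what the daemonset mounts; a sketch only, assuming the managed daemonset is named csi-blob-node and the driver container is named blob:
# Mount paths inside the "blob" container
kubectl -n kube-system get ds csi-blob-node \
  -o jsonpath='{.spec.template.spec.containers[?(@.name=="blob")].volumeMounts[*].mountPath}'
# Host paths exposed to the pod via hostPath volumes
kubectl -n kube-system get ds csi-blob-node \
  -o jsonpath='{.spec.template.spec.volumes[*].hostPath.path}'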
@fabio-s-franco the logs are on the node; run the following commands:
sudo wget https://raw.githubusercontent.com/andyzhangx/demo/master/dev/kubectl-enter
sudo chmod a+x ./kubectl-enter
./kubectl-enter <node-name>
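A roughly equivalent way to get a node shell without downloading the helper script, assuming kubectl debug is available in the cluster and using an arbitrary ubuntu image, is:
# Start a debug pod on the node; the host filesystem is mounted at /host
kubectl debug node/<node-name> -it --image=ubuntu
# Inside the debug container, switch to the host root and read the log
chroot /host
cat /var/log/blobfuse2.log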
@andyzhangx Thank you for looking into it. I have three observations after following your suggestion:
1 - Shelling into the k8s node via the helper container still does not find the file at the documented location:
./kubectl-enter aks-syspool-xxxxxxxx-vmss000000
spawning "nsenter-46zser" on "aks-syspool-xxxxxxxx-vmss000000"
If you don't see a command prompt, try pressing enter.
root@aks-syspool-xxxxxxxx-vmss000000:/# cat /var/log/blobfuse2.log
cat: /var/log/blobfuse2.log: No such file or directory
2 - The documentation is ambiguous about what "node agent" means. Every command in the example is about entering the shell of a container within the csi-blob-node pod, and at no point does it indicate otherwise, other than mentioning "agent node", which can be interpreted as the csi-blob-node daemonset pod for the node where the container is being mounted. Also, as you can see from my output above, shelling into the driver container gives the same user (root) on the same VM, and the blobfuse2 command is executable from within it. The context may be different, but one can imagine how confusing it gets.
3 - Setting aside the confusion, there is very little detail around using the CSI driver with a private endpoint when the storage account has HNS enabled, i.e. when it needs to connect via the ADLS API, which requires a separate private endpoint (for the dfs sub-resource, not the blob one); see the sketch below.
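For reference, this is roughly what the extra DFS endpoint looks like with the Azure CLI; a sketch only, with placeholder names (rg-example, vnet-example, snet-aks, stexample) that are not taken from this issue:
# Private endpoint targeting the dfs sub-resource of the storage account
STORAGE_ID=$(az storage account show -n stexample -g rg-example --query id -o tsv)
az network private-endpoint create -g rg-example -n pe-stexample-dfs \
  --vnet-name vnet-example --subnet snet-aks \
  --private-connection-resource-id "$STORAGE_ID" \
  --group-id dfs --connection-name stexample-dfs
# Matching private DNS zone and VNet link, so that <account>.dfs.core.windows.net
# resolves to the private endpoint from inside the cluster
az network private-dns zone create -g rg-example -n privatelink.dfs.core.windows.net
az network private-dns link vnet create -g rg-example -z privatelink.dfs.core.windows.net \
  -n dfs-link --virtual-network vnet-example --registration-enabled false
az network private-endpoint dns-zone-group create -g rg-example \
  --endpoint-name pe-stexample-dfs -n dfs-zone-group \
  --private-dns-zone privatelink.dfs.core.windows.net --zone-name dfs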
Can you provide the CSI driver daemonset pod logs on the node, and what is the ls -lt /var/log/ output? @fabio-s-franco
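For completeness, this is roughly how I would pull those; a sketch assuming the managed driver's usual naming, with a placeholder pod name:
# Find the csi-blob-node pod scheduled on the node in question
kubectl get pods -n kube-system -o wide | grep csi-blob-node | grep <node-name>
# Dump the driver container's logs from that pod
kubectl logs csi-blob-node-xxxxx -c blob -n kube-system > csi-blob-node.log
# And, on the node itself, the directory listing
ls -lt /var/log/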
@andyzhangx This is a sample of the logs now from a pod on a node that provisions the blob mount:
DRIVER INFORMATION:
-------------------
Build Date: "2024-01-19T02:40:14Z"
Compiler: gc
Driver Name: blob.csi.azure.com
Driver Version: v1.22.5
Git Commit: c62217028e177868059beeae7fb3bbd657060511
Go Version: go1.21.3
Platform: linux/amd64
Streaming logs below:
I0409 14:54:57.501690 155991 blob.go:271] driver userAgent: blob.csi.azure.com/v1.22.5 AKS
I0409 14:54:57.501828 155991 azure.go:74] set QPS(25.000000) and QPS Burst(50) for driver kubeClient
I0409 14:54:57.509908 155991 azure.go:88] reading cloud config from secret kube-system/azure-cloud-provider
I0409 14:54:57.571723 155991 azure.go:95] InitializeCloudFromSecret: failed to get cloud config from secret kube-system/azure-cloud-provider: failed to get secret kube-system/azure-cloud-provider: secrets "azure-cloud-provider" not found
I0409 14:54:57.571746 155991 azure.go:100] could not read cloud config from secret kube-system/azure-cloud-provider
I0409 14:54:57.571756 155991 azure.go:103] AZURE_CREDENTIAL_FILE env var set as /etc/kubernetes/azure.json
I0409 14:54:57.571790 155991 azure.go:114] read cloud config from file: /etc/kubernetes/azure.json successfully
I0409 14:54:57.572515 155991 azure_auth.go:279] Using AzurePublicCloud environment
I0409 14:54:57.572535 155991 azure_auth.go:130] azure: using managed identity extension to retrieve access token
I0409 14:54:57.572542 155991 azure_auth.go:136] azure: using User Assigned MSI ID to retrieve access token
I0409 14:54:57.572595 155991 azure_auth.go:147] azure: User Assigned MSI ID is client ID
I0409 14:54:57.572623 155991 azure.go:854] Azure cloudprovider using try backoff: retries=6, exponent=1.500000, duration=5, jitter=1.000000
I0409 14:54:57.572831 155991 azure_interfaceclient.go:74] Azure InterfacesClient (read ops) using rate limit config: QPS=10, bucket=100
I0409 14:54:57.572843 155991 azure_interfaceclient.go:77] Azure InterfacesClient (write ops) using rate limit config: QPS=10, bucket=100
I0409 14:54:57.572857 155991 azure_vmsizeclient.go:68] Azure VirtualMachineSizesClient (read ops) using rate limit config: QPS=10, bucket=100
I0409 14:54:57.572868 155991 azure_vmsizeclient.go:71] Azure VirtualMachineSizesClient (write ops) using rate limit config: QPS=10, bucket=100
I0409 14:54:57.572895 155991 azure_snapshotclient.go:70] Azure SnapshotClient (read ops) using rate limit config: QPS=10, bucket=100
I0409 14:54:57.572904 155991 azure_snapshotclient.go:73] Azure SnapshotClient (write ops) using rate limit config: QPS=10, bucket=100
I0409 14:54:57.572914 155991 azure_storageaccountclient.go:70] Azure StorageAccountClient (read ops) using rate limit config: QPS=10, bucket=100
I0409 14:54:57.572920 155991 azure_storageaccountclient.go:73] Azure StorageAccountClient (write ops) using rate limit config: QPS=10, bucket=100
I0409 14:54:57.572927 155991 azure_diskclient.go:68] Azure DisksClient using API version: 2022-07-02
I0409 14:54:57.572936 155991 azure_diskclient.go:73] Azure DisksClient (read ops) using rate limit config: QPS=10, bucket=100
I0409 14:54:57.572943 155991 azure_diskclient.go:76] Azure DisksClient (write ops) using rate limit config: QPS=10, bucket=100
I0409 14:54:57.572956 155991 azure_vmclient.go:70] Azure VirtualMachine client (read ops) using rate limit config: QPS=10, bucket=100
I0409 14:54:57.572976 155991 azure_vmclient.go:73] Azure VirtualMachine client (write ops) using rate limit config: QPS=10, bucket=100
I0409 14:54:57.572987 155991 azure_vmssclient.go:70] Azure VirtualMachineScaleSetClient (read ops) using rate limit config: QPS=10, bucket=100
I0409 14:54:57.572993 155991 azure_vmssclient.go:73] Azure VirtualMachineScaleSetClient (write ops) using rate limit config: QPS=10, bucket=100
I0409 14:54:57.573002 155991 azure_vmssvmclient.go:75] Azure vmssVM client (read ops) using rate limit config: QPS=10, bucket=100
I0409 14:54:57.573008 155991 azure_vmssvmclient.go:78] Azure vmssVM client (write ops) using rate limit config: QPS=10, bucket=100
I0409 14:54:57.573021 155991 azure_routeclient.go:69] Azure RoutesClient (read ops) using rate limit config: QPS=10, bucket=100
I0409 14:54:57.573031 155991 azure_routeclient.go:72] Azure RoutesClient (write ops) using rate limit config: QPS=10, bucket=100
I0409 14:54:57.573055 155991 azure_subnetclient.go:70] Azure SubnetsClient (read ops) using rate limit config: QPS=10, bucket=100
I0409 14:54:57.573063 155991 azure_subnetclient.go:73] Azure SubnetsClient (write ops) using rate limit config: QPS=10, bucket=100
I0409 14:54:57.573080 155991 azure_routetableclient.go:69] Azure RouteTablesClient (read ops) using rate limit config: QPS=10, bucket=100
I0409 14:54:57.573091 155991 azure_routetableclient.go:72] Azure RouteTablesClient (write ops) using rate limit config: QPS=10, bucket=100
I0409 14:54:57.573105 155991 azure_loadbalancerclient.go:74] Azure LoadBalancersClient (read ops) using rate limit config: QPS=10, bucket=100
I0409 14:54:57.573113 155991 azure_loadbalancerclient.go:77] Azure LoadBalancersClient (write ops) using rate limit config: QPS=10, bucket=100
I0409 14:54:57.573136 155991 azure_securitygroupclient.go:70] Azure SecurityGroupsClient (read ops) using rate limit config: QPS=10, bucket=100
I0409 14:54:57.573145 155991 azure_securitygroupclient.go:73] Azure SecurityGroupsClient (write ops) using rate limit config: QPS=10, bucket=100
I0409 14:54:57.573154 155991 azure_publicipclient.go:74] Azure PublicIPAddressesClient (read ops) using rate limit config: QPS=10, bucket=100
I0409 14:54:57.573160 155991 azure_publicipclient.go:77] Azure PublicIPAddressesClient (write ops) using rate limit config: QPS=10, bucket=100
I0409 14:54:57.573180 155991 azure_blobclient.go:73] Azure BlobClient using API version: 2021-09-01
I0409 14:54:57.573195 155991 azure_vmasclient.go:70] Azure AvailabilitySetsClient (read ops) using rate limit config: QPS=10, bucket=100
I0409 14:54:57.573215 155991 azure_vmasclient.go:73] Azure AvailabilitySetsClient (write ops) using rate limit config: QPS=10, bucket=100
I0409 14:54:57.573273 155991 azure.go:1097] attach/detach disk operation rate limit QPS: 6.000000, Bucket: 10
I0409 14:54:57.573304 155991 azure.go:150] starting node server on node(aks-xxxxxxxxxx-xxxxxxxx-vmss000000)
I0409 14:54:57.573322 155991 blob.go:276] cloud: AzurePublicCloud, location: westeurope, rg: mc-xxxx-euw, VnetName: vnet-xxxxxx-xxxx, VnetResourceGroup: rg-xxxxxx-xxxx-xxxx, SubnetName: snet-xxx-aks-xxxxxxxx
I0409 14:54:57.576002 155991 mount_linux.go:284] Detected umount with safe 'not mounted' behavior
I0409 14:54:57.576057 155991 driver.go:80] Enabling controller service capability: CREATE_DELETE_VOLUME
I0409 14:54:57.576065 155991 driver.go:80] Enabling controller service capability: EXPAND_VOLUME
I0409 14:54:57.576071 155991 driver.go:80] Enabling controller service capability: SINGLE_NODE_MULTI_WRITER
I0409 14:54:57.576077 155991 driver.go:99] Enabling volume access mode: SINGLE_NODE_WRITER
I0409 14:54:57.576083 155991 driver.go:99] Enabling volume access mode: SINGLE_NODE_READER_ONLY
I0409 14:54:57.576088 155991 driver.go:99] Enabling volume access mode: SINGLE_NODE_SINGLE_WRITER
I0409 14:54:57.576093 155991 driver.go:99] Enabling volume access mode: SINGLE_NODE_MULTI_WRITER
I0409 14:54:57.576098 155991 driver.go:99] Enabling volume access mode: MULTI_NODE_READER_ONLY
I0409 14:54:57.576103 155991 driver.go:99] Enabling volume access mode: MULTI_NODE_SINGLE_WRITER
I0409 14:54:57.576108 155991 driver.go:99] Enabling volume access mode: MULTI_NODE_MULTI_WRITER
I0409 14:54:57.576114 155991 driver.go:90] Enabling node service capability: STAGE_UNSTAGE_VOLUME
I0409 14:54:57.576120 155991 driver.go:90] Enabling node service capability: SINGLE_NODE_MULTI_WRITER
I0409 14:54:57.576295 155991 server.go:114] Listening for connections on address: &net.UnixAddr{Name:"//csi/csi.sock", Net:"unix"}
I0409 14:54:58.197687 155991 utils.go:75] GRPC call: /csi.v1.Identity/GetPluginInfo
I0409 14:54:58.197709 155991 utils.go:76] GRPC request: {}
I0409 14:54:58.199663 155991 utils.go:82] GRPC response: {"name":"blob.csi.azure.com","vendor_version":"v1.22.5"}
I0409 14:54:58.351239 155991 utils.go:75] GRPC call: /csi.v1.Identity/GetPluginInfo
I0409 14:54:58.351391 155991 utils.go:76] GRPC request: {}
I0409 14:54:58.351430 155991 utils.go:82] GRPC response: {"name":"blob.csi.azure.com","vendor_version":"v1.22.5"}
I0409 14:54:58.490367 155991 utils.go:75] GRPC call: /csi.v1.Node/NodeGetInfo
I0409 14:54:58.490382 155991 utils.go:76] GRPC request: {}
I0409 14:54:58.490551 155991 utils.go:82] GRPC response: {"node_id":"aks-xxxxxx-xxxxxxxxxxx-vmss000000"}
Note that I have already overcome the DFS connectivity issue itself; the problem is more about the ability to properly investigate the cause. I have posted more detailed logs in the issue on the blobfuse2 repo.
The contents of /var/log are:
total 10424
-rw-r----- 1 syslog adm 1999761 Apr 9 15:33 messages
-rw-r----- 1 syslog adm 1994539 Apr 9 15:33 syslog
drwxr-xr-x 2 root root 12288 Apr 9 15:33 containers
-rw-r----- 1 root root 42889 Apr 9 15:33 azure-vnet.log
-rw-r----- 1 syslog adm 68505 Apr 9 15:33 kern.log
-rw-r----- 1 root root 1935 Apr 9 15:33 azure-vnet-telemetry.log
-rw-r----- 1 syslog adm 62916 Apr 9 15:33 warn
-rw-r----- 1 root root 23032 Apr 9 15:33 azure-vnet-ipam.log
drwxr-xr-x 16 root root 4096 Apr 9 15:33 pods
-rw-r--r-- 1 root root 77594 Apr 9 15:32 waagent.log
-rw-r----- 1 syslog adm 5222 Apr 9 15:25 auth.log
drwx------ 7 root root 4096 Apr 9 13:37 azure
drwxr-xr-x 5 root root 4096 Apr 9 13:03 microsoft-defender-for-cloud
-rw-r----- 1 root adm 4070 Apr 9 12:57 cloud-init-output.log
-rw-r----- 1 syslog adm 164381 Apr 9 12:57 cloud-init.log
-rw-rw-r-- 1 root utmp 17280 Apr 9 12:57 wtmp
-rw-r----- 1 root adm 58706 Apr 9 12:57 dmesg
drwxr-x--- 2 root adm 4096 Apr 9 12:57 unattended-upgrades
-rw-r--r-- 1 root root 0 Apr 9 12:57 ubuntu-advantage.log
-rw-r----- 1 syslog adm 2029328 Apr 9 12:57 messages.1
-rw-r----- 1 syslog adm 2391470 Apr 9 12:57 syslog.1
-rw-r----- 1 syslog adm 32106 Apr 9 12:57 auth.log.1
-rw-r----- 1 syslog adm 826533 Apr 9 12:57 kern.log.1
-rw-rw---- 1 root utmp 0 Apr 9 12:57 btmp
-rw-r--r-- 1 root root 0 Apr 9 12:57 dpkg.log
drwxr-xr-x 2 root root 4096 Apr 9 12:57 apt
-rw-r--r-- 1 root root 0 Apr 9 12:57 alternatives.log
-rw-r----- 1 syslog adm 240139 Apr 9 12:57 warn.1
-rw-rw-r-- 1 root utmp 18701432 Apr 9 12:57 lastlog
drwxr-sr-x+ 5 root systemd-journal 4096 Apr 9 12:56 journal
-rw-r----- 1 root adm 54699 Mar 19 18:48 dmesg.0
drwx------ 2 root root 4096 Mar 19 16:37 landscape
-rw-r----- 1 root root 0 Mar 19 16:37 blobfuse-flexvol-installer.log
-rw-r----- 1 root root 0 Mar 19 16:37 blobfuse-driver.log
-rw-r----- 1 root root 0 Mar 19 16:37 kv-driver.log
-rw-r----- 1 root root 0 Mar 19 16:37 azure-cnimonitor.log
-rw-r----- 1 root root 173758 Mar 19 16:37 dpkg.log.1
-rw-r----- 1 root root 0 Mar 19 16:37 daemon.log
-rw-r----- 1 root root 10655 Mar 19 16:37 alternatives.log.1
-rw-r----- 1 syslog adm 28353 Mar 19 16:37 localmessages
-rw-r----- 1 root root 166508 Mar 19 16:37 ubuntu-advantage.log.1
-rw-r----- 1 root adm 15934 Mar 19 16:34 dmesg.1.gz
-rw-r----- 1 root adm 14393 Mar 19 15:52 dmesg.2.gz
-rw-r----- 1 root adm 12796 Mar 19 15:45 dmesg.3.gz
drwx------ 2 root root 4096 Mar 19 15:44 private
-rw-rw---- 1 root utmp 0 Feb 29 21:56 btmp.1
drwxrws--T 2 ceph ceph 4096 Jan 11 12:26 ceph
drwxr-xr-x 2 root root 4096 Jun 5 2023 sysstat
drwxr-xr-x 2 root root 4096 Mar 14 2023 dist-upgrade
drwxr-xr-x 2 _chrony _chrony 4096 Aug 25 2020 chrony
drwxr-xr-x 2 root root 4096 Jan 25 2020 glusterfs
The closest thing is /var/log/blobfuse-driver.log, which, as you can see, is empty. The only way I can get logs from blobfuse2 is by manually executing the blobfuse2 mount command in the csi-blob-node pod, which is what generated the blobfuse2 log file and helped me identify the issue; there is more information about it in the blobfuse2 repo.
edit: find / -name "blobfuse2.log" does not return any result.
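For reference, the manual invocation I mean is roughly the following; a sketch only, run from inside the blob container (or on the node), assuming key-based auth via the standard environment variables and placeholder account/container names. The flags mirror what the driver itself passes:
export AZURE_STORAGE_ACCOUNT=<account>
export AZURE_STORAGE_ACCESS_KEY=<key>
mkdir -p /tmp/test-mount /tmp/blobfuse2-cache
# Mount via the ADLS (dfs) data path with verbose logging, then inspect the log output
blobfuse2 mount /tmp/test-mount \
  --container-name=<container> \
  --tmp-path=/tmp/blobfuse2-cache \
  --use-adls=true \
  --log-level=LOG_DEBUG \
  -o allow_other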
@fabio-s-franco from the CSI driver logs on that node, I don't see any mount activity; is that the right node?
@andyzhangx You are right, the node where the container was being mounted changed. I can now find these files:
-rw-r----- 1 syslog adm 0 Apr 9 16:00 blobfuse2.log
-rw-r----- 1 syslog adm 1458 Apr 9 15:51 blobfuse2.log-20240409
But that accounts only for a small part of the problem (log-wise):
1 - The documentation does not detail the nuances of HNS-enabled mounts via private endpoint. Even with the parameter networkEndpointType=privateEndpoint, it does not work when HNS is enabled, which requires creating an additional private endpoint for the dfs sub-resource. Both the kubelet and control plane user-assigned identities had the Contributor role over the entire subscription (to make sure that was not the problem). And in my case, the private endpoints are created in Terraform because they are needed for other purposes, so I don't really want the driver to create them for me. In short, making an HNS-enabled mount work with private endpoints through the driver is very difficult due to the lack (or at least my failure to find) of examples or details in the docs.
2 - The debugging documentation is not clear on some points (such as shelling into the AKS node rather than into the blob container within the csi-blob-node pod). I was only able to get the mount to work after understanding blobfuse2 and running its commands directly, checking its logs until I could mount the container (after seeing that it tries to use the .dfs endpoint when --use-adls=true is set). The logs produced when running it by hand contain much more than what I could find on the node agent. All I get from the node agent is:
root@aks-*******-********-vmss000001:/var/log# cat ./blobfuse2.log-20240409
Apr 9 15:51:32 aks-*******-********-vmss000001 blobfuse2[16433]: LOG_ERR [root.go (149)]: beginDetectNewVersion: error parsing remoteVersion [invalid version string]
Apr 9 15:51:32 aks-*******-********-vmss000001 blobfuse2[16433]: LOG_WARNING [mount.go (384)]: mount: unsupported v1 CLI parameter: pre-mount-validate is always true in blobfuse2.
Apr 9 15:51:32 aks-*******-********-vmss000001 blobfuse2[16433]: LOG_CRIT [mount.go (404)]: Starting Blobfuse2 Mount : 2.2.1 on [Ubuntu 22.04.4 LTS]
Apr 9 15:51:32 aks-*******-********-vmss000001 blobfuse2[16433]: LOG_CRIT [mount.go (405)]: Logging level set to : LOG_WARNING
Apr 9 15:51:32 aks-*******-********-vmss000001 blobfuse2[16433]: LOG_ERR [file_cache.go (256)]: FileCache: config error [tmp-path does not exist. attempting to create tmp-path.]
Apr 9 15:51:33 aks-*******-********-vmss000001 blobfuse2[16442]: LOG_ERR [root.go (149)]: beginDetectNewVersion: error parsing remoteVersion [invalid version string]
Apr 9 15:51:33 aks-*******-********-vmss000001 blobfuse2[16442]: LOG_WARNING [mount.go (384)]: mount: unsupported v1 CLI parameter: pre-mount-validate is always true in blobfuse2.
Apr 9 15:51:33 aks-*******-********-vmss000001 blobfuse2[16442]: LOG_CRIT [mount.go (404)]: Starting Blobfuse2 Mount : 2.2.1 on [Ubuntu 22.04.4 LTS]
Apr 9 15:51:33 aks-*******-********-vmss000001 blobfuse2[16442]: LOG_CRIT [mount.go (405)]: Logging level set to : LOG_WARNING
root@aks-*******-********-vmss000001:/var/log#
And the matching logs from the daemonset:
I0409 15:51:31.447241 5631 utils.go:76] GRPC request: {"staging_target_path":"/var/lib/kubelet/plugins/kubernetes.io/csi/blob.csi.azure.com/******************************/globalmount","volume_capability":{"AccessType":{"Mount":{"mount_flags":["-o negative_timeout=120","--virtual-directory=true","-o allow_other","--use-adls=true","-o attr_timeout=120","-o entry_timeout=120"]}},"access_mode":{"mode":5}},"volume_context":{"containerName":"***-***","getLatestAccountKey":"true","isHnsEnabled":"true","networkEndpointType":"privateEndpoint","protocol":"fuse2","resourceGroup":"rg-****-****-****-euw","server":"*******.blob.core.windows.net","storageAccount":"*******","useDataPlaneAPI":"true"},"volume_id":"**************"}
I0409 15:51:31.449233 5631 blob.go:386] parsing volumeID(**************) return with error: error parsing volume id: "**************", should at least contain two #
I0409 15:51:31.449255 5631 blob.go:458] volumeID(**************) authEnv: []
I0409 15:51:31.575866 5631 blob.go:513] get account(*******) key from secret(default, azure-storage-account-*******-secret) failed with error: could not get secret(azure-storage-account-*******-secret): secrets "azure-storage-account-*******-secret" not found, use cluster identity to get account key instead
I0409 15:51:31.814590 5631 azure_storageaccount.go:163] got storage account key with creation time: 2024-02-26 14:59:30.7437699 +0000 UTC
I0409 15:51:31.814671 5631 nodeserver.go:366] target /var/lib/kubelet/plugins/kubernetes.io/csi/blob.csi.azure.com/******************************/globalmount
protocol fuse2
volumeId **************
context map[containerName:***-*** getLatestAccountKey:true isHnsEnabled:true networkEndpointType:privateEndpoint protocol:fuse2 resourceGroup:rg-****-****-****-euw server:*******.blob.core.windows.net storageAccount:******* useDataPlaneAPI:true]
mountflags [-o negative_timeout=120 --virtual-directory=true -o allow_other --use-adls=true -o attr_timeout=120 -o entry_timeout=120]
mountOptions [--use-adls=true --virtual-directory=true -o allow_other -o attr_timeout=120 -o entry_timeout=120 -o negative_timeout=120 --cancel-list-on-mount-seconds=10 --empty-dir-check=false --tmp-path=/mnt/************** --container-name=***-*** --pre-mount-validate=true --use-https=true]
args /var/lib/kubelet/plugins/kubernetes.io/csi/blob.csi.azure.com/******************************/globalmount --use-adls=true --virtual-directory=true -o allow_other -o attr_timeout=120 -o entry_timeout=120 -o negative_timeout=120 --cancel-list-on-mount-seconds=10 --empty-dir-check=false --tmp-path=/mnt/************** --container-name=***-*** --pre-mount-validate=true --use-https=true
serverAddress *******.blob.core.windows.net
I0409 15:51:31.814749 5631 nodeserver.go:154] start connecting to blobfuse proxy, protocol: fuse2, args: /var/lib/kubelet/plugins/kubernetes.io/csi/blob.csi.azure.com/******************************/globalmount --use-adls=true --virtual-directory=true -o allow_other -o attr_timeout=120 -o entry_timeout=120 -o negative_timeout=120 --cancel-list-on-mount-seconds=10 --empty-dir-check=false --tmp-path=/mnt/************** --container-name=***-*** --pre-mount-validate=true --use-https=true
I0409 15:51:31.819421 5631 nodeserver.go:163] begin to mount with blobfuse proxy, protocol: fuse2, args: /var/lib/kubelet/plugins/kubernetes.io/csi/blob.csi.azure.com/******************************/globalmount --use-adls=true --virtual-directory=true -o allow_other -o attr_timeout=120 -o entry_timeout=120 -o negative_timeout=120 --cancel-list-on-mount-seconds=10 --empty-dir-check=false --tmp-path=/mnt/************** --container-name=***-*** --pre-mount-validate=true --use-https=true
I0409 15:51:33.394180 5631 mount_linux.go:284] Detected umount with safe 'not mounted' behavior
I0409 15:51:33.394360 5631 nodeserver.go:624] blobfuse mount at /var/lib/kubelet/plugins/kubernetes.io/csi/blob.csi.azure.com/******************************/globalmount success
I0409 15:51:33.394387 5631 nodeserver.go:424] volume(**************) mount on "/var/lib/kubelet/plugins/kubernetes.io/csi/blob.csi.azure.com/******************************/globalmount" succeeded
I0409 15:51:33.394403 5631 utils.go:82] GRPC response: {}
I0409 15:51:33.397375 5631 utils.go:75] GRPC call: /csi.v1.Node/NodePublishVolume
I0409 15:51:33.397386 5631 utils.go:76] GRPC request: {"staging_target_path":"/var/lib/kubelet/plugins/kubernetes.io/csi/blob.csi.azure.com/******************************/globalmount","target_path":"/var/lib/kubelet/pods/5f19c464-be92-47d0-891c-028a66873c1b/volumes/kubernetes.io~csi/pv-blob*******/mount","volume_capability":{"AccessType":{"Mount":{"mount_flags":["-o negative_timeout=120","--virtual-directory=true","-o allow_other","--use-adls=true","-o attr_timeout=120","-o entry_timeout=120"]}},"access_mode":{"mode":5}},"volume_context":{"containerName":"***-***","csi.storage.k8s.io/ephemeral":"false","csi.storage.k8s.io/pod.name":"*******-*******-z6gvw","csi.storage.k8s.io/pod.namespace":"***","csi.storage.k8s.io/pod.uid":"5f19c464-be92-47d0-891c-028a66873c1b","csi.storage.k8s.io/serviceAccount.name":"***","getLatestAccountKey":"true","isHnsEnabled":"true","networkEndpointType":"privateEndpoint","protocol":"fuse2","resourceGroup":"rg-****-****-****-euw","server":"*******.blob.core.windows.net","storageAccount":"*******","useDataPlaneAPI":"true"},"volume_id":"**************"}
I0409 15:51:33.398065 5631 nodeserver.go:127] NodePublishVolume: volume ************** mounting /var/lib/kubelet/plugins/kubernetes.io/csi/blob.csi.azure.com/******************************/globalmount at /var/lib/kubelet/pods/5f19c464-be92-47d0-891c-028a66873c1b/volumes/kubernetes.io~csi/pv-blob*******/mount with mountOptions: [bind]
I0409 15:51:33.398168 5631 mount_linux.go:245] Detected OS without systemd
I0409 15:51:33.398173 5631 mount_linux.go:220] Mounting cmd (mount) with arguments ( -o bind /var/lib/kubelet/plugins/kubernetes.io/csi/blob.csi.azure.com/******************************/globalmount /var/lib/kubelet/pods/5f19c464-be92-47d0-891c-028a66873c1b/volumes/kubernetes.io~csi/pv-blob*******/mount)
I0409 15:51:33.408532 5631 mount_linux.go:220] Mounting cmd (mount) with arguments ( -o bind,remount /var/lib/kubelet/plugins/kubernetes.io/csi/blob.csi.azure.com/******************************/globalmount /var/lib/kubelet/pods/5f19c464-be92-47d0-891c-028a66873c1b/volumes/kubernetes.io~csi/pv-blob*******/mount)
I0409 15:51:33.409917 5631 nodeserver.go:143] NodePublishVolume: volume ************** mount /var/lib/kubelet/plugins/kubernetes.io/csi/blob.csi.azure.com/******************************/globalmount at /var/lib/kubelet/pods/5f19c464-be92-47d0-891c-028a66873c1b/volumes/kubernetes.io~csi/pv-blob*******/mount successfully
I0409 15:51:33.409931 5631 utils.go:82] GRPC response: {}
I0409 15:51:33.435282 5631 utils.go:75] GRPC call: /csi.v1.Node/NodePublishVolume
I0409 15:51:33.435303 5631 utils.go:76] GRPC request: {"staging_target_path":"/var/lib/kubelet/plugins/kubernetes.io/csi/blob.csi.azure.com/******************************/globalmount","target_path":"/var/lib/kubelet/pods/66eb6a6d-8a8e-4c73-95d3-05e2ced9b4a1/volumes/kubernetes.io~csi/pv-blob*******/mount","volume_capability":{"AccessType":{"Mount":{"mount_flags":["-o negative_timeout=120","--virtual-directory=true","-o allow_other","--use-adls=true","-o attr_timeout=120","-o entry_timeout=120"]}},"access_mode":{"mode":5}},"volume_context":{"containerName":"***-***","csi.storage.k8s.io/ephemeral":"false","csi.storage.k8s.io/pod.name":"***-***-f4d754c8b-8qtnt","csi.storage.k8s.io/pod.namespace":"***","csi.storage.k8s.io/pod.uid":"66eb6a6d-8a8e-4c73-95d3-05e2ced9b4a1","csi.storage.k8s.io/serviceAccount.name":"***","getLatestAccountKey":"true","isHnsEnabled":"true","networkEndpointType":"privateEndpoint","protocol":"fuse2","resourceGroup":"rg-****-****-****-euw","server":"*******.blob.core.windows.net","storageAccount":"*******","useDataPlaneAPI":"true"},"volume_id":"**************"}
I0409 15:51:33.435922 5631 nodeserver.go:127] NodePublishVolume: volume ************** mounting /var/lib/kubelet/plugins/kubernetes.io/csi/blob.csi.azure.com/******************************/globalmount at /var/lib/kubelet/pods/66eb6a6d-8a8e-4c73-95d3-05e2ced9b4a1/volumes/kubernetes.io~csi/pv-blob*******/mount with mountOptions: [bind]
I0409 15:51:33.435950 5631 mount_linux.go:220] Mounting cmd (mount) with arguments ( -o bind /var/lib/kubelet/plugins/kubernetes.io/csi/blob.csi.azure.com/******************************/globalmount /var/lib/kubelet/pods/66eb6a6d-8a8e-4c73-95d3-05e2ced9b4a1/volumes/kubernetes.io~csi/pv-blob*******/mount)
I0409 15:51:33.440429 5631 mount_linux.go:220] Mounting cmd (mount) with arguments ( -o bind,remount /var/lib/kubelet/plugins/kubernetes.io/csi/blob.csi.azure.com/******************************/globalmount /var/lib/kubelet/pods/66eb6a6d-8a8e-4c73-95d3-05e2ced9b4a1/volumes/kubernetes.io~csi/pv-blob*******/mount)
I0409 15:51:33.441626 5631 nodeserver.go:143] NodePublishVolume: volume ************** mount /var/lib/kubelet/plugins/kubernetes.io/csi/blob.csi.azure.com/******************************/globalmount at /var/lib/kubelet/pods/66eb6a6d-8a8e-4c73-95d3-05e2ced9b4a1/volumes/kubernetes.io~csi/pv-blob*******/mount successfully
I0409 15:51:33.441643 5631 utils.go:82] GRPC response: {}
An example I posted on the blobfuse2 issue:
===== RESPONSE ERROR (ServiceCode=AuthorizationFailure) =====
Description=403 This request is not authorized to perform this operation., Details: (none)
GET https://*.dfs.core.windows.net/mycont?maxResults=2&recursive=false&resource=filesystem&timeout=901 # <<<<<<< I finally was able to resolve it when I saw this
Authorization: REDACTED
User-Agent: [Azure-Storage-Fuse/2.1.2 (Debian GNU/Linux 12 (bookworm)) Azure-Storage/0.1 (go1.20.5; linux)]
X-Ms-Client-Request-Id: [REDACTED]
X-Ms-Date: [Fri, 05 Apr 2024 11:40:17 GMT]
X-Ms-Version: [2018-11-09]
--------------------------------------------------------------------------------
RESPONSE Status: 403 This request is not authorized to perform this operation.
Content-Length: [194]
Content-Type: [application/json;charset=utf-8]
Date: [Fri, 05 Apr 2024 11:40:17 GMT]
Server: [Windows-Azure-HDFS/1.0 Microsoft-HTTPAPI/2.0]
X-Ms-Error-Code: [AuthorizationFailure]
X-Ms-Request-Id: [REDACTED]
X-Ms-Version: [2018-11-09]
]
So, to sum it up: this process could have been a lot friendlier. I believe that can mostly be achieved through documentation with accompanying examples and an improved debugging experience, which in part derives from the wiki debug steps that could use a bit more clarification.
I think that is it. My problem is already resolved, so I am just here to provide some feedback to help the next warrior.
@fabio-s-franco I see you have set the following mount options in the volume, so --use-adls=true together with the server: *******.blob.core.windows.net parameter solves your issue, right? And I don't think you need to set networkEndpointType: privateEndpoint, since the private endpoint is created by yourself and you have already set the server parameter:
"containerName":"***-***","getLatestAccountKey":"true","isHnsEnabled":"true","networkEndpointType":"privateEndpoint","protocol":"fuse2","resourceGroup":"rg-****-****-****-euw","server":"*******.blob.core.windows.net","storageAccount":"*******","useDataPlaneAPI":"true"
@andyzhangx Hi, no, these logs are a bit misleading.
When --use-adls=true is set, the .blob.core.windows.net endpoint does not work, and the logs are misleading in that case. Setting server to that URL has no effect. I encourage you to have a look at the issue https://github.com/Azure/azure-storage-fuse/issues/1376, as I imagine all that parameter does is set the AZURE_STORAGE_BLOB_ENDPOINT environment variable.
When the command is called with --use-adls=true, this environment variable is not used and the .dfs endpoint is used instead. I caught that by inspecting the blobfuse2 logs while manually calling the command to mount the container. You need to set up a separate private endpoint targeting the dfs sub-resource to use HNS, and this only becomes noticeable when public access is disabled.
It's correct that networkEndpointType does not need to be set, but it was part of my trial-and-error debugging, as there was no other way for me to do a more objective diagnosis. The documentation is not clear on whether it is a requirement for routing through the private endpoint, or whether it is just meant to create the private endpoint automatically (which would not work for HNS if it doesn't create a private endpoint targeting the dfs sub-resource).
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle rotten
- Close this issue with /close
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
What happened: Had an issue (https://github.com/Azure/azure-storage-fuse/issues/1376) with an HNS-enabled blobfuse2-based mount when the storage account is only accessible via private endpoint (public network access disabled). It was quite difficult to work out the issue because there are no log files at the location documented by the blobfuse2 project, /var/log/blobfuse2.log. In fact, the only way I could find any logs at all was to shell into the relevant node agent (csi-blob-node daemonset) and execute a mount command by hand. Other than that, there are no blobfuse2.log files available from within the blob container inside the pod.
What you expected to happen: Find log files to debug issues at the documented location of /var/log/blobfuse2.log.
How to reproduce it: Enable the managed driver via azure-cli (or any other means) in an AKS cluster. The version installed on my cluster ships blobfuse2 2.1.2.
Anything else we need to know?:
The original issue was caused by the fact that blobfuse2 does not try to connect to the standard (non-HNS) endpoint FQDN. Therefore, a private endpoint naively set up with blob as the target sub-resource instead of dfs will not be reachable, as blobfuse2 will attempt to connect to .dfs.core.windows.net instead of .blob.core.windows.net, and that private DNS zone will not exist when the private endpoint was originally set up to accommodate standard non-HNS mounts. It is not clear, and very difficult to debug, that for the same storage account, even though you can mount either HNS-enabled or HNS-disabled containers, the former requires a private endpoint set up specifically for that purpose, which resolves to a different FQDN than the latter. On top of that, it does not seem to be possible to configure the endpoint to point to another FQDN in that case. Even setting the AZURE_STORAGE_BLOB_ENDPOINT environment variable directly does not influence the FQDN blobfuse2 uses to connect.
So, without any logs, it becomes nearly impossible to determine the root cause. What is documented at https://github.com/kubernetes-sigs/blob-csi-driver/blob/master/docs/csi-debug.md will only show what is sent directly to stdout. For example:
Error: failed to initialize new pipeline [failed to authenticate credentials for azstorage]
This error does not help determine the problem, which can have several different underlying root causes (I alone have experienced three different ones for the same error message). I believe better error output is something for blobfuse2 to handle, but it nonetheless doesn't help that no log is persisted on the node agent. Moreover, systemctl (and journalctl) do not work in the node agent pod, as blobfuse2 is not managed by systemd there. So the documentation is either inaccurate or not applicable to the managed installation of the Blob CSI driver / the AKS setup I have.
Environment:
- CSI Driver version: 1.22.5
- Kubernetes version (use kubectl version): Server Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.9", GitCommit:"d33c44091f0e760b0053f06023e87a1c99dfd302", GitTreeState:"clean", BuildDate:"2024-01-31T01:58:06Z", GoVersion:"go1.20.12", Compiler:"gc", Platform:"linux/amd64"
- Kernel (e.g. uname -a): 5.15.0-1058-azure #66-Ubuntu SMP Fri Feb 16 00:40:24 UTC 2024 x86_64 GNU/Linux
- Install tools: azure-cli
- Others: blobfuse2 2.1.2