tensorflow / io

Dataset, streaming, and file system extensions maintained by TensorFlow SIG-IO
Apache License 2.0

File operations hang indefinitely when accessing azure blob storage #1276

Open donglinz opened 3 years ago

donglinz commented 3 years ago

Hi,

I am running the tensorflow_io Colab tutorial. The code works well against the Azurite emulator. However, file operations hang indefinitely when accessing files in Azure Blob Storage.

The colab is over here: https://colab.research.google.com/github/tensorflow/io/blob/master/docs/tutorials/azure.ipynb#scrollTo=yZmI7l_GykcW

For instance, the following operations hang when accessing Azure Blob Storage:

tf.io.gfile.mkdir(pathname)
with tf.io.gfile.GFile(filename, mode='w') as w:
  w.write("Hello, world!")

with tf.io.gfile.GFile(filename, mode='r') as r:
  print(r.read())

Any help will be highly appreciated.

kvignesh1420 commented 3 years ago

@donglinz can you share some information on your azure environment?

donglinz commented 3 years ago

@kvignesh1420 I am using an Azure Blob Storage account in the East US region:

Performance: Standard
Access tier: Hot
Replication: Locally-redundant storage (LRS)
Account type: BlobStorage

The detailed specification is in the exported template below.

{
    "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
    "contentVersion": "1.0.0.0",
    "parameters": {
        "storageAccounts_azurecolabstore_name": {
            "defaultValue": "azurecolabstore",
            "type": "String"
        }
    },
    "variables": {},
    "resources": [
        {
            "type": "Microsoft.Storage/storageAccounts",
            "apiVersion": "2020-08-01-preview",
            "name": "[parameters('storageAccounts_azurecolabstore_name')]",
            "location": "eastus",
            "sku": {
                "name": "Standard_LRS",
                "tier": "Standard"
            },
            "kind": "BlobStorage",
            "properties": {
                "minimumTlsVersion": "TLS1_2",
                "allowBlobPublicAccess": true,
                "networkAcls": {
                    "bypass": "AzureServices",
                    "virtualNetworkRules": [],
                    "ipRules": [],
                    "defaultAction": "Allow"
                },
                "supportsHttpsTrafficOnly": true,
                "encryption": {
                    "services": {
                        "file": {
                            "keyType": "Account",
                            "enabled": true
                        },
                        "blob": {
                            "keyType": "Account",
                            "enabled": true
                        }
                    },
                    "keySource": "Microsoft.Storage"
                },
                "accessTier": "Hot"
            }
        },
        {
            "type": "Microsoft.Storage/storageAccounts/blobServices",
            "apiVersion": "2020-08-01-preview",
            "name": "[concat(parameters('storageAccounts_azurecolabstore_name'), '/default')]",
            "dependsOn": [
                "[resourceId('Microsoft.Storage/storageAccounts', parameters('storageAccounts_azurecolabstore_name'))]"
            ],
            "sku": {
                "name": "Standard_LRS",
                "tier": "Standard"
            },
            "properties": {
                "cors": {
                    "corsRules": []
                },
                "deleteRetentionPolicy": {
                    "enabled": false
                }
            }
        },
        {
            "type": "Microsoft.Storage/storageAccounts/tableServices",
            "apiVersion": "2020-08-01-preview",
            "name": "[concat(parameters('storageAccounts_azurecolabstore_name'), '/default')]",
            "dependsOn": [
                "[resourceId('Microsoft.Storage/storageAccounts', parameters('storageAccounts_azurecolabstore_name'))]"
            ],
            "properties": {
                "cors": {
                    "corsRules": []
                }
            }
        },
        {
            "type": "Microsoft.Storage/storageAccounts/blobServices/containers",
            "apiVersion": "2020-08-01-preview",
            "name": "[concat(parameters('storageAccounts_azurecolabstore_name'), '/default/datasets')]",
            "dependsOn": [
                "[resourceId('Microsoft.Storage/storageAccounts/blobServices', parameters('storageAccounts_azurecolabstore_name'), 'default')]",
                "[resourceId('Microsoft.Storage/storageAccounts', parameters('storageAccounts_azurecolabstore_name'))]"
            ],
            "properties": {
                "defaultEncryptionScope": "$account-encryption-key",
                "denyEncryptionScopeOverride": false,
                "publicAccess": "None"
            }
        },
        {
            "type": "Microsoft.Storage/storageAccounts/blobServices/containers",
            "apiVersion": "2020-08-01-preview",
            "name": "[concat(parameters('storageAccounts_azurecolabstore_name'), '/default/iwslt')]",
            "dependsOn": [
                "[resourceId('Microsoft.Storage/storageAccounts/blobServices', parameters('storageAccounts_azurecolabstore_name'), 'default')]",
                "[resourceId('Microsoft.Storage/storageAccounts', parameters('storageAccounts_azurecolabstore_name'))]"
            ],
            "properties": {
                "defaultEncryptionScope": "$account-encryption-key",
                "denyEncryptionScopeOverride": false,
                "publicAccess": "None"
            }
        },
        {
            "type": "Microsoft.Storage/storageAccounts/blobServices/containers",
            "apiVersion": "2020-08-01-preview",
            "name": "[concat(parameters('storageAccounts_azurecolabstore_name'), '/default/wmt')]",
            "dependsOn": [
                "[resourceId('Microsoft.Storage/storageAccounts/blobServices', parameters('storageAccounts_azurecolabstore_name'), 'default')]",
                "[resourceId('Microsoft.Storage/storageAccounts', parameters('storageAccounts_azurecolabstore_name'))]"
            ],
            "properties": {
                "defaultEncryptionScope": "$account-encryption-key",
                "denyEncryptionScopeOverride": false,
                "publicAccess": "None"
            }
        }
    ]
}
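As a reading aid, the settings in this template that affect how a client must connect can be pulled out with a short script. This is only a sketch: `TEMPLATE` is a hand-trimmed copy of the export above, and `summarize` is a hypothetical helper written for this note, not part of tensorflow-io or any Azure SDK.

```python
import json

# Trimmed copy of the exported template above; only the inspected fields are kept.
TEMPLATE = json.loads("""
{
  "resources": [
    {
      "type": "Microsoft.Storage/storageAccounts",
      "name": "[parameters('storageAccounts_azurecolabstore_name')]",
      "kind": "BlobStorage",
      "properties": {
        "minimumTlsVersion": "TLS1_2",
        "supportsHttpsTrafficOnly": true,
        "networkAcls": {"ipRules": [], "virtualNetworkRules": [], "defaultAction": "Allow"}
      }
    },
    {
      "type": "Microsoft.Storage/storageAccounts/blobServices/containers",
      "name": "[concat(parameters('storageAccounts_azurecolabstore_name'), '/default/datasets')]",
      "properties": {"publicAccess": "None"}
    }
  ]
}
""")


def summarize(template: dict) -> dict:
    """Collect the connection-related settings from an exported ARM template."""
    summary = {"containers": {}}
    for res in template.get("resources", []):
        props = res.get("properties", {})
        if res["type"] == "Microsoft.Storage/storageAccounts":
            summary["https_only"] = props.get("supportsHttpsTrafficOnly")
            summary["min_tls"] = props.get("minimumTlsVersion")
            summary["network_default_action"] = props.get("networkAcls", {}).get("defaultAction")
        elif res["type"].endswith("/containers"):
            # The container name is the last path segment inside the concat(...) expression.
            container = res["name"].rsplit("/", 1)[-1].rstrip("')]")
            summary["containers"][container] = props.get("publicAccess")
    return summary


print(summarize(TEMPLATE))
```

Read this way, the account accepts HTTPS traffic only, requires TLS 1.2 or later, allows all networks, and exposes no container anonymously, so a client has to connect over HTTPS with credentials. Whether any of this relates to the hang is not established in this thread.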

It looks like tensorflow-io is tested against non-official emulators. There might be some gaps between the live service and offline emulators :)

Thanks

donglinz commented 3 years ago

@yongtang Can you please take a look? Here is a minimal reproducible code snippet, including the Azure Blob Storage key for testing: https://colab.research.google.com/drive/1UcyUKKVDF2ttOOG5yRSIDAO8gsw1c3mz?usp=sharing

yongtang commented 3 years ago

@donglinz I used to have Azure access, but I don't have it at the moment. I will take a look, though it may take some extra time to get Azure access first.

From my previous experience, when an operation hangs it is sometimes tied to an endpoint mismatch (or the endpoint port being blocked). Could this be the cause?
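One way to check the endpoint hypothesis from the same Colab runtime is sketched below. It assumes the account uses the default public blob host `<account>.blob.core.windows.net` on port 443; the helper names are hypothetical. It also lists environment variables whose names start with `TF_AZURE`, on the assumption (not confirmed in this thread) that emulator-oriented settings left over from the tutorial could point requests away from the live endpoint.

```python
import os
import socket


def blob_endpoint_host(account_name: str) -> str:
    """Default public blob endpoint host for a storage account (assumed, not custom)."""
    return f"{account_name}.blob.core.windows.net"


def can_connect(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers DNS failures, refusals, and timeouts
        return False


def azure_related_env(environ=None) -> dict:
    """Collect environment variables whose names start with TF_AZURE (assumed prefix)."""
    environ = os.environ if environ is None else environ
    return {k: v for k, v in environ.items() if k.startswith("TF_AZURE")}
```

For the account in the template above, `can_connect(blob_endpoint_host("azurecolabstore"))` reports whether the port is reachable, and `sorted(azure_related_env())` shows the variable names without printing any key values. A successful TCP connection only rules out a blocked port; it does not prove the file system plugin is targeting that host.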

donglinz commented 3 years ago

@yongtang Thank you for your reply. I think neither the endpoint nor the port is blocked, since I can connect to my Azure Blob Storage account via the Python client.

burgerkingeater commented 3 years ago

@donglinz which specific call is hanging?

donglinz commented 3 years ago

@burgerkingeater All the calls I have tried hang (create folder, read, write).
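To pin down which call hangs without blocking the notebook forever, each operation can be run in a daemon thread with a deadline. This is a minimal sketch; `run_with_timeout` is a hypothetical helper, not a tensorflow-io API, and it only works if the blocked native call releases the Python GIL, which is assumed here rather than verified.

```python
import threading


def run_with_timeout(fn, timeout_s, *args, **kwargs):
    """Run fn in a daemon thread; return (finished, result_or_exception)."""
    outcome = {}

    def target():
        try:
            outcome["value"] = fn(*args, **kwargs)
        except Exception as exc:  # keep the error instead of losing it in the thread
            outcome["error"] = exc

    worker = threading.Thread(target=target, daemon=True)
    worker.start()
    worker.join(timeout_s)
    if worker.is_alive():
        # Still blocked after the deadline: report it as a hang.
        # The thread cannot be killed; it is simply abandoned.
        return False, None
    return True, outcome.get("error", outcome.get("value"))


# Hypothetical usage with the calls from this issue
# (needs tensorflow, tensorflow_io, and a defined `pathname`):
#   finished, result = run_with_timeout(tf.io.gfile.mkdir, 30, pathname)
#   print("mkdir finished:", finished, result)
```

Running the create, write, and read steps through this wrapper one at a time would show whether the very first request already stalls or only later ones do.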

donglinz commented 3 years ago

@yongtang Greetings! Is there any update on this?