kmelmon opened this issue 2 months ago
Can you also share your Storage Explorer app logs? How long does it take for the hang to occur?
I can't seem to find them right now, but there are a few other issues other users have opened related to this.
We have an ongoing discussion with AzCopy developers about an authentication question. Essentially, AzCopy clients are expected to insert an authentication token into the system credential store, and AzCopy reads the token from there to do its work. For long-running operations, clients also need to refresh the token. As far as we can tell, Storage Explorer does this as expected. What remains to be determined is whether AzCopy is checking the credential store for refreshed tokens at regular intervals.
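To make that contract concrete, here's a rough sketch of the hand-off being described, assuming a Node/TypeScript client. The credential-store service and account names ("AzCopyAuth"/"current-token") are made up for illustration; they are not the actual keys Storage Explorer or AzCopy use.

```typescript
import { InteractiveBrowserCredential } from "@azure/identity";
import * as keytar from "keytar";

const credential = new InteractiveBrowserCredential({ redirectUri: "http://localhost:8080" });

async function keepAzCopyTokenFresh(): Promise<void> {
  // Acquire a token scoped to Azure Storage...
  const token = await credential.getToken("https://storage.azure.com/.default");

  // ...and publish it in the system credential store, where AzCopy is
  // expected to read it. (Service/account names here are placeholders.)
  await keytar.setPassword("AzCopyAuth", "current-token", token.token);

  // Refresh a few minutes before expiry. The open question above is whether
  // AzCopy re-reads the store on this cadence during long-running jobs.
  const delayMs = Math.max(token.expiresOnTimestamp - Date.now() - 5 * 60 * 1000, 0);
  setTimeout(() => void keepAzCopyTokenFresh(), delayMs);
}

void keepAzCopyTokenFresh();
```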
Here are the app logs. I don't know exactly how long it takes for the hang to occur, but you should be able to work it out from the AzCopy log; from rough observation, it's under an hour. AppLogs.zip
Yeah, that sounds about right. Tokens are normally good for about an hour before they need to be refreshed. So, it definitely sounds like AzCopy is not getting the refreshed token.
One way to test this is to run AzCopy directly yourself. You can copy the command Storage Explorer uses from the link in the Activity Log. If AzCopy can't complete the transfer on its own, then AzCopy isn't doing something right.
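For reference, the copied command typically looks something like `azcopy copy "C:\path\to\file.hdf5" "https://<account>.blob.core.windows.net/<container>/<path>" --blob-type=BlockBlob` (the paths and URL here are placeholders for your own values). Run it in a terminal and watch whether it stalls around the one-hour mark.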
2024/09/25 19:05:28 PERF: primary performance constraint is Disk. States: X: 0, O: 0, M: 0, L: 0, R: 0, D: 1, W: 0, F: 0, B: 1, E: 0, T: 2, GRs: 5
2024/09/25 19:05:28 10.7 %, 0 Done, 0 Failed, 1 Pending, 0 Skipped, 1 Total, 2-sec Throughput (Mb/s): 134.1982 (Disk may be limiting speed)
2024/09/25 19:05:28 ==> REQUEST/RESPONSE (Try=1/306.3515ms, OpTime=469.0319ms) -- RESPONSE SUCCESSFULLY RECEIVED
PUT https://gfxmltrainingstore1.blob.core.windows.net/cadmus/TestLargeUpload%2Ftest2%2Fdownsample_text_bilinear.hdf5?blockid=MDAwMDBsRMGle8NE2GLVLVZQMZkSMDAwMDAwMDAwMDA0OTUy&comp=block
X-Ms-Request-Id: [5f0b258a-c01e-0111-037d-0f3e52000000]
2024/09/25 19:05:29 ==> REQUEST/RESPONSE (Try=1/290.7196ms, OpTime=564.8591ms) -- RESPONSE SUCCESSFULLY RECEIVED
PUT https://gfxmltrainingstore1.blob.core.windows.net/cadmus/TestLargeUpload%2Ftest2%2Fdownsample_text_bilinear.hdf5?blockid=MDAwMDBsRMGle8NE2GLVLVZQMZkSMDAwMDAwMDAwMDA0OTUz&comp=block
X-Ms-Request-Id: [5f0b2750-c01e-0111-157d-0f3e52000000]
2024/09/25 19:05:30 PERF: primary performance constraint is Disk. States: X: 0, O: 0, M: 0, L: 0, R: 0, D: 1, W: 1, F: 0, B: 0, E: 0, T: 2, GRs: 5
2024/09/25 19:05:30 10.7 %, 0 Done, 0 Failed, 1 Pending, 0 Skipped, 1 Total, 2-sec Throughput (Mb/s): 66.8998 (Disk may be limiting speed)
2024/09/25 19:05:31 ==> REQUEST/RESPONSE (Try=1/228.61ms, OpTime=525.8384ms) -- RESPONSE SUCCESSFULLY RECEIVED
PUT https://gfxmltrainingstore1.blob.core.windows.net/cadmus/TestLargeUpload%2Ftest2%2Fdownsample_text_bilinear.hdf5?blockid=MDAwMDBsRMGle8NE2GLVLVZQMZkSMDAwMDAwMDAwMDA0OTU0&comp=block
X-Ms-Request-Id: [5f0b28e6-c01e-0111-7f7d-0f3e52000000]
2024/09/25 19:05:32 PERF: primary performance constraint is Disk. States: X: 0, O: 0, M: 0, L: 0, R: 0, D: 1, W: 0, F: 0, B: 1, E: 0, T: 2, GRs: 5
2024/09/25 19:05:32 10.7 %, 0 Done, 0 Failed, 1 Pending, 0 Skipped, 1 Total, 2-sec Throughput (Mb/s): 129.4926 (Disk may be limiting speed)
2024/09/25 19:05:32 ==> REQUEST/RESPONSE (Try=1/434.9037ms, OpTime=693.251ms) -- RESPONSE SUCCESSFULLY RECEIVED
PUT https://gfxmltrainingstore1.blob.core.windows.net/cadmus/TestLargeUpload%2Ftest2%2Fdownsample_text_bilinear.hdf5?blockid=MDAwMDBsRMGle8NE2GLVLVZQMZkSMDAwMDAwMDAwMDA0OTU1&comp=block
X-Ms-Request-Id: [5f0b2b1a-c01e-0111-777d-0f3e52000000]
2024/09/25 19:05:33 ==> REQUEST/RESPONSE (Try=1/307.093ms, OpTime=551.3149ms) -- RESPONSE SUCCESSFULLY RECEIVED
PUT https://gfxmltrainingstore1.blob.core.windows.net/cadmus/TestLargeUpload%2Ftest2%2Fdownsample_text_bilinear.hdf5?blockid=MDAwMDBsRMGle8NE2GLVLVZQMZkSMDAwMDAwMDAwMDA0OTU2&comp=block
X-Ms-Request-Id: [5f0b2cbd-c01e-0111-797d-0f3e52000000]
2024/09/25 19:05:34 PERF: primary performance constraint is Disk. States: X: 0, O: 0, M: 0, L: 0, R: 0, D: 1, W: 0, F: 0, B: 1, E: 0, T: 2, GRs: 5
2024/09/25 19:05:34 10.7 %, 0 Done, 0 Failed, 1 Pending, 0 Skipped, 1 Total, 2-sec Throughput (Mb/s): 71.2911 (Disk may be limiting speed)
2024/09/25 19:05:35 ==> REQUEST/RESPONSE (Try=1/324.4066ms, OpTime=623.817ms) -- RESPONSE SUCCESSFULLY RECEIVED
PUT https://gfxmltrainingstore1.blob.core.windows.net/cadmus/TestLargeUpload%2Ftest2%2Fdownsample_text_bilinear.hdf5?blockid=MDAwMDBsRMGle8NE2GLVLVZQMZkSMDAwMDAwMDAwMDA0OTU3&comp=block
X-Ms-Request-Id: [5f0b2e6a-c01e-0111-707d-0f3e52000000]
RESPONSE Status: 401 Server failed to authenticate the request. Please refer to the information in the www-authenticate header.
Content-Length: 404
Content-Type: application/xml
Date: Wed, 25 Sep 2024 19:05:37 GMT
Server: Microsoft-HTTPAPI/2.0
Www-Authenticate: Bearer authorization_uri=https://login.microsoftonline.com/72f988bf-86f1-41af-91ab-2d7cd011db47/oauth2/authorize resource_id=https://storage.azure.com
X-Ms-Error-Code: InvalidAuthenticationInfo
X-Ms-Request-Id: 5f0b3046-c01e-0111-327d-0f3e52000000
Response Details: InvalidAuthenticationInfo
RESPONSE Status: 401 Server failed to authenticate the request. Please refer to the information in the www-authenticate header.
Content-Length: 404
Content-Type: application/xml
Date: Wed, 25 Sep 2024 19:05:37 GMT
Server: Microsoft-HTTPAPI/2.0
Www-Authenticate: Bearer authorization_uri=https://login.microsoftonline.com/72f988bf-86f1-41af-91ab-2d7cd011db47/oauth2/authorize resource_id=https://storage.azure.com
X-Ms-Error-Code: InvalidAuthenticationInfo
X-Ms-Request-Id: 0ebdb9b9-301e-0047-6a7d-0f31b0000000
Response Details: InvalidAuthenticationInfo
2024/09/25 19:05:36 INFO: [P#0-T#0] Early close of chunk in singleChunkReader with context still active
2024/09/25 19:05:36 ERR: [P#0-T#0] UPLOADFAILED: \\?\Z:\downsample_text_bilinear.hdf5 : 401 : 401 Server failed to authenticate the request. Please refer to the information in the www-authenticate header.. When Staging block. X-Ms-Request-Id: 0ebdb9b9-301e-0047-6a7d-0f31b0000000
RESPONSE Status: 401 Server failed to authenticate the request. Please refer to the information in the www-authenticate header.
Content-Length: 404
Content-Type: application/xml
Date: Wed, 25 Sep 2024 19:05:39 GMT
Server: Microsoft-HTTPAPI/2.0
Vary: Origin
Www-Authenticate: Bearer authorization_uri=https://login.microsoftonline.com/72f988bf-86f1-41af-91ab-2d7cd011db47/oauth2/authorize resource_id=https://storage.azure.com
X-Ms-Error-Code: InvalidAuthenticationInfo
X-Ms-Request-Id: dc134f59-401e-0000-597d-0f5aeb000000
Response Details: InvalidAuthenticationInfo
RESPONSE Status: 401 Server failed to authenticate the request. Please refer to the information in the www-authenticate header.
Content-Length: 404
Content-Type: application/xml
Date: Wed, 25 Sep 2024 19:05:39 GMT
Server: Microsoft-HTTPAPI/2.0
Vary: Origin
Www-Authenticate: Bearer authorization_uri=https://login.microsoftonline.com/72f988bf-86f1-41af-91ab-2d7cd011db47/oauth2/authorize resource_id=https://storage.azure.com
X-Ms-Error-Code: InvalidAuthenticationInfo
X-Ms-Request-Id: dc134f5f-401e-0000-5e7d-0f5aeb000000
Response Details: InvalidAuthenticationInfo
2024/09/25 19:05:37 JobID=a5c1446c-c37b-d844-62d5-2d5650319912, Part#=0, TransfersDone=1 of 1
2024/09/25 19:05:37 all parts of entire Job a5c1446c-c37b-d844-62d5-2d5650319912 successfully completed, cancelled or paused
2024/09/25 19:05:37 is part of Job which 1 total number of parts done
2024/09/25 19:05:38 PERF: primary performance constraint is Unknown. States: X: 0, O: 0, M: 0, L: 0, R: 0, D: 0, W: 0, F: 0, B: 0, E: 0, T: 0, GRs: 5
2024/09/25 19:05:38 0.0 %, 0 Done, 1 Failed, 0 Pending, 0 Skipped, 1 Total,
2024/09/25 19:05:38
Diagnostic stats:
IOPS: 1
End-to-end ms per request: 669
Network Errors: 0.00%
Server Busy: 0.00%

Job a5c1446c-c37b-d844-62d5-2d5650319912 summary
Elapsed Time (Minutes): 71.5707
Number of File Transfers: 1
Number of Folder Property Transfers: 0
Number of Symlink Transfers: 0
Total Number of Transfers: 1
Number of File Transfers Completed: 0
Number of Folder Transfers Completed: 0
Number of File Transfers Failed: 1
Number of Folder Transfers Failed: 0
Number of File Transfers Skipped: 0
Number of Folder Transfers Skipped: 0
Total Number of Bytes Transferred: 0
Final Job Status: Failed
2024/09/25 19:05:38 Closing Log
Yep, the error message says the token is expired, so I highly suspect that's why the job is hanging. I'm pretty sure something in AzCopy isn't checking for a refreshed token. I'll keep reaching out to the AzCopy team to get something moving.
Any update on this? Our team would really benefit from a fix as we use Azure Storage for all our ML training data.
We have no significant updates. The AzCopy team is aware of the issue and is currently looking into it.
My team is facing the same issue with large files in a major customer project. Can you point me to the AzCopy issue tracking this bug?
To some extent, we were able to circumvent this OAuth expiration issue by using SAS tokens for authentication. Unfortunately, Storage Explorer appears to default to SAS token authentication only if the user has the storage-account-level Microsoft.Storage/storageAccounts/listkeys/action permission, which is something we can't grant to our end users. A user can create a SAS token manually without that account-level permission and use it to complete large transfers successfully, but walking all end users through this manual method is not feasible. Maybe Storage Explorer could default to SAS tokens even without the listkeys permission?
We've been using version 1.19.1 as a workaround for now: https://github.com/microsoft/AzureStorageExplorer/releases/tag/V1.19.1
@jarkko-siitonen Keep in mind there are two different kinds of SAS: key-based SAS and user delegation SAS. Storage Explorer supports both, so you do not need to give out List Keys permissions. If the user has permission to generate a user delegation SAS, Storage Explorer should be able to use that method instead.
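For reference, here is roughly what generating a user delegation SAS looks like with the @azure/storage-blob SDK. This is a sketch, not Storage Explorer's actual code; the account and container names are placeholders, and the signed-in identity needs the "Generate a user delegation key" permission.

```typescript
import { DefaultAzureCredential } from "@azure/identity";
import {
  BlobServiceClient,
  ContainerSASPermissions,
  generateBlobSASQueryParameters,
} from "@azure/storage-blob";

const accountName = "<account>"; // placeholder

async function createUserDelegationSas(): Promise<string> {
  const service = new BlobServiceClient(
    `https://${accountName}.blob.core.windows.net`,
    new DefaultAzureCredential() // uses the signed-in Azure AD identity
  );

  const startsOn = new Date();
  const expiresOn = new Date(startsOn.getTime() + 7 * 24 * 60 * 60 * 1000); // 7-day maximum

  // The user delegation key is obtained with Azure AD credentials,
  // so no account key is ever involved.
  const udk = await service.getUserDelegationKey(startsOn, expiresOn);

  return generateBlobSASQueryParameters(
    {
      containerName: "<container>", // placeholder
      permissions: ContainerSASPermissions.parse("racwl"),
      startsOn,
      expiresOn,
    },
    udk,
    accountName
  ).toString();
}
```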
@JasonYeMSFT Correct me if I'm wrong?
@craxal Yes, but how do you make Storage Explorer actually use SAS tokens without either a) generating the SAS token manually and connecting Storage Explorer to the shared access signature URL, or b) granting the listkeys action?
My testing seems to indicate Storage Explorer passes the AZCOPY_CRED_TYPE="OAuthToken" parameter to AzCopy unless one of those two is true. I would be delighted if there was a third way to make Storage Explorer use SAS.
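For illustration, the kind of invocation I'm describing looks roughly like this (not Storage Explorer's actual source; the source path and destination URL are placeholders):

```typescript
import { spawn } from "child_process";

const azcopy = spawn(
  "azcopy",
  [
    "copy",
    "C:\\data\\large.hdf5", // placeholder source
    "https://<account>.blob.core.windows.net/<container>/large.hdf5", // placeholder destination
  ],
  {
    env: {
      ...process.env,
      // Per the behavior observed above: with this set, AzCopy expects an
      // OAuth token from the calling application rather than a SAS in the URL.
      AZCOPY_CRED_TYPE: "OAuthToken",
    },
  }
);

azcopy.stdout.on("data", (chunk: Buffer) => process.stdout.write(chunk));
azcopy.on("exit", (code) => console.log(`azcopy exited with code ${code}`));
```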
@jarkko-siitonen What Craig said is right. Storage Explorer can generate a user delegation SAS, but it can only do so after you are signed in, because it needs your Azure AD credential. Storage Explorer is implemented to prefer user delegation SAS because it is more flexible. If your users can already sign in, get an OAuth token, and use it to do data transfers, there is a good chance you can take advantage of user delegation SAS to achieve easier access control. If you can tell us more about your scenario, we may be able to give better recommendations. For example:
- How do your users connect to their storage in Storage Explorer?
- What operations do your users need?
- How do you distribute permissions to your users?

@JasonYeMSFT Short description: as part of a larger project, we are replacing a legacy SFTP server with Azure storage account(s) to enforce Entra ID authentication with MFA and geoblocking. The service is used to transfer files with sizes in the hundreds of GB, so transfer times can be quite long (longer than an OAuth token lifetime).
Is the AzCopy team just aware of the problem, or is fixing this bug on the roadmap?
@jarkko-siitonen They are actively working on a fix for the issue. They've given us an estimated release date in January.
@JasonYeMSFT
For a user with the Contributor role, Storage Explorer uses account key-based SAS for uploads and downloads.
To my knowledge, assuming they also have permission to create a user delegation SAS, this should not be happening. Any idea why it would?
@jarkko-siitonen As Craig said, AzCopy is working on a fix, and once it's done, direct OAuth access-token-based transfers should no longer have any issues. However, you might be able to stop relying on account key access in Storage Explorer today.
For your read-only case, I think you should keep giving your users the Storage Blob Data Reader role. For your read-write case, have you considered giving your users the Storage Blob Data Contributor role instead and turning off storage account key access for your storage accounts? For users with the Storage Blob Data Contributor role, Storage Explorer can generate a user delegation SAS, which can have a lifetime much longer than a regular OAuth access token (up to 7 days). It still has a maximum lifetime, though, so it would become an issue if a single transfer takes longer than that to complete. You can try setting this up for a new test storage account and upload a blob to see how it works. We would like to help you move away from storage account key access for your users. If you run into any issues, please let us know and we can help.
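If you want to script the test, a minimal sketch of the upload side (building on the SAS snippet earlier in the thread; the account, container, and file path are placeholders). You can disable shared key access on the test account with `az storage account update --allow-shared-key-access false` beforehand.

```typescript
import { BlockBlobClient } from "@azure/storage-blob";

async function testUpload(userDelegationSas: string): Promise<void> {
  // Append the user delegation SAS (e.g. from createUserDelegationSas above)
  // to the blob URL; no account key or OAuth access token is involved in
  // the transfer itself.
  const url = `https://<account>.blob.core.windows.net/<container>/test-large.bin?${userDelegationSas}`;
  const blob = new BlockBlobClient(url);

  // uploadFile (Node.js runtime only) streams the file in parallel blocks,
  // so it is suitable for very large files.
  await blob.uploadFile("C:\\data\\test-large.bin");
}
```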
Thank you @craxal and @JasonYeMSFT for the indication of a bug fix release date. It's a lot easier to decide what to do next when there is a clear understanding that the bug is actively being worked on!
A seven-day maximum duration for file transfers should not be an issue for us.
And yes, we would indeed prefer not to give storage account key access to users. However, a new test did not yield any better results than before. The guest user has the Storage Blob Data Contributor role at the container level and the Reader role at the storage account level, plus an additional custom role containing the "Generate a user delegation key" permission at the storage account level. The user can generate a user delegation SAS manually, but Storage Explorer's upload, download, and delete functions all use OAuth authentication, which would be just fine for us if this token renewal bug didn't exist.
If this is not the expected/desired behavior with the Storage Blob Data Contributor role, maybe I should create a new issue for the Storage Explorer team to investigate?
I read the code and found that our upload code doesn't use user delegation SAS, because uploads don't have the limitation of service-to-service copies, where the source and destination cannot use different OAuth credentials. I filed https://github.com/microsoft/AzureStorageExplorer/issues/8299 to track adding user delegation SAS to our upload/download code.
Preflight Checklist
Storage Explorer Version
1.35.0
Regression From
1.19.1
Architecture
x86
Storage Explorer Build Number
No response
Platform
Windows
OS Version
Windows 11
Bug Description
When uploading large files, the file gets "stuck" after partially uploading: progress never moves beyond a certain point, and the file never finishes uploading. In my most recent scenario, the file was around 750 GB, but I have seen this sporadically in the past for files as small as 1 GB.
The issue seems to be related to the use of OAuth for authentication, which was added somewhat recently. If I roll back to version 1.19.1, which uses Active Directory authentication, I don't see the issue.
The issue reproduces 100% of the time with the 750 GB file.
I've attached an AzCopy log file from the failure case 9978b3ad-4aa4-3944-7646-d12ca3c1bcdc.log
Steps to Reproduce
1. Sign in to Storage Explorer with Azure AD (OAuth) authentication.
2. Start uploading a very large file (hundreds of GB) to a blob container.
3. Wait for the transfer to run longer than the OAuth token lifetime (roughly an hour).
Actual Experience
The upload partially completes then gets "stuck" and never makes any further progress.
Expected Experience
File should upload completely
Additional Context
No response