jhu-sheridan-libraries / jhuda-general-issues

Investigate and propose approach to transfer files from Azure File Service mount to the final archival location #15

Closed htpvu closed 4 years ago

htpvu commented 5 years ago

The output of this ticket may be of particular interest to @emetsger (current owner of the Deposit Service).

jgara commented 5 years ago

Assuming:

  1. VSM is on a Hopkins private network
  2. Azure is on the public internet

connections initiated from Azure to VSM are not possible.

One possible approach:

An "Azure Download Service" (ADS) runs on the VSM system. The ADS polls an Azure queue for packages to download. If a package is available, the ADS can obtain from the queue (or some secure place accessible only to the ADS):

  1. Azure storage account name (e.g. "derekazurestorage")
  2. Azure Files "share name" (e.g. "uploadtest")
  3. File path (e.g. "jeff/azcopy_test")
  4. Package or File name(s)
  5. SAS token
     a. Example of obtaining SAS token programmatically in Java.
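
As a rough illustration of the information listed above, the sketch below models one queue message as a small record and parses it from JSON. The JSON layout, the field names, and the `DownloadRequest` and `parse_request` names are all assumptions made for illustration; the thread does not specify a message format. The SAS token is left out of the message here on the assumption that it is fetched from the secure location instead.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DownloadRequest:
    """One package the ADS should fetch (hypothetical record type)."""
    storage_account: str   # e.g. "derekazurestorage"
    share_name: str        # e.g. "uploadtest"
    file_path: str         # e.g. "jeff/azcopy_test"
    file_names: List[str]  # package or file name(s)


def parse_request(message_body: str) -> DownloadRequest:
    """Parse a JSON queue message body; the JSON layout is an assumption."""
    data = json.loads(message_body)
    return DownloadRequest(
        storage_account=data["storage_account"],
        share_name=data["share_name"],
        # Normalize the path so later URL building does not produce "//".
        file_path=data["file_path"].strip("/"),
        file_names=list(data["file_names"]),
    )
```

A missing field raises `KeyError`, which a polling loop could treat as a malformed message rather than retrying the download.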

Using the "azcopy" CLI utility and the above information, the ADS can download the package from Azure Files.
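
A minimal sketch of how the ADS might assemble the azcopy invocation from those values. It assumes the standard Azure Files URL form (`https://<account>.file.core.windows.net/<share>/<path>?<SAS>`) and the `azcopy copy <source> <destination>` syntax; the function name `build_azcopy_command` and the destination directory are hypothetical.

```python
from typing import List
from urllib.parse import quote


def build_azcopy_command(account: str, share: str, file_path: str,
                         file_name: str, sas_token: str, dest_dir: str) -> List[str]:
    """Return the argv list for downloading one file from Azure Files."""
    # Join the non-empty path parts, then percent-encode everything except "/".
    parts = [p for p in (file_path.strip("/"), file_name) if p]
    remote_path = quote("/".join(parts), safe="/")
    # The SAS token is appended as the query string; drop a leading "?" if present.
    source_url = (
        f"https://{account}.file.core.windows.net/{share}/{remote_path}"
        f"?{sas_token.lstrip('?')}"
    )
    return ["azcopy", "copy", source_url, dest_dir]
```

The returned list can be passed to `subprocess.run(cmd, check=True)`. Passing a list rather than a shell string avoids quoting problems with the `&` characters in the SAS token.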

Performance of azcopy:

In my tests using a large (100GB) file, upload (VSM -> Azure) performance was > 100MB/s. Download (Azure -> VSM) speed was > 500MB/s.
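
To put those rates in terms of wall-clock time, here is a small back-of-the-envelope calculation. It assumes decimal units (1 GB = 1000 MB) and a sustained rate. Because the measured rates are lower bounds, the resulting times are upper bounds.

```python
def transfer_seconds(size_gb: float, rate_mb_per_s: float) -> float:
    """Seconds to move size_gb at a sustained rate (1 GB = 1000 MB)."""
    return size_gb * 1000 / rate_mb_per_s


upload_s = transfer_seconds(100, 100)    # 1000.0 s, roughly 17 minutes
download_s = transfer_seconds(100, 500)  # 200.0 s, a little over 3 minutes
```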

Alternative technologies

  1. Mounting the Azure Files share using SMB. In my testing, I observed:
     a. slower speeds (typically 25MB/s)
     b. mount hangs
     c. instability: at one point, something went wrong and some files were copied to the SMB mount in an incomplete state. Waiting 12 hours did not clean them up. The only solution I found was running a PowerShell command (Close-AzStorageFileHandle).

htpvu commented 5 years ago

@jgara could you please put your findings into this document https://docs.google.com/document/d/1aXzISpcD4M-1GRIyBVW9-RBreWxWFO1P0f5RqmEHaR4/edit ?

Once it's there, we can ask the other system engineers to review and raise any concerns. If all is good, then we'll document the decision and this document could serve as documentation for the future.

Let me know if you have questions.

htpvu commented 4 years ago

Done and parked.