rfeng2023 / mmcloud


Data Upload Methods from S3 to Synapse #53

Closed rfeng2023 closed 6 months ago

rfeng2023 commented 8 months ago

Currently, when attempting to transfer data from an S3 bucket to Synapse, there are a couple of methods available:

  1. First, download the data to a local machine, then upload it to Synapse using the Command Line Interface (CLI).
  2. Alternatively, launch a Jupyter notebook via Opcenter and run the upload from either a bash notebook or a terminal kernel. This approach has drawbacks: the terminal kernel tends to be slow and prone to freezing, and a bash notebook can crash on large jobs (such as uploading over 30,000 files) because of the interactive messages emitted during the upload.
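One way to keep a large upload out of an interactive notebook cell is to describe the files in a manifest and hand that to a single batch call run as a plain script. The sketch below only builds the manifest text. It assumes the files have already been copied from S3 to local paths and that they all go under one Synapse folder; `build_manifest`, the example paths, and the ID `syn123` are hypothetical placeholders. As far as I understand the client, `synapseutils.syncToSynapse` reads a tab-separated manifest with at least `path` and `parent` columns, but check the current Synapse documentation before relying on this.

```python
import csv
import io


def build_manifest(local_paths, parent_id):
    """Return TSV text with one row per file: its local path and Synapse parent ID."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["path", "parent"])  # header row
    for path in local_paths:
        writer.writerow([path, parent_id])
    return buf.getvalue()


if __name__ == "__main__":
    # Hypothetical local paths and a placeholder parent ID.
    print(build_manifest(["/data/a.txt", "/data/b.txt"], "syn123"), end="")

    # After saving the text to manifest.tsv, the upload itself could be run
    # from a script rather than a notebook cell (assumes synapseclient is
    # installed and credentials are configured):
    #   import synapseclient, synapseutils
    #   syn = synapseclient.login()
    #   synapseutils.syncToSynapse(syn, "manifest.tsv")
```

Running the upload as a script with its output redirected to a log file may avoid the progress messages that seem to overwhelm the bash notebook, though that has not been tested in this thread.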

There's also the possibility of linking the S3 bucket directly with Synapse by granting Synapse 'ListObject' permissions on it, as outlined in the Synapse documentation. This raises several questions:
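For reference, a rough sketch of what such a grant could look like as an S3 bucket policy, built in Python. Note that S3 has no `s3:ListObject` action; listing the objects in a bucket is authorized by `s3:ListBucket`. Everything else here is an assumption: `synapse_bucket_policy` is a hypothetical helper, the bucket name and principal ARN are placeholders (the real Synapse principal must come from the Synapse documentation), and the read-only action set may not match what Synapse actually requires.

```python
import json


def synapse_bucket_policy(bucket, synapse_principal_arn):
    """Build a read-only bucket policy dict for the given principal (assumed action set)."""
    principal = {"AWS": synapse_principal_arn}
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Bucket-level actions: listing keys and reading the region.
                "Effect": "Allow",
                "Principal": principal,
                "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                # Object-level action: reading file contents.
                "Effect": "Allow",
                "Principal": principal,
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
        ],
    }


if __name__ == "__main__":
    # Placeholder bucket name and principal ARN, not real values.
    policy = synapse_bucket_policy("my-example-bucket", "arn:aws:iam::123456789012:root")
    print(json.dumps(policy, indent=2))
```

The bucket-level and object-level statements are kept separate because S3 evaluates `s3:ListBucket` against the bucket ARN and `s3:GetObject` against the object ARNs.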

gaow commented 8 months ago

@rfeng2023 thanks for finding that solution. I would say we can work on moving the data to NIAGADS AWS (free within us-east-1), then let NIAGADS figure out the ListObject setup, including the potential cost and smoothing the process -- we just have to work with their team to figure it out.

Ashley-Tung commented 7 months ago

Hi @rfeng2023, I have reached out to engineering about implementing this as a feature request, and asked for feedback on any other workarounds we may find.

This is for job ID fyt5k4sg5948c3s8oh6v9.

@gaow do you think this could be solved by moving data to NIAGADS AWS? Or by moving to Columbia AWS, per this message: [image]

Ashley-Tung commented 7 months ago

Additionally, @rfeng2023, does this network graph line up with your usage? Did you test Synapse during the peaks shown here? [image]

gaow commented 7 months ago

> do you think this could be solved by moving data to NIAGADS AWS?

@Ashley-Tung this is basically handing the problem to another group to let them take care of it.

We are going to try either implementing the official suggestion from Synapse or letting another group solve the issue.