neuro-inc / neuro-cli

Platform-specific API and CLI python client
https://neu-ro.gitbook.io/neu-ro-cli-reference/

Implement `neuro storage load` and `neuro storage sync` #895

Closed dalazx closed 5 years ago

dalazx commented 5 years ago

`neuro storage load SOURCE DESTINATION` unconditionally copies the source directory contents to the specified destination directory. `neuro storage sync SOURCE DESTINATION` copies only the files that differ between the source and the destination directories. Neither command should support storage:-to-storage: or file:-to-file: source and destination pairs.
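
For illustration, a minimal sketch of that scheme check, assuming the CLI receives the source and destination as plain URI strings (the function name and the use of urllib.parse are illustrative, not part of the proposal):

```python
from urllib.parse import urlsplit


def check_scheme_pair(source: str, destination: str) -> None:
    """Reject storage:-to-storage: and file:-to-file: pairs."""
    # Assumption: a bare path with no scheme is treated as a local file: path.
    schemes = {
        urlsplit(source).scheme or "file",
        urlsplit(destination).scheme or "file",
    }
    if schemes != {"file", "storage"}:
        raise ValueError(
            f"expected one file: and one storage: URI, got {source!r} and {destination!r}"
        )
```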

Both commands should implement the following strategy (a rough end-to-end sketch in Python follows the list):

  1. Run a one-off job which exposes a temporary S3 gateway:

    neuro submit -c 1 -m 1G --http 9000 --no-http-auth -n neuro-upload -P -e MINIO_ACCESS_KEY=access_key -e MINIO_SECRET_KEY=secret_key --volume storage::/mnt:rw minio/minio server /mnt

    access_key and secret_key can be equal and should be generated at run time; uuid4 should be enough, I guess. The storage URI in the volume should be equal to the one set in the source or destination arguments.

Inside the upload job we need to create a symlink in /mnt

cd /mnt
ln -s /mnt storage

because Minio requires an existing bucket and does not support cross-device mounts:

2019/07/10 08:41:10 Cross-device mounts detected on path (/mnt) at following locations [/mnt/storage]. Export path should not have any sub-mounts, refusing to start.
  2. Wait until the job is running;

  3. Run a Docker container locally:

    docker run --rm -it -e AWS_ACCESS_KEY_ID=access_key -e AWS_SECRET_ACCESS_KEY=secret_key -v $(pwd):/data --entrypoint sh mesosphere/aws-cli

    access_key and secret_key should be the ones generated previously. The volume source path should be equal to the one specified in the source or destination arguments.

  4. Depending on the neuro command, run the following commands inside the container:

    4.1. For `neuro storage load SOURCE DESTINATION`, run:

    aws configure set default.s3.max_concurrent_requests 100
    aws configure set default.s3.max_queue_size 10000
    aws --endpoint-url https://neuro-upload--dalazx.jobs-dev.neu.ro s3 cp --recursive /data s3://storage

    4.2. For `neuro storage sync SOURCE DESTINATION`, run:

    aws configure set default.s3.max_concurrent_requests 100
    aws configure set default.s3.max_queue_size 10000
    aws --endpoint-url https://neuro-upload--dalazx.jobs-dev.neu.ro s3 sync /data s3://storage
  5. Wait until the container finishes;

  6. Remove the symlink in the upload job;

  7. Kill the upload job.
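
As referenced above, here is a rough end-to-end sketch in Python of how `neuro storage load` could drive these steps with subprocess. The constant job name neuro-upload and the `neuro submit`/`docker run` invocations come from the steps above; the explicit `endpoint` parameter (the issue shows URLs of the form https://<job-name>--<user>.jobs-dev.neu.ro), polling the job state with `neuro status`, and managing the symlink through `neuro exec` (see question 3 below) are assumptions, not a final design.

```python
import subprocess
import time
import uuid


def _run(*cmd: str) -> str:
    """Run a CLI command, fail loudly on non-zero exit, and return its stdout."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout


def storage_load(local_dir: str, storage_uri: str, endpoint: str) -> None:
    # Step 1: throwaway credentials generated at run time; the same uuid4
    # value is used for both keys, as suggested above.
    key = uuid.uuid4().hex
    _run(
        "neuro", "submit", "-c", "1", "-m", "1G",
        "--http", "9000", "--no-http-auth", "-n", "neuro-upload", "-P",
        "-e", f"MINIO_ACCESS_KEY={key}", "-e", f"MINIO_SECRET_KEY={key}",
        "--volume", f"{storage_uri}:/mnt:rw",
        "minio/minio", "server", "/mnt",
    )
    try:
        # Step 2: wait until the job is running (the exact status command
        # output and this substring check are assumptions).
        while "running" not in _run("neuro", "status", "neuro-upload").lower():
            time.sleep(2)

        # Symlink workaround for MinIO's cross-device mount check, done via
        # exec as mentioned in question 3 (exact exec argument form assumed).
        _run("neuro", "exec", "neuro-upload", "ln", "-s", "/mnt", "/mnt/storage")

        # Steps 3-5: run the aws-cli container locally, copy /data to the
        # temporary S3 gateway, and wait for it to finish (non-interactive
        # variant of the commands shown above).
        _run(
            "docker", "run", "--rm",
            "-e", f"AWS_ACCESS_KEY_ID={key}",
            "-e", f"AWS_SECRET_ACCESS_KEY={key}",
            "-v", f"{local_dir}:/data",
            "--entrypoint", "sh", "mesosphere/aws-cli", "-c",
            "aws configure set default.s3.max_concurrent_requests 100 && "
            "aws configure set default.s3.max_queue_size 10000 && "
            f"aws --endpoint-url {endpoint} s3 cp --recursive /data s3://storage",
        )

        # Step 6: remove the symlink again.
        _run("neuro", "exec", "neuro-upload", "rm", "/mnt/storage")
    finally:
        # Step 7: kill the upload job.
        _run("neuro", "kill", "neuro-upload")
```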

Questions

  1. Should we generate a unique symlink name? Should it start with a dot?
  2. Should the upload job be unnamed?
  3. Should we implement passing an entrypoint for a job's container? Yes. See https://github.com/neuromation/platform-api/issues/802 and https://github.com/neuromation/platform-client-python/issues/896. In the meantime we can get by with exec until these are implemented.
  4. What are meaningful CPU and memory amounts for the upload job?
serhiy-storchaka commented 5 years ago
> 2. Should the upload job be unnamed?

At least during development it helped that the job had a constant name.

Even if the job name gets a variable part, the constant part will stay descriptive and will help to distinguish such jobs from the jobs run by the user.
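
If the name does get a variable part, something along these lines would keep the descriptive constant prefix (the prefix and suffix length here are only an illustration):

```python
import uuid

# Constant, descriptive prefix plus a short random suffix: still easy to
# spot in the job list, but unlikely to collide between runs.
job_name = f"neuro-upload-{uuid.uuid4().hex[:8]}"
```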