skyplane-project / skyplane

🔥 Blazing fast bulk data transfers between any cloud 🔥
https://skyplane.org
Apache License 2.0

[UX] Need more clarity around recursive transfers #484

Closed parasj closed 2 years ago

parasj commented 2 years ago

The UX of recursive transfers is confusing:

parasj commented 2 years ago

@abiswal2001 I was setting up Skyplane with @romilbhardwaj, and we found a few places where recursive transfers were confusing to him. At the moment we match the semantics of aws s3 cp exactly (e.g., a non-recursive transfer copies only one object, and a recursive transfer requires a trailing slash on the source path). We ran into the edge cases listed above, so we may want to add better documentation or clearer error messages.

parasj commented 2 years ago

One example of an error with recursive transfers:

$ skyplane sync --recursive s3://romil-sky-test s3://romil-dataset
 _____ _   ____   _______ _       ___   _   _  _____
/  ___| | / /\ \ / / ___ \ |     / _ \ | \ | ||  ___|
\ `--.| |/ /  \ V /| |_/ / |    / /_\ \|  \| || |__
 `--. \    \   \ / |  __/| |    |  _  || . ` ||  __|
/\__/ / |\  \  | | | |   | |____| | | || |\  || |___
\____/\_| \_/  \_/ \_|   \_____/\_| |_/\_| \_/\____/

Traceback (most recent call last):
  File "/mnt/d/wsl/anaconda3/lib/python3.7/site-packages/skyplane/cli/cli.py", line 250, in sync
    src_region, bucket_src, path_src, dst_region, bucket_dst, path_dst, recursive=recursive
  File "/mnt/d/wsl/anaconda3/lib/python3.7/site-packages/skyplane/cli/cli_impl/cp_replicate.py", line 142, in generate_full_transferobjlist
    dest_key = map_object_key_prefix(source_prefix, source_obj.key, dest_prefix, recursive=recursive)
  File "/mnt/d/wsl/anaconda3/lib/python3.7/site-packages/skyplane/cli/cli_impl/cp_replicate.py", line 106, in map_object_key_prefix
    raise exceptions.MissingObjectException(f"Source key {source_key} does not start with source prefix {source_prefix}")
skyplane.exceptions.MissingObjectException: Source key sky.prof does not start with source prefix /

❌ MissingObjectException: Source key sky.prof does not start with source prefix /
Please ensure that the object exists and is accessible.

In this case, we could improve the error message so that it directly suggests running the command with a trailing slash on the source: skyplane sync --recursive s3://romil-sky-test/ s3://romil-dataset. Alternatively, could we just assume the trailing slash for recursive transfers?
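
To make the two options concrete, here is a rough sketch of what "assume a slash" plus a more helpful error could look like. This is not Skyplane's actual code: normalize_prefix and map_key are hypothetical names, and the behavior (stripping a leading slash, appending a trailing one) is only an assumption about what we might want.

```python
def normalize_prefix(prefix: str) -> str:
    """Hypothetical helper: treat a recursive prefix as a directory.

    A bare bucket ("" or "/") becomes "", and "data" becomes "data/".
    """
    prefix = prefix.lstrip("/")
    if prefix and not prefix.endswith("/"):
        prefix += "/"
    return prefix


def map_key(source_prefix: str, source_key: str, dest_prefix: str) -> str:
    """Hypothetical mapping of a source key to a destination key."""
    src = normalize_prefix(source_prefix)
    dst = normalize_prefix(dest_prefix)
    if not source_key.startswith(src):
        # Tell the user what to try instead of only reporting the mismatch.
        raise ValueError(
            f"Source key {source_key!r} is not under source prefix {src!r}. "
            "For a recursive transfer, check that the source path ends with '/'."
        )
    return dst + source_key[len(src):]


# The failing case from the traceback: bare bucket, key "sky.prof".
print(map_key("/", "sky.prof", ""))
# A prefix without a trailing slash is treated as a directory.
print(map_key("data", "data/a.txt", "backup"))
```

With this normalization, the source prefix "/" from the traceback is treated as the bucket root, so sky.prof maps to sky.prof instead of raising. The trade-off is that "data" would no longer match a key such as "database.csv", which is a deliberate choice in this sketch rather than something aws s3 cp is confirmed to do.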

parasj commented 2 years ago

One place where this may be confusing is how aws s3 cp treats prefixes versus paths. From the aws s3 cp documentation:

Recursively copying S3 objects to a local directory

When passed with the parameter --recursive, the following cp command recursively copies all objects under a specified prefix and bucket to a specified directory. In this example, the bucket mybucket has the objects test1.txt and test2.txt:

aws s3 cp s3://mybucket . --recursive
Output:

download: s3://mybucket/test1.txt to test1.txt
download: s3://mybucket/test2.txt to test2.txt
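
The mapping implied by that example can be sketched in a few lines: strip the source prefix from each key and join the remainder onto the destination directory. This is an in-memory illustration only, not the AWS CLI's implementation, and local_destination is a hypothetical name.

```python
import posixpath


def local_destination(key: str, source_prefix: str, dest_dir: str) -> str:
    """Hypothetical helper: place a key under dest_dir, relative to source_prefix."""
    if not key.startswith(source_prefix):
        raise ValueError(f"{key!r} is not under prefix {source_prefix!r}")
    relative = key[len(source_prefix):].lstrip("/")
    # normpath turns "./test1.txt" into "test1.txt".
    return posixpath.normpath(posixpath.join(dest_dir, relative))


# Objects from the documentation example; the source is the bare bucket,
# so the prefix is the empty string.
for key in ["test1.txt", "test2.txt"]:
    print(f"download: s3://mybucket/{key} to {local_destination(key, '', '.')}")
```

Here the source is a bare bucket with no trailing slash and the recursive copy still works, which is the same shape of command that fails in the Skyplane example above.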