Open hibanacreatives opened 1 year ago
Thanks for reporting this issue @hibanacreatives! Could you also please attach the client.log file? Its path is printed at the start of the transfer, like:
Logging to: /tmp/skyplane/transfer_logs/20230313_172455-c9bc9280/client.log
Also, did Skyplane download any gateway logs (their format is /tmp/skyplane/transfer_logs/20230313_172455/gateway_aws:us-east-1:i-0e776422b8c43582e.stdout)?
I think this is a bug in Skyplane so I need to look into it further. A temporary workaround might be to use more VMs - how many did you try?
Thanks for the reply @sarahwooders. Totally meant to upload the log with the report, oops. Here ya go: client.log
No gateway logs found.
Tried up to -n5 but then started hitting some quota limits.
Thanks and let me know if I can help further.
Thanks for the client log! We're working on a potential fix right now so will keep you posted with that.
Hi @hibanacreatives - I'm actually having some trouble reproducing the error. Could you please try upgrading skyplane with pip install --upgrade skyplane, and then re-run the command with the --debug flag? That should download the gateway logs, and it would be great if you could share those with me.
@sarahwooders Ohh! Will do. I was on 0.2.1 and I see there's a 0.3.0. I'll give that a go a little bit later tonight and let you know. Thanks for taking the time to poke at it.
I tried again and bumped into a different set of errors. The gateway allocation is timing out. I'm going to try again with a fresh environment and will post some debug info.
Ah ok - please post the client.log files and the gateway logs from --debug mode when you get the chance!
debug_files.zip Files attached. I noticed a ModuleNotFound error for 'typer' in the gateway logs, even though I confirmed it is installed in my venv. Does that mean that perhaps the gateway environment is missing that module somehow?
Please let me know what else I can do to help diagnose. Thanks for your time.
Sorry for the delayed response - we just fixed the bug on the gateways. Could you please upgrade Skyplane and try again? Really appreciate your help with debugging this issue.
@hibanacreatives were you able to resolve this issue?
Hi! Sorry for the delayed response. I worked around my issue, but very happy to help debug. It still didn't work iirc. I'll have some time to generate debug info this weekend.
Thanks for the ping
@hibanacreatives yes would really appreciate getting some of the logs so we can fix this for future users!
Describe the bug Did my best to check the issues and docs but didn't see this come up.
I'm transferring a single large (~2TB) file from GCP to AWS. It gets about 1% done before it stops, deprovisions, and displays the following errors:
The receiver instance seems to be running out of storage space. It's the default m5.8xlarge image. I'm not sure how much storage space is allocated and didn't see it as a configuration option. This tracks with higher -n settings seeming to get farther along, since the design should split the load across the instances.
Multipart was enabled during the runs.
Are there any best practices for handling single large files? I suspect that's near the root of the problem.
Thanks for the help and really cool project!
To Reproduce Steps to reproduce the behavior (please include the full Skyplane command you ran):
Expected behavior The transfer should complete without errors and my file should show up in AWS S3.
Transfer client log In the log output from Skyplane, please upload the debug log from the CLI. You can find the path to the file in the log output:
Environment info (please complete the following information):
Additional context I've made attempts with varying numbers of instances but got the same result.
Skyplane Config:
I'm thinking of maybe trying a storage focused image instead of the default.