skyplane-project / skyplane

🔥 Blazing fast bulk data transfers between any cloud 🔥
https://skyplane.org
Apache License 2.0

[bug] missing network resource when copying #715

Open paymog opened 1 year ago

paymog commented 1 year ago

Describe the bug

Copying an object from GCS to S3 failed.

To Reproduce

  1. Install and configure skyplane, gcloud and aws
  2. I ran the following command: skyplane cp gs://firehose/streamingfast/eth/mainnet/merged/0015000000.dbin.zst s3://firehose-mainnet-prod/merged-blocks

This was the output:

 ❯❯❯ skyplane cp gs://firehose/streamingfast/eth/mainnet/merged/0015000000.dbin.zst s3://firehose-mainnet-prod/merged-blocks
 _____ _   ____   _______ _       ___   _   _  _____
/  ___| | / /\ \ / / ___ \ |     / _ \ | \ | ||  ___|
\ `--.| |/ /  \ V /| |_/ / |    / /_\ \|  \| || |__
 `--. \    \   \ / |  __/| |    |  _  || . ` ||  __|
/\__/ / |\  \  | | | |   | |____| | | || |\  || |___
\____/\_| \_/  \_/ \_|   \_____/\_| |_/\_| \_/\____/

Will transfer 1 objects totaling 21.48MB from gcp:us-west1-b to aws:us-west-2
    VMs to provision: 1x aws:us-west-2, 1x gcp:us-west1-b
    Estimated egress cost: $0.00 at $0.12/GB
    streamingfast/eth/mainnet/merged/0015000000.dbin.zst => merged-blocks
Continue? [Y/n]: y
Transfer starting (Tip: Enable auto-confirmation with `skyplane config set autoconfirm true`)

Storing debug information for transfer in /tmp/skyplane/transfer_logs/20221207_163200/client.log
⠏ Initializing cloud keys ━━━━━━━━━━━━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━ 2/6 0:00:02
16:32:03 [ERROR] Error running <lambda>: <HttpError 404 when requesting https://compute.googleapis.com/compute/v1/projects/goldsky-prod-356310/global/firewalls?alt=json returned "The resource 'projects/goldsky-prod-356310/global/networks/skyplane' was not found". Details: "[{'message': "The resource 'projects/goldsky-prod-356310/global/networks/skyplane' was not found", 'domain': 'global', 'reason': 'notFound'}]">
Traceback (most recent call last):
  File "/home/ubuntu/.local/lib/python3.8/site-packages/skyplane/cli/cli_impl/cp_replicate.py", line 306, in launch_replication_job
    rc.provision_gateways(
  File "/home/ubuntu/.local/lib/python3.8/site-packages/skyplane/replicate/replicator_client.py", line 195, in provision_gateways
    do_parallel(lambda fn: fn(), jobs, spinner=True, spinner_persist=True, desc="Initializing cloud keys")
  File "/home/ubuntu/.local/lib/python3.8/site-packages/skyplane/utils/fn.py", line 57, in do_parallel
    args, result = future.result()
  File "/usr/lib/python3.8/concurrent/futures/_base.py", line 437, in result
    return self.__get_result()
  File "/usr/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
    raise self._exception
  File "/usr/lib/python3.8/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/skyplane/utils/fn.py", line 46, in wrapped_fn
    raise e
  File "/home/ubuntu/.local/lib/python3.8/site-packages/skyplane/utils/fn.py", line 43, in wrapped_fn
    return args, func(args)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/skyplane/replicate/replicator_client.py", line 195, in <lambda>
    do_parallel(lambda fn: fn(), jobs, spinner=True, spinner_persist=True, desc="Initializing cloud keys")
  File "/home/ubuntu/.local/lib/python3.8/site-packages/skyplane/utils/imports.py", line 33, in wrapped
    return fn(*modules_imported, *args, **kwargs)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/skyplane/compute/gcp/gcp_cloud_provider.py", line 204, in configure_skyplane_firewall
    create_firewall(fw_body, update_firewall=False)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/skyplane/compute/gcp/gcp_cloud_provider.py", line 185, in create_firewall
    op = compute.firewalls().insert(project=self.auth.project_id, body=fw_body).execute()
  File "/home/ubuntu/.local/lib/python3.8/site-packages/googleapiclient/_helpers.py", line 130, in positional_wrapper
    return wrapped(*args, **kwargs)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/googleapiclient/http.py", line 938, in execute
    raise HttpError(resp, content, uri=self.uri)
googleapiclient.errors.HttpError: <HttpError 404 when requesting https://compute.googleapis.com/compute/v1/projects/goldsky-prod-356310/global/firewalls?alt=json returned "The resource 'projects/goldsky-prod-356310/global/networks/skyplane' was
not found". Details: "[{'message': "The resource 'projects/goldsky-prod-356310/global/networks/skyplane' was not found", 'domain': 'global', 'reason': 'notFound'}]">

<HttpError 404 when requesting https://compute.googleapis.com/compute/v1/projects/goldsky-prod-356310/global/firewalls?alt=json returned "The resource 'projects/goldsky-prod-356310/global/networks/skyplane' was not found". Details: "[{'message':
"The resource 'projects/goldsky-prod-356310/global/networks/skyplane' was not found", 'domain': 'global', 'reason': 'notFound'}]">
16:32:31 [WARN]  [AWS] Error removing IPs from security group: An error occurred (MissingParameter) when calling the RevokeSecurityGroupIngress operation: Either 'ipPermissions' or 'securityGroupRuleIds' should be provided.

Expected behavior

I expect this command to succeed.

Transfer client log

 ❯❯❯ cat /tmp/skyplane/transfer_logs/20221207_163200/client.log
16:32:02 [DEBUG] [AWS] Creating keypair skyplane-us-west-2 in us-west-2
16:32:03 [ERROR] Error running <lambda>: <HttpError 404 when requesting https://compute.googleapis.com/compute/v1/projects/goldsky-prod-356310/global/firewalls?alt=json returned "The resource 'projects/goldsky-prod-356310/global/networks/skyplane' was not found". Details: "[{'message': "The resource 'projects/goldsky-prod-356310/global/networks/skyplane' was not found", 'domain': 'global', 'reason': 'notFound'}]">
16:32:04 [INFO]  Created key file /home/ubuntu/.skyplane/keys/aws/skyplane-us-west-2.pem
16:32:13 [DEBUG] [AWS] Authorizing 0.0.0.0/0:22 in skyplane
16:32:29 [WARN]  Deprovisioning gateways then exiting. Please wait...
16:32:31 [DEBUG] [AWS] Removing IPs [] from security group skyplane
16:32:31 [ERROR] [AWS] Error removing IPs [] from security group skyplane: An error occurred (MissingParameter) when calling the RevokeSecurityGroupIngress operation: Either 'ipPermissions' or 'securityGroupRuleIds' should be provided.
16:32:31 [WARN]  [AWS] Error removing IPs from security group: An error occurred (MissingParameter) when calling the RevokeSecurityGroupIngress operation: Either 'ipPermissions' or 'securityGroupRuleIds' should be provided.
16:32:31 [WARN]  Deprovisioning 0 instances
16:32:31 [INFO]  Deprovisioned instances

Environment info (please complete the following information):

parasj commented 1 year ago

@paymog It seems like the skyplane network wasn't created correctly. @abiswal2001 will work on a PR to automatically detect and recreate the network if it doesn't exist.

As a workaround, manually create the skyplane VPC network:

  1. Go to https://console.cloud.google.com/networking/networks/list, select your project, then click "Create a VPC Network".
  2. Set it up as follows:
     a. Name: skyplane
     b. Subnet creation mode: Automatic
     c. Dynamic routing mode: Leave as Regional
     d. MTU: Leave as 1460
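
For anyone who would rather script the workaround than click through the console, here is a minimal sketch using the Compute Engine v1 API through google-api-python-client, the same client library that appears in the traceback above. The helper names `skyplane_network_body` and `create_skyplane_network` are hypothetical and not part of Skyplane. The sketch assumes application default credentials are already configured, and it only mirrors the four console settings listed above; whether Skyplane expects anything else on the network is not confirmed here.

```python
"""Sketch: create the `skyplane` VPC network programmatically (not Skyplane code)."""


def skyplane_network_body(name="skyplane"):
    """Build a request body that mirrors the console settings in the workaround."""
    return {
        "name": name,
        "autoCreateSubnetworks": True,  # Subnet creation mode: Automatic
        "routingConfig": {"routingMode": "REGIONAL"},  # Dynamic routing mode: Regional
        "mtu": 1460,  # MTU left at 1460
    }


def create_skyplane_network(project_id, compute=None):
    """Insert the network through the Compute Engine v1 `networks.insert` call.

    `compute` may be passed in (for example a stub in tests); otherwise a
    client is built with application default credentials (an assumption).
    """
    if compute is None:
        # Imported lazily so the body helper works without the client library.
        from googleapiclient import discovery

        compute = discovery.build("compute", "v1")
    request = compute.networks().insert(
        project=project_id, body=skyplane_network_body()
    )
    # Returns the operation resource describing the pending network creation.
    return request.execute()
```

Calling `create_skyplane_network("your-project-id")` with your own project ID should start the network creation; the returned operation may take a short while to complete before `skyplane cp` can use the network.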

paymog commented 1 year ago

Sweet! I think I can actually solve my needs with gsutil, so I probably won't get around to that, but if gsutil starts failing I'll give this a shot 🙂