Describe the bug
One VM is consistently leaked after a transfer is cancelled with Ctrl-C.
To Reproduce
Run the transfer:
skyplane cp -r gs://skyplane-big-test-bucket/OPT-cloudflare/ s3://test-us-east-1-7711e4ae/
During dispatch, press Ctrl-C to exit the transfer.
Transfer client log
Logging to: /tmp/skyplane/transfer_logs/20230623_145734-bd9ae325/client.log
Using Skyplane version 0.3.2
Will transfer objects from gcp:us-central1-a to aws:us-east-1
14:57:36 [WARN] Quota limit file not found for aws:us-east-1. Try running `skyplane init --reinit-aws` to load the quota information
VMs to provision: 1x aws:us-east-1, 1x gcp:us-central1-a
Estimated egress cost: $0.12/GB
gs://skyplane-big-test-bucket/OPT-cloudflare/reshard-model_part-0.pt => s3://test-us-east-1-7711e4ae/reshard-model_part-0.pt (15.34GB)
gs://skyplane-big-test-bucket/OPT-cloudflare/reshard-model_part-1.pt => s3://test-us-east-1-7711e4ae/reshard-model_part-1.pt (15.34GB)
gs://skyplane-big-test-bucket/OPT-cloudflare/reshard-model_part-2.pt => s3://test-us-east-1-7711e4ae/reshard-model_part-2.pt (15.34GB)
gs://skyplane-big-test-bucket/OPT-cloudflare/reshard-model_part-3.pt => s3://test-us-east-1-7711e4ae/reshard-model_part-3.pt (15.34GB)
gs://skyplane-big-test-bucket/OPT-cloudflare/reshard-model_part-4.pt => s3://test-us-east-1-7711e4ae/reshard-model_part-4.pt (15.34GB)
...
Transfer starting
14:57:41 [WARN] Quota limit file not found for aws:us-east-1. Try running `skyplane init --reinit-aws` to load the quota information
Provisioning VMs (2/2) in 37.14s
⠼ Authorizing gateways with firewalls ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0/2 0:00:01
14:58:41 [WARN] :us-east-1 Error adding IPs to security group, since it already exits: An error occurred (InvalidPermission.Duplicate) when calling the AuthorizeSecurityGroupIngress operation: the specified rule "peer: 0.0.0.0/0, ALL, ALLOW" already exists
Starting gateway container on VMs (2/2) in 28.52s
⠹ Transfer progressaws:us-east-1 ━━━╸━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 8.6/122.7 GiB 482.5 MB/s 0:04:15^C
Transfer cancelled by user. Copying gateway logs and exiting.
Transfer progressaws:us-east-1 ━━━━╸━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 11.1/122.7 GiB 473.2 MB/s 0:04:14
15:00:00 [ERROR] Error running <lambda>, GCPServer(region_tag=gcp:us-central1-a, instance_name=skyplane-gcp-de24eada): 'NoneType' object has no attribute 'open_session'
15:00:00 [ERROR] Error running <lambda>, AWSServer(region_tag=aws:us-east-1, instance_id=i-0861627e6ae3b80f1): 'NoneType' object has no attribute 'open_session'
Exception in thread Thread-35:
Traceback (most recent call last):
  File "/Users/sarahwooders/repos/skyplane/skyplane/api/tracker.py", line 181, in monitor_single_dst_helper
    self.monitor_transfer(dst_region)
  File "/Users/sarahwooders/repos/skyplane/skyplane/utils/imports.py", line 33, in wrapped
    return fn(*modules_imported, *args, **kwargs)
  File "/Users/sarahwooders/repos/skyplane/skyplane/api/tracker.py", line 278, in monitor_transfer
    do_parallel(lambda i: i.run_command("echo 1"), self.dataplane.bound_nodes.values(), n=8)
  File "/Users/sarahwooders/repos/skyplane/skyplane/utils/fn.py", line 57, in do_parallel
    args, result = future.result()
  File "/usr/local/Cellar/python@3.10/3.10.11/Frameworks/Python.framework/Versions/3.10/lib/python3.10/concurrent/futures/_base.py", line 451, in result
    return self.__get_result()
  File "/usr/local/Cellar/python@3.10/3.10.11/Frameworks/Python.framework/Versions/3.10/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/usr/local/Cellar/python@3.10/3.10.11/Frameworks/Python.framework/Versions/3.10/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/Users/sarahwooders/repos/skyplane/skyplane/utils/fn.py", line 43, in wrapped_fn
    return args, func(args)
  File "/Users/sarahwooders/repos/skyplane/skyplane/api/tracker.py", line 278, in <lambda>
    do_parallel(lambda i: i.run_command("echo 1"), self.dataplane.bound_nodes.values(), n=8)
  File "/Users/sarahwooders/repos/skyplane/skyplane/compute/server.py", line 241, in run_command
    _, stdout, stderr = client.exec_command(command)
  File "/Users/sarahwooders/repos/skyplane/env/lib/python3.10/site-packages/paramiko/client.py", line 560, in exec_command
    chan = self._transport.open_session(timeout=timeout)
AttributeError: 'NoneType' object has no attribute 'open_session'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/usr/local/Cellar/python@3.10/3.10.11/Frameworks/Python.framework/Versions/3.10/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/Users/sarahwooders/repos/skyplane/skyplane/api/tracker.py", line 216, in run
    raise e
  File "/Users/sarahwooders/repos/skyplane/skyplane/api/tracker.py", line 214, in run
    results.append(future.result())
  File "/usr/local/Cellar/python@3.10/3.10.11/Frameworks/Python.framework/Versions/3.10/lib/python3.10/concurrent/futures/_base.py", line 451, in result
    return self.__get_result()
  File "/usr/local/Cellar/python@3.10/3.10.11/Frameworks/Python.framework/Versions/3.10/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/usr/local/Cellar/python@3.10/3.10.11/Frameworks/Python.framework/Versions/3.10/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/Users/sarahwooders/repos/skyplane/skyplane/api/tracker.py", line 194, in monitor_single_dst_helper
    UsageClient.log_exception(
  File "/Users/sarahwooders/repos/skyplane/skyplane/api/usage.py", line 147, in log_exception
    stats = client.make_error(
  File "/Users/sarahwooders/repos/skyplane/skyplane/api/usage.py", line 304, in make_error
    dest_regions = [tag.split(":")[1] for tag in dest_region_tags]
  File "/Users/sarahwooders/repos/skyplane/skyplane/api/usage.py", line 304, in <listcomp>
    dest_regions = [tag.split(":")[1] for tag in dest_region_tags]
IndexError: list index out of range
Transfer progressaws:us-east-1 ━━━━╸━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 11.1/122.7 GiB 473.2 MB/s 0:04:14%
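A note on the second traceback: the IndexError comes from tag.split(":")[1] in usage.py, which fails for any tag that contains no ":". One way this could happen (an assumption, not confirmed from the code) is that dest_region_tags arrives as a single string rather than a list, so the comprehension iterates over characters. Below is a minimal sketch of a more tolerant parser. The helper name region_names is hypothetical and not part of Skyplane; it only illustrates the idea and is not a proposed patch for the VM leak itself.

```python
from typing import Iterable, List, Union


def region_names(tags: Union[str, Iterable[str]]) -> List[str]:
    """Hypothetical helper: return the region part of 'provider:region' tags.

    Accepts either a single tag string or an iterable of tags, and never
    raises on a tag that lacks the ':' separator.
    """
    # Treat a bare string as one tag instead of iterating over its characters.
    if isinstance(tags, str):
        tags = [tags]

    regions = []
    for tag in tags:
        # partition() always returns a 3-tuple, so there is no index to miss.
        _provider, sep, region = tag.partition(":")
        # If there was no separator, keep the raw tag so the report stays useful.
        regions.append(region if sep else tag)
    return regions


# The original expression fails on a tag without ':'.
try:
    [tag.split(":")[1] for tag in "aws:us-east-1"]  # iterates over characters
except IndexError as e:
    print("original expression raised:", e)

print(region_names(["aws:us-east-1", "gcp:us-central1-a"]))
print(region_names("aws:us-east-1"))
print(region_names(["us-east-1"]))
```

With something along these lines, a failure inside the error-reporting path would not mask the original exception (here, the AttributeError from the closed SSH client).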
Environment info (please complete the following information):
SKY-270