lithops-cloud / lithops

A multi-cloud framework for big data analytics and embarrassingly parallel jobs that provides a universal API for building parallel applications in the cloud ☁️🚀
http://lithops.cloud
Apache License 2.0

Runtime upload to VPC does not occur #859

Closed sergii-mamedov closed 2 years ago

sergii-mamedov commented 2 years ago

After upgrading to version 2.5.7, I found that the latest version of lithops is not uploaded to the VPC (the old version remains). I tried deleting the VPC to create a new one; in this case, the lithops process does not start on the VPC at all. I tried the previous version (2.5.6) - same result. The last version we used that didn't have this problem was 2.5.0. It seems the reason is the recent changes to this function. We are using consume mode for the VPC. Stacktrace on our side:

2021-11-23 11:36:25,763 - INFO - lithops.config[Thread-1] - config.py:131 - Lithops v2.5.7
................................
2021-11-23 11:36:32,054 - INFO - engine.lithops-wrapper[Thread-1] - executor.py:205 - executor.map(_load_ds, 1 items, 32768MB, attempt 1)
2021-11-23 11:36:32,054 - DEBUG - engine.lithops-wrapper[Thread-1] - executor.py:331 - Selected executor ibm_vpc
2021-11-23 11:36:32,061 - INFO - lithops.invokers[Thread-1-ex] - invokers.py:112 - ExecutorID 317913-1 | JobID M000 - Selected Runtime: metaspace2020/metaspace-lithops:1.9.3 
2021-11-23 11:36:32,408 - INFO - lithops.invokers[Thread-1-ex] - invokers.py:120 - Runtime metaspace2020/metaspace-lithops:1.9.3 is not yet installed
2021-11-23 11:39:35,209 - ERROR - engine.lithops-wrapper[Thread-1] - executor.py:263 - _load_ds raised an exception. Failed activation(s): [] ID(s): []
NoneType: None
2021-11-23 11:39:35,214 - ERROR - lithops-daemon[Thread-1] - queue.py:432 -  [x] Failed: {"ds_id": "2021-08-27_16h26m18s", "ds_name": "dataset_05", "action": "annotate", "del_first": true, "force": true, "email": "sergii.mamedov@embl.de"}
2021-11-23 11:39:35,215 - WARNING - lithops-daemon[Thread-1] - lithops.py:51 - Lithops annotation failed, retrying.
Traceback (most recent call last):
  File "/opt/dev/metaspace/metaspace/engine/sm/engine/daemons/lithops.py", line 81, in _callback
    self._manager.annotate_lithops(ds=ds, del_first=msg.get('del_first', False))
  File "/opt/dev/metaspace/metaspace/engine/sm/engine/daemons/dataset_manager.py", line 114, in annotate_lithops
    ServerAnnotationJob(executor, ds, perf).run()
  File "/opt/dev/metaspace/metaspace/engine/sm/engine/annotation_lithops/annotation_job.py", line 322, in run
    self.results_dfs, self.png_cobjs = self.pipe(**kwargs)
  File "/opt/dev/metaspace/metaspace/engine/sm/engine/annotation_lithops/pipeline.py", line 90, in __call__
    self.load_ds(use_cache=use_cache)
  File "/opt/dev/metaspace/metaspace/engine/sm/engine/annotation_lithops/cache.py", line 81, in wrapper
    return f(self, *args, **kwargs)
  File "/opt/dev/metaspace/metaspace/engine/sm/engine/annotation_lithops/pipeline.py", line 131, in load_ds
    ) = load_ds(self.executor, self.imzml_cobject, self.ibd_cobject, self.ds_segm_size_mb)
  File "/opt/dev/metaspace/metaspace/engine/sm/engine/annotation_lithops/load_ds.py", line 141, in load_ds
    imzml_reader, ds_segments_bounds, ds_segms_cobjs, ds_segm_lens = executor.call(
  File "/opt/dev/metaspace/metaspace/engine/sm/engine/annotation_lithops/executor.py", line 364, in call
    return self.map(
  File "/opt/dev/metaspace/metaspace/engine/sm/engine/annotation_lithops/executor.py", line 269, in map
    raise exc
  File "/opt/dev/metaspace/metaspace/engine/sm/engine/annotation_lithops/executor.py", line 302, in run
    futures = executor.map(
TimeoutError: Readiness probe expired on VM instance metaspace-staging-vpc-128 (161.156.169.205)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/dev/metaspace/metaspace/engine/sm/engine/queue.py", line 430, in get_message
    self._callback(msg)
  File "/opt/dev/metaspace/metaspace/engine/sm/engine/daemons/lithops.py", line 104, in _callback
    raise AnnotationError(ds_id=msg['ds_id'], traceback=format_exc(chain=False)) from e
sm.engine.errors.AnnotationError: Annotation failed (ds_id=2021-08-27_16h26m18s)

@JosepSampe can you look into it?

JosepSampe commented 2 years ago

I ran some tests with Lithops from scratch and for now I'm unable to replicate the issue.

TimeoutError: Readiness probe expired on VM instance metaspace-staging-vpc-128 (161.156.169.205)

Initially, lithops tries to create an ssh connection to the VM, and then it proceeds to install everything necessary on it. This message means lithops is unable to ssh into the VM. I think the code raises the exception on this line, so it is expected that the service never gets installed there.
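
For context, that readiness probe is essentially an SSH retry loop. A minimal sketch of the pattern (illustrative only - this is not the actual Lithops code, and the names and timeouts are made up), assuming paramiko as the ssh client:

import time
import paramiko

def wait_ready(host, user, key_filename, timeout=300):
    # Poll until an SSH connection to the VM succeeds, or give up.
    start = time.time()
    while time.time() - start < timeout:
        ssh = paramiko.SSHClient()
        ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
        try:
            ssh.connect(host, username=user, key_filename=key_filename, timeout=10)
            ssh.close()
            return  # VM reachable: installation can proceed
        except Exception:
            time.sleep(5)  # not reachable yet, retry
    raise TimeoutError(f'Readiness probe expired on VM instance {host}')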

Could you verify that the VM is started and that the ssh key is correct?

We recently added a new logic that verifies the ssh keys in IBM VPC are correct: https://github.com/lithops-cloud/lithops/blob/master/lithops/standalone/backends/ibm_vpc/ibm_vpc.py#L559-L577. Might it fail here?

Could you run a simple test by changing this line and adding self.backend.master.wait_ready(verbose=True)?

Also, in your logs lithops seems to be set to INFO logging; could you set it to DEBUG?
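
For example, something like this at the top of your driver script forces DEBUG output through standard Python logging (a generic sketch; Lithops also accepts a log level in its config, but exact keys may vary by version):

import logging

logging.basicConfig(
    level=logging.DEBUG,
    format='%(asctime)s - %(levelname)s - %(name)s - %(message)s',
)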

sergii-mamedov commented 2 years ago

We did not change the ssh key, and I tested that I can connect manually via ssh. I enabled verbose=True and the DEBUG level for logging. How else can I figure out what the problem is?

2021-11-25 11:17:15,216 - INFO - lithops.config[Thread-1] - config.py:131 - Lithops v2.5.7
2021-11-25 11:17:15,216 - DEBUG - lithops.config[Thread-1] - config.py:212 - Loading Standalone backend module: ibm_vpc
2021-11-25 11:17:15,216 - DEBUG - lithops.config[Thread-1] - config.py:254 - Loading Storage backend module: ibm_cos
2021-11-25 11:17:15,216 - DEBUG - lithops.storage.backends.ibm_cos.ibm_cos[Thread-1] - ibm_cos.py:37 - Creating IBM COS client
2021-11-25 11:17:15,217 - DEBUG - lithops.storage.backends.ibm_cos.ibm_cos[Thread-1] - ibm_cos.py:57 - Set IBM COS Endpoint to https://s3.eu-de.cloud-object-storage.appdomain.cloud
2021-11-25 11:17:15,217 - DEBUG - lithops.storage.backends.ibm_cos.ibm_cos[Thread-1] - ibm_cos.py:60 - Using access_key and secret_key
2021-11-25 11:17:15,222 - INFO - lithops.storage.backends.ibm_cos.ibm_cos[Thread-1] - ibm_cos.py:97 - IBM COS client created - Region: eu-de
2021-11-25 11:17:15,222 - DEBUG - lithops.standalone.backends.ibm_vpc.ibm_vpc[Thread-1] - ibm_vpc.py:46 - Creating IBM VPC client
2021-11-25 11:17:15,222 - DEBUG - lithops.standalone.backends.ibm_vpc.ibm_vpc[Thread-1] - ibm_vpc.py:56 - Setting VPC endpoint to: https://eu-de.iaas.cloud.ibm.com
2021-11-25 11:17:15,223 - INFO - lithops.standalone.backends.ibm_vpc.ibm_vpc[Thread-1] - ibm_vpc.py:75 - IBM VPC client created - Region: eu-de
2021-11-25 11:17:15,224 - DEBUG - lithops.standalone.standalone[Thread-1] - standalone.py:60 - Standalone handler created successfully
2021-11-25 11:17:15,224 - DEBUG - lithops.invokers[Thread-1] - invokers.py:92 - ExecutorID aecfec-3 - Invoker initialized. Max workers: 1
2021-11-25 11:17:15,225 - DEBUG - lithops.standalone.backends.ibm_vpc.ibm_vpc[Thread-1] - ibm_vpc.py:250 - Initializing IBM VPC backend (consume mode)
2021-11-25 11:17:15,225 - DEBUG - lithops.executors[Thread-1] - executors.py:162 - Function executor for ibm_vpc created with ID: aecfec-3
2021-11-25 11:17:15,225 - DEBUG - lithops.config[Thread-1] - config.py:254 - Loading Storage backend module: ibm_cos
2021-11-25 11:17:15,225 - DEBUG - lithops.storage.backends.ibm_cos.ibm_cos[Thread-1] - ibm_cos.py:37 - Creating IBM COS client
2021-11-25 11:17:15,225 - DEBUG - lithops.storage.backends.ibm_cos.ibm_cos[Thread-1] - ibm_cos.py:57 - Set IBM COS Endpoint to https://s3.eu-de.cloud-object-storage.appdomain.cloud
2021-11-25 11:17:15,225 - DEBUG - lithops.storage.backends.ibm_cos.ibm_cos[Thread-1] - ibm_cos.py:60 - Using access_key and secret_key
2021-11-25 11:17:15,229 - INFO - lithops.storage.backends.ibm_cos.ibm_cos[Thread-1] - ibm_cos.py:97 - IBM COS client created - Region: eu-de
2021-11-25 11:17:15,230 - DEBUG - lithops.config[Thread-1] - config.py:254 - Loading Storage backend module: ibm_cos
2021-11-25 11:17:15,230 - DEBUG - lithops.storage.backends.ibm_cos.ibm_cos[Thread-1] - ibm_cos.py:37 - Creating IBM COS client
2021-11-25 11:17:15,230 - DEBUG - lithops.storage.backends.ibm_cos.ibm_cos[Thread-1] - ibm_cos.py:57 - Set IBM COS Endpoint to https://s3.eu-de.cloud-object-storage.appdomain.cloud
2021-11-25 11:17:15,230 - DEBUG - lithops.storage.backends.ibm_cos.ibm_cos[Thread-1] - ibm_cos.py:60 - Using access_key and secret_key
2021-11-25 11:17:15,233 - INFO - lithops.storage.backends.ibm_cos.ibm_cos[Thread-1] - ibm_cos.py:97 - IBM COS client created - Region: eu-de
2021-11-25 11:17:15,968 - INFO - engine[Thread-1] - job.py:67 - Storing job metadata
2021-11-25 11:17:16,583 - INFO - engine.lithops-wrapper[Thread-1] - executor.py:205 - executor.map(_load_ds, 1 items, 32768MB, attempt 1)
2021-11-25 11:17:16,584 - DEBUG - engine.lithops-wrapper[Thread-1] - executor.py:331 - Selected executor ibm_vpc
2021-11-25 11:17:16,590 - INFO - lithops.invokers[Thread-1-ex] - invokers.py:112 - ExecutorID aecfec-3 | JobID M000 - Selected Runtime: metaspace2020/metaspace-lithops:1.9.3 
2021-11-25 11:17:16,591 - DEBUG - lithops.storage.storage[Thread-1-ex] - storage.py:397 - Runtime metadata not found in local cache. Retrieving it from storage
2021-11-25 11:17:16,591 - DEBUG - lithops.storage.storage[Thread-1-ex] - storage.py:400 - Trying to download runtime metadata from: ibm_cos://metaspace-env-staging-temp/lithops.runtimes/2.5.7/ibm_vpc/02c7_ad2243d5-20c6-4d41-bc24-cf0e316dea32/metaspace2020-metaspace-lithops-1.9.3.meta.json
2021-11-25 11:17:16,809 - DEBUG - lithops.storage.storage[Thread-1-ex] - storage.py:419 - Runtime metadata not found in storage
2021-11-25 11:17:16,809 - INFO - lithops.invokers[Thread-1-ex] - invokers.py:120 - Runtime metaspace2020/metaspace-lithops:1.9.3 is not yet installed
2021-11-25 11:17:16,809 - DEBUG - lithops.standalone.standalone[Thread-1-ex] - standalone.py:337 - Checking if VM instance metaspace-staging-vpc-128 (161.156.169.205) is ready
2021-11-25 11:17:18,209 - DEBUG - lithops.standalone.backends.ibm_vpc.ibm_vpc[Thread-1-ex] - ibm_vpc.py:773 - Starting VM instance metaspace-staging-vpc-128
2021-11-25 11:17:19,440 - DEBUG - lithops.standalone.backends.ibm_vpc.ibm_vpc[Thread-1-ex] - ibm_vpc.py:783 - VM instance metaspace-staging-vpc-128 started successfully
2021-11-25 11:17:19,440 - DEBUG - lithops.standalone.backends.ibm_vpc.ibm_vpc[Thread-1-ex] - ibm_vpc.py:614 - Waiting VM instance metaspace-staging-vpc-128 (161.156.169.205) to become ready
JosepSampe commented 2 years ago

Does any other line appear after that last one?

Try removing all this block: https://github.com/lithops-cloud/lithops/blob/master/lithops/standalone/backends/ibm_vpc/ibm_vpc.py#L559-L577

sergii-mamedov commented 2 years ago

Yes, this is the last line from lithops. Next comes an exception from our code, as in the example above.

sergii-mamedov commented 2 years ago

Try removing all this block: https://github.com/lithops-cloud/lithops/blob/master/lithops/standalone/backends/ibm_vpc/ibm_vpc.py#L559-L577

In this case everything is fine:

2021-11-25 11:34:29,889 - INFO - engine[Thread-1] - job.py:67 - Storing job metadata
2021-11-25 11:34:30,574 - INFO - engine.lithops-wrapper[Thread-1] - executor.py:205 - executor.map(_load_ds, 1 items, 32768MB, attempt 1)
2021-11-25 11:34:30,574 - DEBUG - engine.lithops-wrapper[Thread-1] - executor.py:331 - Selected executor ibm_vpc
2021-11-25 11:34:30,581 - INFO - lithops.invokers[Thread-1-ex] - invokers.py:112 - ExecutorID aad558-1 | JobID M000 - Selected Runtime: metaspace2020/metaspace-lithops:1.9.3 
2021-11-25 11:34:30,581 - DEBUG - lithops.storage.storage[Thread-1-ex] - storage.py:397 - Runtime metadata not found in local cache. Retrieving it from storage
2021-11-25 11:34:30,581 - DEBUG - lithops.storage.storage[Thread-1-ex] - storage.py:400 - Trying to download runtime metadata from: ibm_cos://metaspace-env-staging-temp/lithops.runtimes/2.5.7/ibm_vpc/02c7_ad2243d5-20c6-4d41-bc24-cf0e316dea32/metaspace2020-metaspace-lithops-1.9.3.meta.json
2021-11-25 11:34:30,876 - DEBUG - lithops.storage.storage[Thread-1-ex] - storage.py:419 - Runtime metadata not found in storage
2021-11-25 11:34:30,876 - INFO - lithops.invokers[Thread-1-ex] - invokers.py:120 - Runtime metaspace2020/metaspace-lithops:1.9.3 is not yet installed
2021-11-25 11:34:30,876 - DEBUG - lithops.standalone.standalone[Thread-1-ex] - standalone.py:337 - Checking if VM instance metaspace-staging-vpc-128 (161.156.169.205) is ready
2021-11-25 11:34:31,163 - DEBUG - lithops.util.ssh_client[Thread-1-ex] - ssh_client.py:37 - 161.156.169.205 ssh client created
2021-11-25 11:34:31,632 - INFO - lithops.standalone.standalone[Thread-1-ex] - standalone.py:414 - Installing Lithops in VM instance metaspace-staging-vpc-128 (161.156.169.205)
2021-11-25 11:34:31,633 - DEBUG - lithops.utils[Thread-1-ex] - utils.py:227 - Creating function handler zip in /opt/dev/metaspace/metaspace/engine/lithops_standalone.zip
2021-11-25 11:34:31,682 - DEBUG - lithops.standalone.standalone[Thread-1-ex] - standalone.py:421 - Uploading lithops files to VM instance metaspace-staging-vpc-128 (161.156.169.205)
2021-11-25 11:34:32,075 - DEBUG - lithops.standalone.standalone[Thread-1-ex] - standalone.py:431 - Executing lithops installation process on VM instance metaspace-staging-vpc-128 (161.156.169.205)
2021-11-25 11:34:32,075 - DEBUG - lithops.standalone.standalone[Thread-1-ex] - standalone.py:432 - Be patient, initial installation process may take up to 3 minutes
2021-11-25 11:34:34,530 - DEBUG - lithops.standalone.standalone[Thread-1-ex] - standalone.py:101 - Validating lithops version installed on master matches 2.5.7
2021-11-25 11:34:34,643 - DEBUG - lithops.standalone.standalone[Thread-1-ex] - standalone.py:123 - Validating lithops lithops master service is running on VM instance metaspace-staging-vpc-128 (161.156.169.205)
2021-11-25 11:34:34,807 - INFO - lithops.standalone.standalone[Thread-1-ex] - standalone.py:140 - Waiting Lithops service to become ready on VM instance metaspace-staging-vpc-128 (161.156.169.205)
2021-11-25 11:34:34,981 - DEBUG - lithops.standalone.standalone[Thread-1-ex] - standalone.py:146 - VM instance metaspace-staging-vpc-128 (161.156.169.205) ready in 0.17 seconds
2021-11-25 11:34:34,981 - DEBUG - lithops.standalone.standalone[Thread-1-ex] - standalone.py:345 - Extracting runtime metadata information
2021-11-25 11:34:50,538 - DEBUG - lithops.storage.storage[Thread-1-ex] - storage.py:430 - Uploading runtime metadata to: ibm_cos://metaspace-env-staging-temp/lithops.runtimes/2.5.7/ibm_vpc/02c7_ad2243d5-20c6-4d41-bc24-cf0e316dea32/metaspace2020-metaspace-lithops-1.9.3.meta.json
2021-11-25 11:34:50,903 - DEBUG - lithops.storage.backends.ibm_cos.ibm_cos[Thread-1-ex] - ibm_cos.py:121 - PUT Object lithops.runtimes/2.5.7/ibm_vpc/02c7_ad2243d5-20c6-4d41-bc24-cf0e316dea32/metaspace2020-metaspace-lithops-1.9.3.meta.json - Size: 6.6KiB - OK
2021-11-25 11:34:50,904 - DEBUG - lithops.storage.storage[Thread-1-ex] - storage.py:436 - Storing runtime metadata into local cache: /home/ubuntu/.lithops/cache/lithops.runtimes/2.5.7/ibm_vpc/02c7_ad2243d5-20c6-4d41-bc24-cf0e316dea32/metaspace2020-metaspace-lithops-1.9.3.meta.json
gilv commented 2 years ago

@kpavel @JosepSampe do you know what is happening?

JosepSampe commented 2 years ago

@gilv This logic is wrong and raises an exception even if the ssh keys are correct. It was not added by me, so I'm not sure why it fails. For now I have disabled it in #860 until we find an alternative.

kpavel commented 2 years ago

@sergii-mamedov What operating system do you have? Can you please check if ssh-keygen is available there?

sergii-mamedov commented 2 years ago

@kpavel

# Ubuntu 16.04.6 
$ uname -a
Linux 4.4.0-1128-aws #142-Ubuntu SMP Fri Apr 16 12:42:33 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux LTS
$ which ssh-keygen 
/usr/bin/ssh-keygen

FYI: We are using non default ssh key: https://github.com/metaspace2020/metaspace/blob/master/metaspace/engine/conf/config.json.template#L86-L90

kpavel commented 2 years ago

@sergii-mamedov The code @JosepSampe removed is very important. It was added to handle the case where a user has a wrong ssh keypair configuration, by validating that the private/public keys are a real pair and, if not, raising an error with a self-explanatory message. If it has been failing in that section, I would really prefer to fix it, but I may need your help, as I can't reproduce it in my environment. If you run ssh-keygen -y -f /home/ubuntu/.ssh/ibm_cloud_vpc, do you get output identical to your public key?
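
For reference, the check under discussion boils down to deriving the public key from the private one and comparing it with the key registered in IBM VPC. A rough sketch (an illustrative helper, not the literal ibm_vpc.py code linked above):

import subprocess

def keys_are_pair(private_key_path, vpc_public_key):
    # Derive the public key from the private key and keep only the
    # base64 body (field 2 of "ssh-rsa AAAA... comment").
    derived = subprocess.getoutput(
        f"ssh-keygen -y -f {private_key_path} | cut -d' ' -f 2")
    # The key stored in IBM VPC has the same "type body [comment]"
    # layout, so compare the bodies and ignore the comment.
    return derived == vpc_public_key.split()[1]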

sergii-mamedov commented 2 years ago

If you run ssh-keygen -y -f /home/ubuntu/.ssh/ibm_cloud_vpc, do you get output identical to your public key?

Yep, except for the comment after the == at the end of the line (in your case).

kpavel commented 2 years ago

@sergii-mamedov Ok... I think I may have found why the original error wasn't raised. Can I ask you to modify the original release code, which had the failing validation section removed by Josep? Just add those 2 lines so the original exception is raised with its original error message:

https://github.com/lithops-cloud/lithops/pull/862/files

JosepSampe commented 2 years ago

Just as a suggestion: Lithops already shows this message when the private key is not valid:

2021-11-25 23:59:35,670 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2 -- SSH to 172.31.25.190 failed (publickey): Authentication failed.

I wonder: instead of relying on platform-specific logic to verify the ssh key, which has much more overhead, wouldn't doing this be enough, and compatible with all backends?
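
In other words, just attempt the connection and treat an authentication failure as "key is not valid". A sketch of that backend-agnostic approach (illustrative, assuming paramiko):

import paramiko

def ssh_key_works(host, user, key_filename):
    ssh = paramiko.SSHClient()
    ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    try:
        ssh.connect(host, username=user, key_filename=key_filename, timeout=10)
        ssh.close()
        return True
    except paramiko.AuthenticationException:
        # Wrong or unpaired key; this behaves the same on every backend
        return False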

sergii-mamedov commented 2 years ago

@kpavel

lithops.standalone.standalone.LithopsValidationError: Private ssh key /home/ubuntu/.ssh/ibm_cloud_vpc and public key first-id-rsa on master VM instance metaspace-staging-vpc-128 (161.156.169.205) are not a pair

We have a few public keys on the VPC.

UPD: I looked at the VPC overview on the website; we have the following order of ssh keys:

Provisioned SSH keys
* first-id-rsa
* second-id-rsa
* ibm-cloud-vpc-staging

I think this is the problem.

JosepSampe commented 2 years ago

@sergii-mamedov Is it possible that you have multiple private ssh keys in the /home/ubuntu/.ssh dir? If so, maybe this one (/home/ubuntu/.ssh/ibm_cloud_vpc) is not valid, and another one in the /home/ubuntu/.ssh dir is the valid one.

UPD: If you try to log in to the VM with:

ssh -i /home/ubuntu/.ssh/ibm_cloud_vpc -o IdentitiesOnly=yes -F /dev/null root@161.156.169.205

does it work?

sergii-mamedov commented 2 years ago
ssh -i /home/ubuntu/.ssh/ibm_cloud_vpc -o IdentitiesOnly=yes -F /dev/null root@161.156.169.205

Yes, it works.

On the VPC we have 3 public keys. On our machine, where lithops starts, we have two private keys: ibm_cloud_vpc and another one.

kpavel commented 2 years ago

@sergii-mamedov If you run ssh-keygen -y -f /home/ubuntu/.ssh/ibm_cloud_vpc | cut -d' ' -f 2, do you get your ssh public key as output?

sergii-mamedov commented 2 years ago

@kpavel yep

kpavel commented 2 years ago

@sergii-mamedov Can I ask you to modify one line of code so that the thrown exception also contains the contents of the compared public keys?

                raise LithopsValidationError(
                    f"Private ssh key {key_filename} and public key "
                    f"{key_name} on master {self} are not a pair because {public_res} not equal to {private_res}. "
                    f"the public is from {self.ibm_vpc_client.get_key(key_id).get_result()['public_key']}")

https://github.com/lithops-cloud/lithops/blob/master/lithops/standalone/backends/ibm_vpc/ibm_vpc.py#L572

sergii-mamedov commented 2 years ago

I have truncated the ssh keys in the output:

lithops.standalone.standalone.LithopsValidationError: Private ssh key /home/ubuntu/.ssh/ibm_cloud_vpc and public key first-id-rsa on master VM instance metaspace-staging-vpc-128 (161.156.169.205) are not a pair because AAAAB3Nza......Ni37PW8HrtIw== not equal to /bin/sh: 1: ssh-keygen: not found
/bin/sh: 1: cut: not found. the public is from ssh-rsa AAAAB3Nz...Ni37PW8HrtIw== email@site.com

It looks like lithops tries the private key against the first public key on the master VM and immediately gets an exception.

Something strange is going on with ssh-keygen, because:

$ which ssh-keygen 
/usr/bin/ssh-keygen
kpavel commented 2 years ago

@sergii-mamedov I don't think that VPC allows you to have more than a single public key per VM instance. But the issue I see is that both ssh-keygen and cut are missing from the /bin/sh shell. Can you run sh to get into the sh shell, and then try to run cut or ssh-keygen from that shell?

sergii-mamedov commented 2 years ago
ubuntu@ip-172-31-35-172:~$ which sh
/bin/sh
ubuntu@ip-172-31-35-172:~$ /bin/sh
$ ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/home/ubuntu/.ssh/id_rsa): ^C
$ cut
cut: you must specify a list of bytes, characters, or fields
Try 'cut --help' for more information.
$ 
sergii-mamedov commented 2 years ago

I tried on the machine where we launch lithops:

$ ipython
Python 3.8.6 | packaged by conda-forge | (default, Nov 27 2020, 19:31:52) 
Type 'copyright', 'credits' or 'license' for more information
IPython 7.19.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import subprocess

In [3]: subprocess.getoutput([f'ssh-keygen -y -f /home/ubuntu/.ssh/ibm_cloud_vpc | cut -d " " -f 2'])
Out[3]: 'AAAAB3NzaC1yc2EAAAADAQABAAACAQChj/kuF457SkwpU6ZE/cRInuS0LheGPctRt4Esa5PZ9PXHfdpVO0eYrtS3r9s875Bg6xAn94xqfBmyynCmAOjdjzFmz7p04kOhqMHAwn58812VLJyOnG7RB1on5GnahUqNhHs1RaE8EhXvfJXFqfrhc7a2QbftXxgfXz/Yk3Wrr1tw/d5vhC0RYCl8qKyjABJKgkmX82K/Fr9Uer1v/6XUnOTaAo2WN/ElQRRp916ej3fGCuFXYfKeUNrBl0C1bsQ1BAtkLRvbeA6MpYeUeksW/fdfOKY7RPJ0OGeqpaFYUWz+qWQvOk33Hqhu6L5TRVm1lSV9+2iZAhM3eEDLUvbK+yfNCMhJ2o9K/0gX3yVwnmP4rmXjN15YudZCego/ZuBcwznaXRTSbSFKzD/fah65+YETxxNn99mf2m1USX4frsGO2hWrW4OtLlh5DlXjWnBikB0Stwe2bRjlEVyYyRyzBlOrmu3UWyNsWOjDs5mMyOgmS++Q/9LAdXO/XBuJRZ5K/5MIFaBgn4gmdTmV1GLL4BSfX9HC/fByQiwx0RIU5CD3igmaYcJwm88g0wYmghqTLUJbDOdARvS2EhTgKE3tnHCYpSFXZ3iXBQGgsuI1EU21bSp0BwkZG8E+VILpKzp0igC4996+Zbln9rykUjVmCt7oa/zl1I6tDIR3/Zv73Q=='
kpavel commented 2 years ago

@sergii-mamedov I have another idea. I suspect that something could be overriding the PATH at some point. Can you please try specifying full paths to the binaries?

Please replace the line below with: private_res = subprocess.getoutput([f"/usr/bin/ssh-keygen -y -f {key_filename} | /usr/bin/cut -d' ' -f 2"])

https://github.com/lithops-cloud/lithops/blob/master/lithops/standalone/backends/ibm_vpc/ibm_vpc.py#L569
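
A slightly more defensive variant of that line would resolve the binaries explicitly instead of trusting whatever PATH /bin/sh ends up with (a sketch; key_filename is the variable from the surrounding code):

import shutil
import subprocess

# Resolve absolute paths up front; fall back to the usual locations.
ssh_keygen = shutil.which('ssh-keygen') or '/usr/bin/ssh-keygen'
cut = shutil.which('cut') or '/usr/bin/cut'
private_res = subprocess.getoutput(
    f"{ssh_keygen} -y -f {key_filename} | {cut} -d' ' -f 2")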

kpavel commented 2 years ago

@sergii-mamedov The issue should be resolved by PR #865

sergii-mamedov commented 2 years ago

@kpavel I tried the master branch. I see the same problem. I tried deleting all public keys from ~/.ssh/authorized_keys except the one lithops uses - same error. Apparently, lithops takes the public keys not from this file but directly from the VM metadata.

sergii-mamedov commented 2 years ago

I will try to recreate the VM with one key, then add the other two to the ~/.ssh/authorized_keys file and test this hypothesis. Would that help you?

sergii-mamedov commented 2 years ago

I will try to recreate the VM with one key, then add the other two to the ~/.ssh/authorized_keys file and test this hypothesis.

In this case, everything works without problems. Apparently the problem is on IBM's side. @kpavel

kpavel commented 2 years ago

@sergii-mamedov Thanks for the update. Working on a fix.

kpavel commented 2 years ago

@sergii-mamedov Added support for multiple public keys validation in #867
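
Conceptually, the fix makes the validation pass if the private key matches any of the provisioned public keys, and only raises otherwise. A sketch of the idea (not the literal code from #867):

def validate_against_all_keys(derived_public_body, provisioned_public_keys):
    # derived_public_body: output of "ssh-keygen -y -f <key> | cut -d' ' -f 2"
    for pub in provisioned_public_keys:
        if derived_public_body == pub.split()[1]:
            return  # found the matching pair: validation passes
    # the real code would raise LithopsValidationError here
    raise Exception('Private ssh key does not match any public key '
                    'provisioned on the instance')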

kpavel commented 2 years ago

@sergii-mamedov Should be fixed in master now

sergii-mamedov commented 2 years ago

@kpavel It is stuck here:

2021-11-30 13:16:20,646 - DEBUG - lithops.standalone.backends.ibm_vpc.ibm_vpc[Thread-1-ex] - ibm_vpc.py:619 - SSH to None failed (publickey): 'key_id'
2021-11-30 13:16:25,907 - DEBUG - lithops.standalone.backends.ibm_vpc.ibm_vpc[Thread-1-ex] - ibm_vpc.py:619 - SSH to None failed (publickey): 'key_id'
2021-11-30 13:16:31,076 - DEBUG - lithops.standalone.backends.ibm_vpc.ibm_vpc[Thread-1-ex] - ibm_vpc.py:619 - SSH to None failed (publickey): 'key_id'
sergii-mamedov commented 2 years ago

I think I will recreate the VM with one key, since we have already lost a lot of time on debugging.

JosepSampe commented 2 years ago

@kpavel Consume mode does not have key_id in the config, since the VM is created manually through the dashboard
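
That would explain the 'key_id' in the log above, which looks like a KeyError rendered as a string: the validation looks the key up unconditionally. A defensive sketch (config and vpc_client are illustrative names; the get_key call mirrors the snippet earlier in this thread):

key_id = config.get('key_id')  # None instead of KeyError in consume mode
if key_id is not None:
    public_key = vpc_client.get_key(key_id).get_result()['public_key']
    # ... compare against the derived public key as before ...
# in consume mode there is no registered key id, so skip this lookup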

kpavel commented 2 years ago

@sergii-mamedov Getting that message means it passed the ssh keypair validation. @JosepSampe Will fix it asap.

kpavel commented 2 years ago

Submitted a new fix: https://github.com/lithops-cloud/lithops/pull/868

@sergii-mamedov Revalidated with my instance running in consume mode with multiple ssh keys. It works. @JosepSampe Fixed the non-mandatory ssh key id during validation; the validation actually doesn't need to know the key id.

sergii-mamedov commented 2 years ago

@kpavel This time everything works without problems. Thanks.

kpavel commented 2 years ago

@kpavel This time everything works without problems. Thanks.

Excellent. Sorry it took that long.