tilde-lab / yascheduler

Yet another cloud computing scheduler for high-throughput scientific simulations in the cloud
https://mpds.io/search/ab%20initio%20calculations
MIT License

Keys are not automatically loaded and mismanaged between providers #101

Status: Open. blokhin opened this issue 2 years ago.

blokhin commented 2 years ago

A plain SSH connection to the node succeeds:

root@aiida9:~# ssh X.X.X.X
Linux labs 4.19.0-18-amd64 #1 SMP Debian 4.19.208-1 (2021-09-29) x86_64

...
root@labs:~# logout
Connection to X.X.X.X closed.

but the scheduler's connection to the same node fails:

root@aiida9:~# yasetnode X.X.X.X~4
Traceback (most recent call last):
  File "/usr/local/bin/yasetnode", line 8, in <module>
    sys.exit(manage_node())
  File "/usr/local/lib/python3.9/dist-packages/yascheduler/utils.py", line 430, in manage_node
    asyncio.run(_manage_node())
  File "/usr/lib/python3.9/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/usr/lib/python3.9/asyncio/base_events.py", line 642, in run_until_complete
    return future.result()
  File "/usr/local/lib/python3.9/dist-packages/yascheduler/utils.py", line 411, in _manage_node
    machine = await RemoteMachine.create(
  File "/usr/local/lib/python3.9/dist-packages/backoff/_async.py", line 151, in retry
    ret = await target(*args, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/yascheduler/remote_machine/remote_machine.py", line 192, in create
    conn = await asyncssh.connection.connect(
  File "/usr/local/lib/python3.9/dist-packages/asyncssh/connection.py", line 7834, in connect
    return await asyncio.wait_for(
  File "/usr/lib/python3.9/asyncio/tasks.py", line 442, in wait_for
    return await fut
  File "/usr/local/lib/python3.9/dist-packages/asyncssh/connection.py", line 447, in _connect
    await options.waiter
asyncssh.misc.PermissionDenied: Permission denied
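One way to narrow this down is a diagnostic sketch that is not part of yascheduler. The interactive ssh above can fall back to ~/.ssh/id_* files and ssh-agent identities, while the scheduler presumably authenticates only with the keys it stores itself. The script below asks OpenSSH to try each file in a keys directory on its own. The keys directory location, the `root` user and the helper names are assumptions; adjust them to your setup.

```python
"""Check which private keys in a directory a host accepts (diagnostic sketch).

Assumptions (hypothetical, not yascheduler's API): the scheduler's keys are
individual files in one directory, and the OpenSSH client is installed.
"""
import subprocess
from pathlib import Path


def candidate_keys(keys_dir: Path) -> list[Path]:
    """Return private-key candidates: regular files not ending in .pub."""
    return sorted(
        p for p in keys_dir.iterdir() if p.is_file() and p.suffix != ".pub"
    )


def ssh_probe_cmd(host: str, key: Path, user: str = "root") -> list[str]:
    """Build an ssh command line that offers only the given key file."""
    return [
        "ssh",
        "-i", str(key),
        "-o", "IdentitiesOnly=yes",  # do not offer keys held by ssh-agent
        "-o", "BatchMode=yes",       # never prompt for passwords; fail instead
        "-o", "ConnectTimeout=10",
        f"{user}@{host}",
        "true",                      # run a no-op command on the remote side
    ]


def probe(host: str, keys_dir: Path, user: str = "root") -> dict[str, bool]:
    """Map each key file name to whether the host accepted it."""
    results = {}
    for key in candidate_keys(keys_dir):
        proc = subprocess.run(ssh_probe_cmd(host, key, user), capture_output=True)
        results[key.name] = proc.returncode == 0
    return results
```

For example, `probe("X.X.X.X", Path("/data/keys"))` (the path is hypothetical) returns a dict mapping each key file name to True or False. If every key is rejected while the plain `ssh X.X.X.X` works, the scheduler's own keys are most likely not authorized on the node.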
blokhin commented 2 years ago

Another case, very frustrating:

blokhin commented 2 years ago

@knopki, could you please have a look?

blokhin commented 7 months ago

Workaround (FIXME?): delete the keys from data_dir and retry.
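The workaround can be scripted. This is a minimal sketch under stated assumptions: the keys sit as regular files in one directory under data_dir, and the scheduler recreates or reloads whatever keys it needs on the next run. Instead of deleting outright, it moves the files into a timestamped sibling directory, so they can be restored if that assumption does not hold. The function name `set_keys_aside` is hypothetical.

```python
import shutil
import time
from pathlib import Path


def set_keys_aside(keys_dir: Path) -> Path:
    """Move every regular file in keys_dir into a timestamped backup dir.

    Hypothetical helper: keys_dir itself is kept (now empty of files) so the
    scheduler can repopulate it; the backup dir sits next to it.
    """
    stamp = time.strftime("%Y%m%d-%H%M%S")
    backup = keys_dir.with_name(f"{keys_dir.name}.bak-{stamp}")
    backup.mkdir()  # raises FileExistsError if run twice within one second
    for item in keys_dir.iterdir():
        if item.is_file():
            shutil.move(str(item), str(backup / item.name))
    return backup
```

After calling it, rerun the failing command (e.g. `yasetnode X.X.X.X~4`). If the connection still fails, move the files back from the returned backup directory.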

blokhin commented 5 months ago

See also #127.