marvel-nccr / quantum-mobile

A Virtual Machine for computational materials science
https://quantum-mobile.readthedocs.io

Ansible role hangs on TASK [marvel-nccr.aiidalab : install server-side (aiida) dependencies] #193

Open hpcchris opened 3 years ago

hpcchris commented 3 years ago

Sorry if this is the wrong place to put this. I'm trying to run the Ansible playbooks on an Ubuntu 18.04 host. This role hangs: I see files constantly being added to /tmp, but the task never moves forward.

...
...
drwx------  2 max users 4096 Sep 23 17:33 pip-unpack-ltqxzc_p
drwx------  2 max users 4096 Sep 23 18:21 pip-unpack-k_vi86kr
drwx------  2 max users 4096 Sep 23 19:10 pip-unpack-83n9xxwe
drwx------  2 max users 4096 Sep 23 19:58 pip-unpack-63wa2c93
drwx------  2 max users 4096 Sep 23 19:58 pip-req-tracker-245h032u
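
To tell a slow install from a truly stuck one, it can help to watch whether the number of pip temporary directories keeps growing. A minimal sketch, assuming GNU or BSD `find`; the helper name `count_pip_tmp_dirs` is hypothetical and not part of the role or of pip:

```shell
# Hypothetical helper: count pip's temporary directories directly under a
# given directory (defaults to /tmp). Only names starting with "pip-" count.
count_pip_tmp_dirs() {
    dir="${1:-/tmp}"
    find "$dir" -mindepth 1 -maxdepth 1 -type d -name 'pip-*' | wc -l | tr -d ' '
}

# Example usage: print the count once a minute and see whether it grows.
# while true; do date; count_pip_tmp_dirs /tmp; sleep 60; done
```

A count that rises at long, regular intervals, as in the listing above, suggests pip is still downloading and unpacking packages rather than being deadlocked.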

When I run strace on the long-running pip process I see:

stat("/tmp/pip-modern-metadata-f4n00__h/pgsu.dist-info/METADATA", {st_mode=S_IFREG|0644, st_size=4270, ...}) = 0
openat(AT_FDCWD, "/tmp/pip-modern-metadata-f4n00__h/pgsu.dist-info/METADATA", O_RDONLY|O_CLOEXEC) = 7
fstat(7, {st_mode=S_IFREG|0644, st_size=4270, ...}) = 0
ioctl(7, TCGETS, 0x7ffe757cdc20)        = -1 ENOTTY (Inappropriate ioctl for device)
lseek(7, 0, SEEK_CUR)                   = 0
lseek(7, 0, SEEK_CUR)                   = 0
fstat(7, {st_mode=S_IFREG|0644, st_size=4270, ...}) = 0
read(7, "Metadata-Version: 2.1\nName: pgsu"..., 4271) = 4270
read(7, "", 1)                          = 0
close(7)

chrisjsewell commented 3 years ago

Heya, yeh no worries thanks; do you know at what step it hangs, i.e. do you have the ansible output?

ltalirz commented 3 years ago

Hi @hpcchris - thanks for the note; I suspect this may have to do with the update of pip, whose latest versions have a new "backtracking" mechanism when it discovers dependency conflicts.
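
As a quick check of whether a given environment is affected: the backtracking resolver became pip's default in version 20.3. A minimal sketch of comparing a version string against that threshold, assuming a `sort` that supports `-V` (GNU coreutils, as on Ubuntu); the helper name is hypothetical:

```shell
# Hypothetical helper: succeeds if the given pip version is >= 20.3,
# the release in which the backtracking resolver became the default.
pip_uses_backtracking() {
    version="$1"
    # sort -V orders version strings; if 20.3 sorts first (or is equal),
    # the given version is at least 20.3.
    lowest="$(printf '%s\n%s\n' "20.3" "$version" | sort -V | head -n 1)"
    [ "$lowest" = "20.3" ]
}

# Example usage (requires pip on PATH):
# pip_uses_backtracking "$(pip --version | awk '{print $2}')" && echo "backtracking resolver"
```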

We recently fixed a dependency issue in the aiidalab-widgets-base app https://github.com/aiidalab/aiidalab-widgets-base/issues/228

Would you (or the other chris ;-) ) mind opening a PR against the https://github.com/marvel-nccr/ansible-role-aiidalab repo where you update the versions of the apps and the aiidalab version to the latest one?

This should also build the role on CI

hpcchris commented 3 years ago

Hi - thanks, all I see is this last line in ansible.log

2021-09-23 07:53:31,874 p=8134 u=christay n=ansible | ok: [carina -> localhost]
2021-09-23 07:53:31,907 p=8134 u=christay n=ansible | TASK [marvel-nccr.aiidalab : install server-side (aiida) dependencies] *****************************
2021-09-23 07:53:31,909 p=8134 u=christay n=ansible | Thursday 23 September 2021  07:53:31 +0000 (0:00:00.519)       2:34:53.071 ****

Can you let me know a little more about how I can clone the ansible-role repo you mentioned? I cloned this repo to do the build on my VM:

git clone https://github.com/marvel-nccr/quantum-mobile.git

ltalirz commented 3 years ago

@hpcchris The easiest is to replace the folder of the role (that is installed from ansible galaxy) with the cloned git repository like so:

cd roles
rm -rf marvel-nccr.aiidalab
git clone https://github.com/marvel-nccr/ansible-role-aiidalab marvel-nccr.aiidalab

After this you can edit the role, commit your changes, and then make a pull request. Thanks for your help!
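
Before editing, it may be worth confirming that the role folder really is the git clone and not the archive extracted by ansible galaxy. A minimal sketch, assuming it is run from the quantum-mobile checkout; the helper name is hypothetical:

```shell
# Hypothetical helper: succeeds if the given role folder has its own .git
# directory, i.e. it was cloned rather than extracted from an archive.
role_is_git_clone() {
    [ -d "$1/.git" ]
}

# Example usage:
# role_is_git_clone roles/marvel-nccr.aiidalab && echo "git clone" || echo "not a clone"
```

The check looks for a `.git` directory inside the role folder itself because `roles/` sits inside the quantum-mobile repository, so asking git whether the path is inside a work tree would succeed either way.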

hpcchris commented 3 years ago

Sorry about all the dumb problems. I deleted and cloned new versions of roles/marvel-nccr.aiidalab and roles/marvel-nccr.aiida, but I get this when I run tox again. I tried rm -rf .tox. What's the best way to get around this?


- downloading role from https://github.com/marvel-nccr/ansible-role-aiida/archive/v4.2.0.tar.gz
- extracting marvel-nccr.aiida to /home1/christay/quantum-mobile/roles/marvel-nccr.aiida
[WARNING]: - marvel-nccr.aiida was NOT installed successfully: the specified role marvel-nccr.aiida appears to already
exist. Use --force to replace it.
ERROR! - you can use --ignore-errors to skip failed roles and finish processing the list.
ERROR: InvocationError for command /home1/christay/quantum-mobile/.tox/ansible/bin/ansible-galaxy install -r requirements.yml (exited with code 1)
_______________________________________________________ summary ________________________________________________________
ERROR:   ansible: commands failed

ltalirz commented 3 years ago

hey @hpcchris , sorry, my instructions were a bit sparse. You were rerunning the whole process for creating the image (which includes downloading the roles from ansible galaxy). This will complain since it finds the git repository in place of the extracted archive from ansible galaxy.

The QM docs explain how to run a specific step of the playbook, e.g.

tox -e ansible -- --tags aiidalab

However, after having a look at the role, it seems we can simplify it since the app's python dependencies are now specified in the app itself. I've opened https://github.com/marvel-nccr/ansible-role-aiidalab/pull/25 . I haven't tried it locally so far; let's see whether CI passes - @chrisjsewell comments on the PR welcome!

chrisjsewell commented 3 years ago

> What's the best way to get around this?

Off-hand, I guess the easiest solution is just to delete the generated roles/marvel-nccr.aiida folder
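
A minimal sketch of that cleanup, assuming it is run from the quantum-mobile checkout and that any edits in a cloned role have already been committed and pushed, since the folder is deleted outright. The helper name is hypothetical:

```shell
# Hypothetical helper: remove one role folder so that ansible-galaxy can
# install it again. Refuses to run with empty arguments so that it can
# never expand to "rm -rf /".
remove_generated_role() {
    roles_dir="$1"
    role_name="$2"
    if [ -z "$roles_dir" ] || [ -z "$role_name" ]; then
        echo "usage: remove_generated_role ROLES_DIR ROLE_NAME" >&2
        return 2
    fi
    rm -rf -- "${roles_dir:?}/${role_name:?}"
}

# Example usage, followed by rerunning tox as before:
# remove_generated_role roles marvel-nccr.aiida
```

The warning in the output above also mentions `--force` for ansible-galaxy, which replaces existing roles; that would overwrite a cloned role as well, so deleting only the one folder is the narrower option.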

> I haven't tried it locally so far; let's see whether CI passes - @chrisjsewell comments on the PR welcome!

will try to have a quick look soon

hpcchris commented 3 years ago

Hi- just wanted to check in, when’s a good time for me to try another git clone and install? Thanks, Chris

ltalirz commented 3 years ago

hey @hpcchris - sorry for the delay, I'm very busy this week. My PR above for the aiidalab role pointed to a dependency issue in the aiida role: https://github.com/marvel-nccr/ansible-role-aiida/pull/73

I think this may need manual debugging, but currently I'm not able to build Quantum Mobile because of some permission issues with VirtualBox on macOS. I've pinged chris in https://github.com/marvel-nccr/ansible-role-aiida/pull/73#issuecomment-929694961

hpcchris commented 3 years ago

Thank you. Let me know if there’s anything I can do to help. Chris