microbiomedata / issues

public repo for issues related to NMDC work

test JAWS nmdc perlmutter site #336

Closed aclum closed 1 week ago

aclum commented 1 year ago

The JAWS team has set up a site on Perlmutter for NMDC. Test that this works on an example workflow.

aclum commented 1 year ago

Messaged Elais on Slack, as the site still appears to be trying to use Cori cscratch, which no longer exists.

ssarrafan commented 1 year ago

Alicia is in the midst of JAWS testing. Will move to Zoe sprint.

aclum commented 1 year ago

The JAWS team is working on some Perlmutter issues that affected my test submission. Dani will bring this up at the JAWS Monday meeting.

jaws log 65101

STATUS_FROM        STATUS_TO          TIMESTAMP            COMMENT
created            upload queued      2023-06-28 10:28:02
upload queued      upload complete    2023-06-28 10:28:02
upload complete    submission failed  2023-06-28 10:28:14  RPC function error: Instance <Run at 0x150a915acbb0> is not bound to a Session; attribute refresh operation cannot proceed (Background on this error at: https://sqlalche.me/e/20/bhk3)
submission failed  email sent         2023-06-28 10:28:24
email sent         done               2023-06-28 10:28:35

aclum commented 1 year ago

This is blocked on a new JAWS release, which is expected next week, so it needs to be moved to the next sprint.

ssarrafan commented 1 year ago

The JAWS release is still pending; they are reviewing some errors. Will move this to the next sprint. @aclum let me know if you prefer for this to go to the backlog until after ESA.

aclum commented 1 year ago

Yes, this should be in the backlog until after ESA.

aclum commented 12 months ago

Still no ETA on a release date from the JAWS team.

aclum commented 11 months ago

The new version of JAWS was released last week. I ran a test this morning and it failed. I was unable to debug it because of permission issues, so I filed a ticket for that: https://code.jgi.doe.gov/advanced-analysis/jaws-support/-/issues/111

aclum commented 11 months ago

The Perlmutter JAWS site is working with cds_prediction.wdl. Current blockers: 1) a problem that occurs when the workflow prefix matches the site name; 2) an issue with runtime_minutes.

aclum commented 10 months ago

Backlogging this issue until the blockers are resolved.

aclum commented 10 months ago

The problem that occurs when the workflow prefix matches the site name is fixed, but the fix is not released yet.

aclum commented 8 months ago

The fix for the prefix issue was released by the JAWS team. I ran a test yesterday and there was an issue with the /refdata mount. See https://code.jgi.doe.gov/advanced-analysis/jaws-site/-/issues/1756. This is fixed, but I don't believe it has been deployed yet.

aclum commented 7 months ago

A filtering test run on Dec 18, 2023 completed successfully, so the volume mounting appears to be working. Perlmutter is down today.

aclum commented 3 months ago

Closing as I believe all issues have been resolved. Will open a new ticket if new issues come up.