Open mturilli opened 7 years ago
Added merzky1 with the admin role to the radical-benchmark project in OpenShift. This should give Andre the right to access the project and to use the oc command to create a port forward. It should also let Andre deploy new containers within that project.
This does not work because at the moment we cannot access the container/mongodb server from Titan's headnode and compute nodes. I wrote to Jason (the person in charge of OpenShift at ORNL) asking for help.
Installed MongoDB on a DTN node by downloading the generic Linux binaries from https://fastdl.mongodb.org/linux/mongodb-linux-x86_64-2.6.12.tgz
Unfortunately, port 27017 on the DTNs is also filtered, from both Titan's headnode and its compute nodes.
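For reference, a plain TCP connect with a timeout is a quick way to tell a filtered port from a closed one when run from the headnode or a compute node. This is a minimal sketch using only the Python standard library; the function name `probe` is hypothetical, and reading a timeout as "filtered" is a heuristic (a firewall that silently drops packets usually shows up this way, but so can other network problems).

```python
import socket


def probe(host, port, timeout=5.0):
    """Classify a TCP port as 'open', 'closed', 'filtered', or 'error: ...'."""
    try:
        # A completed handshake means something is listening and reachable.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host answered with a reset: reachable, but nothing is listening.
        return "closed"
    except socket.timeout:
        # No answer at all: typically packets are being dropped on the way.
        return "filtered"
    except OSError as exc:
        # DNS failures, unreachable networks, and similar.
        return "error: %s" % exc
```

Calling `probe(<dtn_host>, 27017)` from Titan, with the DTN's hostname filled in, would be expected to return `"filtered"` if the port is blocked as described.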
RADICAL_PILOT_DBURL='mongodb://lgn:pswd@mongodb-radical-benchmark.apps.ccs.ornl.gov:80/htcbenchmark'
$ ./00_getting_started.py ornl.titan_aprun
new session: [rp.session.titan-ext7.mturilli1.017428.0000] \
database : [mongodb://lgn:pswd@mongodb-radical-benchmark.apps.ccs.ornl.gov:80/htcbenchmark]
err
Traceback (most recent call last):
File "./00_getting_started.py", line 36, in
RuntimeError: Couldn't create new session (database URL 'mongodb://lgn:pswd@mongodb-radical-benchmark.apps.ccs.ornl.gov:80/htcbenchmark' incorrect?): ids don't match -1462843969 808465440
Googled the exception and found that, usually, it is thrown when a process reads the answer to a request made by another process. Any idea?
I had a chat with @itomaldonado about how this could be caused by header rewriting and sent Jason an email explaining the issue, asking for guidance.
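A small check that may support the idea that something HTTP-aware sits between the client and MongoDB on port 80. It assumes that the second number in the error message is the `responseTo` field the client read from the reply header, and that MongoDB wire-protocol headers are four little-endian int32 values (messageLength, requestID, responseTo, opCode). The `HTTP/1.1 400 Bad Request` status line is only an assumed example of what a router might answer; it does not appear in the logs above.

```python
import struct

# Second number from "ids don't match -1462843969 808465440".
response_to = 808465440

# Re-encode it as the 4 little-endian bytes the client would have read.
raw = struct.pack("<i", response_to)
print(raw)  # b' 400'

# Read the first 16 bytes of an (assumed) HTTP error reply as if they
# were a MongoDB message header.
fake_header = b"HTTP/1.1 400 Bad Request\r\n"[:16]
length, request_id, resp_to, op_code = struct.unpack("<iiii", fake_header)
print(resp_to == response_to)  # True
```

Bytes 8 to 11 of such a reply are `" 400"`, which decode to exactly the reported value, so the client may simply have received an HTTP error page instead of a MongoDB reply.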
Alas, I don't think I have seen that specific error before. Let's see what Jason answers. Thanks for involving him.
Jason changed our service over to Type: NodePort and moved the actual port assignment to a port that is allowed through the firewall from Titan to the OpenShift Dev cluster.
The MongoDB endpoint is mongodb://lgn:pswd@openshift.ccs.ornl.gov:30008/htcbenchmark
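Before handing the new endpoint to RP, it can help to confirm that the URL carries the intended host, port, and database name. The sketch below only parses the string with the standard library and exports it through `RADICAL_PILOT_DBURL`, the variable used earlier in this thread; `lgn:pswd` are the placeholder credentials from the thread, not real ones.

```python
import os
from urllib.parse import urlsplit

dburl = "mongodb://lgn:pswd@openshift.ccs.ornl.gov:30008/htcbenchmark"

# Split the URL into its components without contacting the server.
parts = urlsplit(dburl)
print(parts.hostname, parts.port, parts.path.lstrip("/"))
# openshift.ccs.ornl.gov 30008 htcbenchmark

# Make the URL visible to this process and to anything it launches.
os.environ["RADICAL_PILOT_DBURL"] = dburl
```

Setting `os.environ` affects only the current Python process and its children; for an interactive shell the variable still has to be exported there.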
Testing.
Test successful:
$ ./00_getting_started.py ornl.titan_aprun
================================================================================
Getting Started (RP version 0.47)
================================================================================
new session: [rp.session.titan-ext6.mturilli1.017428.0003] \
database : [mongodb://lgn:pswd@mongodb-radical-benchmark.apps.ccs.ornl.gov:30008/htcbenchmark]
ok
read config ok
--------------------------------------------------------------------------------
submit pilots
create pilot manager ok
create pilot description [ornl.titan_aprun:64] ok
submit 1 pilot(s)
. ok
--------------------------------------------------------------------------------
submit units
create unit manager ok
add 1 pilot(s) ok
create 5 unit description(s)
..... ok
submit 5 unit(s)
..... ok
--------------------------------------------------------------------------------
gather results
wait for 5 unit(s)
+++++ ok
--------------------------------------------------------------------------------
finalize
closing session rp.session.titan-ext6.mturilli1.017428.0003 \
close unit manager ok
close pilot manager \
wait for 1 pilot(s)
timeout
ok
+ rp.session.titan-ext6.mturilli1.017428.0003 (json)
+ pilot.0000 (profiles)
+ pilot.0000 (logfiles)
session lifetime: 135.5s ok
--------------------------------------------------------------------------------
Based on Andre's report, we seem to be hitting a bottleneck with the mongodb deployed at ORNL. I tried to:
I wrote Jason asking whether we can:
Deployed a test image (ephemeral) via OpenShift. I did the following:
ssh -D8080 <user_name>@dtn.ccs.ornl.gov
127.0.0.1:8080
https://openshift.ccs.ornl.gov:8443/
with my ORNL credentials/passcode
Endpoint as reported by the OpenShift dashboard and by oc (the OpenShift CLI), executed from the DTN I had sshed into: