radical-collaboration / MSKCC


Compiling OpenMM on Titan with CUDA 7.5 #3

Closed: jchodera closed this issue 7 years ago

jchodera commented 7 years ago

@jdakka : I'd give OpenMM 6.3.1 a try first, since I believe that is built against CUDA 7.5. After you've installed Miniconda Python, you can:

conda config --add channels omnia --add channels conda-forge
conda install --yes openmm==6.3.1

If you run our benchmark script, you should see it mention that it is using the CUDA platform. If it says OpenCL or CPU, it's not able to use your CUDA libraries for some reason. You can check which platforms are available with

>>> from simtk import openmm
>>> print([openmm.Platform.getPlatform(index).getName() for index in range(openmm.Platform.getNumPlatforms())])
['Reference', 'CPU', 'OpenCL']

(my Mac doesn't have CUDA available)

If you need to build OpenMM from source for CUDA 7.5, you should be able to follow the instructions here on compiling OpenMM from source using the CUDA 7.5 installed on Titan. Be sure to pay attention to the dependencies.
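For orientation, a minimal sketch of what that source build could look like on Titan (hedged: the repository URL is OpenMM's current home, the module name matches the cudatoolkit module on Titan, and CUDATOOLKIT_HOME is the path variable Cray modules typically set; adjust all of these locally):

module load cudatoolkit/7.5.18-1.0502.10743.2.1   # Titan's CUDA 7.5 module
git clone https://github.com/openmm/openmm.git
cd openmm && mkdir build && cd build
cmake .. -DCMAKE_INSTALL_PREFIX=$HOME/openmm \
         -DCUDA_TOOLKIT_ROOT_DIR=$CUDATOOLKIT_HOME
make -j8 install       # builds and installs the C++ libraries and platform plugins
make PythonInstall     # builds and installs the Python wrappers into the active Python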

It's best to install Miniconda Python first anyway, since we can use it to easily install other dependencies for our scripts if needed. (We've tried to minimize dependencies in this initial benchmark script, but future elaborations will require more conda-installable dependencies.)
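For completeness, a sketch of the Miniconda bootstrap itself (the installer URL is Continuum's standard download location; choose the Python 2.7 installer to match these scripts):

wget https://repo.continuum.io/miniconda/Miniconda2-latest-Linux-x86_64.sh
bash Miniconda2-latest-Linux-x86_64.sh -b -p $HOME/miniconda
export PATH=$HOME/miniconda/bin:$PATH    # put conda on the PATH for this shell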

I've tried to save you some pain by building a conda-installable OpenMM package against CUDA 7.5, but no luck so far.

jdakka commented 7 years ago

I installed OpenMM on Titan with Python 2.7; it was able to recognize CUDA:

from simtk import openmm
print([openmm.Platform.getPlatform(index).getName() for index in range(openmm.Platform.getNumPlatforms())])
['Reference', 'CUDA', 'OpenCL']

However, when I run the benchmark.py file I get the following error:

Deserializing simulation...
Traceback (most recent call last):
  File "benchmark.py", line 38, in <module>
    [context, integrator, system, state] = deserialize_simulation('serialized/abl-imatinib')
  File "benchmark.py", line 18, in deserialize_simulation
    system = xmls.deserialize(sysxml)
  File "/ccs/home/jdakka/miniconda3/envs/snakes/lib/python2.7/site-packages/simtk/openmm/openmm.py", line 16952, in deserialize
    return XmlSerializer.deserializeSystem(inputString)
  File "/ccs/home/jdakka/miniconda3/envs/snakes/lib/python2.7/site-packages/simtk/openmm/openmm.py", line 16821, in deserializeSystem
    return _openmm.XmlSerializer_deserializeSystem(inputString)
Exception: Unsupported version number

jchodera commented 7 years ago

It sounds like you installed OpenMM 6.3.1 via conda successfully, then!

Let me see if I can update the serialized XML files for OpenMM 6.3.1. I forgot that the serialized format isn't compatible across versions. Will do this now.
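For reference, the update amounts to rebuilding the objects under 6.3.1 and re-serializing them; a minimal sketch (the output filename is hypothetical, standing in for whatever the benchmark's deserializer expects under serialized/abl-imatinib):

from simtk import openmm
# after recreating `system` with OpenMM 6.3.1 loaded:
with open('serialized/abl-imatinib/system.xml', 'w') as outfile:  # hypothetical filename
    outfile.write(openmm.XmlSerializer.serialize(system))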

Also, I may be able to get the latest OpenMM conda package built for CUDA 7.5; more on that soon.

jchodera commented 7 years ago

@jdakka : I've updated the serialized XML files in the PR (https://github.com/radical-collaboration/MSKCC/pull/2) to work with OpenMM 6.3.1. Give those a try.

jdakka commented 7 years ago

Looks like it is running the benchmark OK interactively. I submitted a PBS script using 8 nodes for 46648 atoms.

jchodera commented 7 years ago

Looks like it is running the benchmark OK interactively.

Great!

I submitted a PBS script using 8 nodes for 46648 atoms.

While OpenMM does support splitting a single system across multiple GPUs, it does so very inefficiently. Our use case is much closer to running N independent (or weakly-coupled) simulations on N GPUs, so if your test simply ran the same benchmark on each GPU, that should be close to what we want for estimating overall throughput. In principle, you should only need to request one thread-slot per GPU, though in the future we can use the other thread-slots on each node for online analysis and for updating the dynamic workload balance.
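To make that concrete, here is a hedged sketch of what such a submission could look like on Titan (the aprun flags and walltime are assumptions; each Titan node has a single K20X GPU, so -N1 gives every task its own GPU):

#!/bin/bash
#PBS -l nodes=8
#PBS -l walltime=00:30:00
cd $PBS_O_WORKDIR
# launch 8 independent copies of the benchmark, one per node/GPU
aprun -n8 -N1 python benchmark.py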

Tagging in @pgrinaway here too.

jchodera commented 7 years ago

I'm also still working on building a more recent OpenMM conda package for CUDA 7.5, but no luck getting this to work yet.

jdakka commented 7 years ago

@jchodera: quick remark: in the README you noted that the simulation finished rather fast:

completed 5000 steps in 26.521 s : performance is 32.578 ns/day

I've been running this benchmark for 45 minutes and it's still stuck on the benchmarking step. I noticed that it uses the "Reference" platform instead of CUDA like in your example. Is there a way to have the integrator point to the "right" platform? I assumed it would have picked up CUDA, since it is available, as I showed previously:

print([openmm.Platform.getPlatform(index).getName() for index in range(openmm.Platform.getNumPlatforms())])
['Reference', 'CUDA', 'OpenCL']

jchodera commented 7 years ago

The Reference platform is a slow single-threaded double-precision implementation, and would take hours (or days!) to complete the benchmark. We need to get either CUDA (fastest) or OpenCL to run here.

It normally tries CUDA first, then OpenCL, then CPU, then Reference, so I'm not quite sure what the problem is if you find all are available. Can you get the benchmark to run interactively on a node with a GPU where it lists the CUDA platform as available? It should only take a minute or two to run if it's using CUDA.

I think it must be that some kernels are not available for the CUDA platform for some reason, though I'm not sure why. I'll add a few more debug lines to see if we can identify what is going on.

I suspect this may indicate we need to install OpenMM 7.1.1 from source, however.
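As a starting point, debug lines along these lines can report why kernels failed to load (a sketch: Platform.getPluginLoadFailures() requires OpenMM 7.0+, and openmm_library_path is an attribute present in the conda builds):

from simtk import openmm
import simtk.openmm.version as version
# report any platform plugins that failed to load, and why
print(openmm.Platform.getPluginLoadFailures())
# show where the installed OpenMM libraries actually live
print(version.openmm_library_path)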

jdakka commented 7 years ago

The info I was referring to is from the interactive session. I added the platform list as the first print statement. I'm curious to see how OpenMM 7.1.1 works. Let me check both versions on another cluster.

jdakka@titan-batch6:~/mskcc/MSKCC/abl-imatinib-benchmark> python benchmark.py
['Reference', 'CUDA', 'OpenCL']
Deserializing simulation...
System contains 46648 atoms.
Using platform "Reference".

jdakka commented 7 years ago

So I tested the script against Xstream. They have OpenMM 6.3.1 as a module compiled against CUDA 7.0. The benchmark was able to correctly latch onto CUDA.

[xs-jdakka@xs-0005 ~/mskcc/MSKCC/abl-imatinib-benchmark]$ python benchmark.py
['Reference', 'CPU', 'CUDA', 'OpenCL']
Deserializing simulation...
System contains 46648 atoms.
Using platform "CUDA".
Initial potential energy is -141208.484 kcal/mol
Warming up integrator to trigger kernel compilation...
Benchmarking...

jdakka commented 7 years ago

I tested OpenMM 7.1.1 through conda installation on Xstream and it only references CUDA if CUDA 8.0 is loaded.

jchodera commented 7 years ago

I tested OpenMM 7.1.1 through conda installation on Xstream and it only references CUDA if CUDA 8.0 is loaded.

That matches what we expect. The CUDA platform unfortunately needs to be linked against a specific version of CUDA; in this case, the 7.1.x release is built for CUDA 8.0, the current stable release of CUDA, released 5 Apr 2016, over a year ago.

I've been fighting with our Docker build system to try to compile a CUDA 7.5 conda build for you to try on Titan, but I haven't had any luck so far.

You should at least be able to get the OpenMM 7.1.1 conda package to run the OpenCL platform on Titan, provided OpenCL libraries are installed; this doesn't require that OpenMM be linked against a particular CUDA version. It's ~25% slower than CUDA, but not unusably slow. I'm not sure why it fails to run your system, however.

You can try to force a particular platform by changing

context = openmm.Context(system, integrator)

to

# Try to force the OpenCL platform
platform = openmm.Platform.getPlatformByName('OpenCL')
context = openmm.Context(system, integrator, platform)
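(A useful side effect: when a platform is requested explicitly like this, the Context constructor raises an exception if that platform can't initialize, rather than silently falling back to Reference, which makes the underlying failure easier to diagnose.)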
jchodera commented 7 years ago

@jdakka : I think I've managed to solve the issues with building a conda version of the latest (git head) OpenMM against CUDA 7.5. Hopefully I'll get it posted in the next few hours.

jchodera commented 7 years ago

@jdakka : Success! (I hope!)

Give this a try and see if you find the CUDA platform is usable on systems with CUDA 7.5:

conda install --yes -c omnia/label/cuda75 openmm==7.2.0

If so, I can also update the benchmark input files.

If you need to forcibly remove the conda-installed openmm, you can use

# Remove openmm
conda remove --yes openmm
# Clean the package cache
# You might have to omit the `s` from `-tipsy` if you don't have `conda-build` installed
conda clean -tipsy
jdakka commented 7 years ago

I tried again, but no luck: I specified the CUDA platform in the code and it spits out this error. (I've also included the module list and conda list in case you see anything that is missing.)

jdakka@titan-batch5:~/mskcc/MSKCC/abl-imatinib-benchmark> python benchmark.py
['Reference', 'CUDA', 'OpenCL']
Deserializing simulation...
Traceback (most recent call last):
  File "benchmark.py", line 43, in <module>
    [context, integrator, system, state] = deserialize_simulation('serialized/abl-imatinib')
  File "benchmark.py", line 32, in deserialize_simulation
    context = openmm.Context(system, integrator, platform)
  File "/ccs/home/jdakka/miniconda3/envs/snakes/lib/python2.7/site-packages/simtk/openmm/openmm.py", line 3874, in __init__
    this = _openmm.new_Context(*args)
Exception: Error initializing CUDA: CUDA_ERROR_NO_DEVICE (100) at /opt/conda/conda-bld/openmm_1496031609420/work/platforms/cuda/src/CudaContext.cpp:149

(snakes) jdakka@titan-batch5:~/mskcc/MSKCC/abl-imatinib-benchmark> module list
Currently Loaded Modulefiles:
  1) modules/3.2.10.5
  2) nodestat/2.2-1.0502.60539.1.31.gem
  3) sdb/1.1-1.0502.63652.4.27.gem
  4) alps/5.2.4-2.0502.9950.37.1.gem
  5) lustre-cray_gem_s/2.8.0_3.0.101_0.46.1_1.0502.8871.21.1-1.0502.0.6.1
  6) udreg/2.3.2-1.0502.10518.2.17.gem
  7) ugni/6.0-1.0502.10863.8.28.gem
  8) gni-headers/4.0-1.0502.10859.7.8.gem
  9) dmapp/7.0.1-1.0502.11080.8.74.gem
 10) xpmem/0.1-2.0502.64982.5.3.gem
 11) hss-llm/7.2.0
 12) Base-opts/1.0.2-1.0502.60680.2.4.gem
 13) pgi/16.10.0
 14) craype-network-gemini
 15) craype-interlagos
 16) craype/2.5.9
 17) cray-mpich/7.5.2
 18) cray-libsci/16.11.1
 19) pmi/5.0.11
 20) atp/2.0.5
 21) PrgEnv-pgi/5.2.82
 22) lustredu/1.4
 23) xalt/0.7.5
 24) module_msg/0.1
 25) modulator/1.2.0
 26) hsi/5.0.2.p1
 27) DefApps
 28) cudatoolkit/7.5.18-1.0502.10743.2.1
 29) python/2.7.9

(snakes) jdakka@titan-batch5:~/mskcc/MSKCC/abl-imatinib-benchmark> conda list
# packages in environment at /ccs/home/jdakka/miniconda3/envs/snakes:
#
blas                      1.1              openblas                conda-forge
ca-certificates           2017.4.17        0                       conda-forge
certifi                   2017.4.17        py27_0                  conda-forge
fftw3f                    3.3.4            2                       omnia
libgfortran               3.0.0            1
ncurses                   5.9              10                      conda-forge
numpy                     1.12.1           py27_blas_openblas_200  [blas_openblas]  conda-forge
openblas                  0.2.19           2                       conda-forge
openmm                    7.2.0            py27_0                  omnia/label/cuda75
openssl                   1.0.2k           0                       conda-forge
pip                       9.0.1            py27_0                  conda-forge
python                    2.7.13           1                       conda-forge
readline                  6.2              0                       conda-forge
setuptools                33.1.1           py27_0                  conda-forge
sqlite                    3.13.0           1                       conda-forge
tk                        8.5.19           1                       conda-forge
wheel                     0.29.0           py27_0                  conda-forge
zlib                      1.2.11           0                       conda-forge

jchodera commented 7 years ago

Well, the good news is that it seems we've compiled against the correct CUDA 7.5 libraries, since the CUDA platform is available; the bad news is that the job is not launching in a way that provides access to a GPU.

In an interactive session, are you able to run nvidia-smi and see the GPU resource you requested? You might also check whether CUDA_VISIBLE_DEVICES is set to something.
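For example (a sketch; exact flags may vary), since Titan's login and batch nodes have no GPUs, the checks need to run on a compute node through aprun:

aprun -n1 nvidia-smi                                     # should list the node's K20X
aprun -n1 bash -c 'echo ${CUDA_VISIBLE_DEVICES:-unset}'  # check the device mask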

We'll need @pgrinaway's help to debug from here, I imagine. @pgrinaway: Have you managed to get Titan access too?

shantenujha commented 7 years ago

I don't think @pgrinaway has Titan access.

@pgrinaway: Here is a starting point: https://www.olcf.ornl.gov/kb_articles/user-account-requests/

Please use CSC230 as project ID. Let me or @jdakka know if there is a problem.

jdakka commented 7 years ago

The interactive nodes I requested are compute nodes, which have GPUs, yet it doesn't appear to recognize the GPU. I'm trying to run other CUDA scripts to see where the issue is.

pgrinaway commented 7 years ago

I don't think @pgrinaway has Titan access.

That's correct. I was originally relying on access through SiTx's proposal, which is still in review (they told me it would be about a month before I received a token). There's not much I can do about that one yet.

Please use CSC230 as project ID. Let me or @jdakka know if there is a problem.

Ok. Is it permissible to have multiple accounts for the same person? I will be doing calculations under another account as well.

shantenujha commented 7 years ago

Short answer: yes, you can have multiple accounts for the same person.

I'm not 100% sure how Titan/OLCF binds accounts to allocations/projects. On XSEDE you have one account (bound to you), and that account can be used with N allocations/projects. At OLCF, until last year, you would get an account for each project/allocation you were on, but either way you were allowed to join more than one project.

HTH

jdakka commented 7 years ago

@jchodera: I figured out the issue, but OpenMM 7.2/cuda75 won't install with the command you provided. I was able to install OpenMM 6.3 and OpenMM 7.1, but during execution they latched onto the CPU instead, even though CUDA was recognized as well.

PackageNotFoundError: Package not found: Conda could not find '

shantenujha commented 7 years ago

@jdakka @jchodera Maybe it is better to take this problem to the OpenMM mailing list, rather than have @jchodera troubleshoot?

jchodera commented 7 years ago

No need, @pgrinaway and I are the right people to support this, though I was stuck in a meeting all day. Give me a moment to find the issue.

jchodera commented 7 years ago

Here's the syntax to use:

conda remove --yes openmm
conda clean -plti --yes
conda install --yes -c omnia/label/cuda75 openmm

If this doesn't install OpenMM 7.2 from the cuda75 label, let me know what it prints as output.

Thanks so much for being our hands and eyes in working through the OpenMM benchmarking issues!

jdakka commented 7 years ago

@jchodera: it installs a package, but I'm not seeing the correct version that you have, nor the cuda75 label...

conda install -c omnia/label/cuda75 openmm
Fetching package metadata ...............
Solving package specifications: .

Package plan for installation in environment /ccs/proj/csc230/mskcc/miniconda/envs/venv:

The following NEW packages will be INSTALLED:

    openmm: 7.1.1-py27_0 omnia
jchodera commented 7 years ago

I think I know what happened: The package must have been routed to the wrong label and then overwritten by our nightly dev builds. Let me fix that. Will take about an hour. Apologies again!

jchodera commented 7 years ago

OK, try this:

conda remove --yes openmm
conda clean -plti --yes
conda install -c omnia/label/dev --yes openmm-cuda75
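Once that completes, a quick sanity check to confirm which build is active (Platform.getOpenMMVersion() is part of the public API):

python -c "from simtk import openmm; print(openmm.Platform.getOpenMMVersion())"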
jdakka commented 7 years ago

Success! There were quite a few environment-setup hurdles, but it's working. (I'll write a formal set of instructions in the README for Titan specifically.)

(venv) jdakka@titan-batch8:/lustre/atlas/proj-shared/csc230/mskcc/MSKCC/abl-imatinib-benchmark> aprun -n1 python benchmark.py
['Reference', 'CPU', 'CUDA', 'OpenCL']
Deserializing simulation...
System contains 46648 atoms.
Using platform "CUDA".
Initial potential energy is -141208.481 kcal/mol
Warming up integrator to trigger kernel compilation...
Benchmarking...
completed     5000 steps in   22.567 s : performance is   38.286 ns/day
Final potential energy is -141316.976 kcal/mol
Application 14507233 resources: utime ~49s, stime ~6s, Rss ~350156, inblocks ~315692, outblocks ~69481
jchodera commented 7 years ago

Huzzah!

If we want to benchmark the systems mentioned in the NAMD issue here too, I can add those as well.

shantenujha commented 7 years ago

@jchodera: you're my hero! Truly impressed.