usnistgov / hiperc

High Performance Computing Strategies for Boundary Value Problems
https://pages.nist.gov/hiperc/en/latest/index.html

CUDA code runs but does not march through time #66

Open · amjokisaari opened this issue 7 years ago

amjokisaari commented 7 years ago

In gpu/cuda, I run ./diffusion ../params.txt

The code appears to execute: PNGs and CSVs are generated. However, it looks like no time marching is occurring. The attached image is the final one in the sequence; they all look the same. Runlog.csv does have data generated out to 100,000 iterations.

[attached image: data0100000]

tkphd commented 7 years ago

As it runs, can you use nvidia-smi to check whether the GPU is doing any work? (#45)

tkphd commented 7 years ago

Also, which graphics card are you using? It's possible the specified -gencode flags are mismatched with your hardware. If this is the cause, fixing it will require a much more sophisticated make/cmake solution.
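A mismatch like that usually makes every kernel launch fail with "invalid device function", and without explicit error checks the program just carries on with unmodified data, which would explain identical frames. As a minimal sketch of that kind of check (the kernel here is a stand-in, not HiPerC's actual diffusion kernel):

```c
/* check_launch.cu -- minimal sketch of kernel-launch error checking.
   The kernel is a stand-in, not HiPerC's diffusion kernel.
   Compile with: nvcc check_launch.cu -o check_launch */
#include <stdio.h>
#include <stdlib.h>

__global__ void dummy_kernel(float* x)
{
    x[0] = 1.0f;
}

int main(void)
{
    float* d_x;
    cudaMalloc((void**)&d_x, sizeof(float));

    dummy_kernel<<<1, 1>>>(d_x);

    /* A -gencode/hardware mismatch typically shows up here
       as "invalid device function" */
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        fprintf(stderr, "launch failed: %s\n", cudaGetErrorString(err));
        return EXIT_FAILURE;
    }

    /* Errors during kernel execution surface on synchronization */
    err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "execution failed: %s\n", cudaGetErrorString(err));
        return EXIT_FAILURE;
    }

    printf("kernel ran OK\n");
    cudaFree(d_x);
    return 0;
}
```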

amjokisaari commented 7 years ago

Ahhh, this is opening a can of worms. My laptop (Dell M5000 series) has integrated Intel graphics as well as an Nvidia card (Quadro M1000M). Currently I'm running my OS graphics on the integrated card, and I have no idea how CUDA code fares in this dual-graphics-card-but-running-the-integrated-one setup. Running nvidia-smi in the terminal gives "command not found"; some brief forum searching brings up this as a potential solution...

amjokisaari commented 7 years ago

I may need to swap the entire system over to Nvidia, which means making friends with all the Nvidia graphics drivers again....

tkphd commented 7 years ago

This looks like an error stemming from a mismatch between the -gencode flags and the hardware. For your device (Quadro M, Maxwell architecture?), try -gencode arch=compute_50,code=sm_50, based on this summary. If that doesn't work, remove the -gencode arch=compute_xx,code=sm_xx flag entirely. Then start messing with drivers.
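Rather than guessing the architecture from the product name, you can also ask the CUDA runtime directly. A minimal standalone sketch (not part of HiPerC) that prints each device's compute capability, so you know which compute_XX/sm_XX pair to pass:

```c
/* query_cc.cu -- print each device's compute capability to pick the
   matching -gencode arch=compute_XX,code=sm_XX flag.
   Compile with: nvcc query_cc.cu -o query_cc */
#include <stdio.h>

int main(void)
{
    int count = 0;
    cudaGetDeviceCount(&count);
    if (count == 0) {
        printf("no CUDA-capable device found\n");
        return 1;
    }
    for (int i = 0; i < count; i++) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        printf("device %d: %s, compute capability %d.%d\n",
               i, prop.name, prop.major, prop.minor);
    }
    return 0;
}
```

A Quadro M1000M should report 5.0, which matches sm_50.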

amjokisaari commented 7 years ago

Nope, neither changing the flags to 50 nor removing them entirely resolved the issue. Boo :(

tkphd commented 7 years ago

Boo indeed. This might be a hardware/driver issue. Can you test on another machine? (Not giving up on this machine, just want to know if you can get it running at all.)

amjokisaari commented 7 years ago

...miiiight be able to try doing this in Windows??

Also, here's another stupid question: after reinstallation, the Nvidia card has the open-source nouveau drivers installed. Those wouldn't have a chance in hell of working, would they..?

tkphd commented 7 years ago

Worth a try, yeah? (re: both Windows and nouveau) There's also apparently a deep incompatibility between CUDA and GCC > 4.9, so, yeah. Lots of complications.

tkphd commented 7 years ago

Is it possible your GPU doesn't support double-precision floats?

amjokisaari commented 7 years ago

... How do I check that?

tkphd commented 7 years ago

1. Google your hardware, or
2. In common_diffusion/type.h, change typedef double fp_t; to typedef float fp_t; then recompile and run again (see the sketch after this list).
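For option 2, the change is one line; here is a sketch of the relevant line in common_diffusion/type.h:

```c
/* common_diffusion/type.h -- switch the simulation to single precision */
typedef float fp_t;   /* was: typedef double fp_t; */
```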

tkphd commented 7 years ago

I had the same bug crop up on older hardware. Building the CUDA example without the specific flags worked for me -- committed in 73294de. Does the bug still affect your machine, @amjokisaari?