amjokisaari opened this issue 7 years ago
As it runs, can you use `nvidia-smi` to check whether the GPU is doing any work? ( #45 )

Also, which graphics card are you using? It's possible the specified `-gencode` flags are mismatched. If this is the cause, fixing it will require a much more sophisticated make/cmake solution.
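For reference, here is one way to watch GPU activity while the benchmark runs; the one-second refresh interval is just an example:

```shell
# Watch GPU utilization and memory use, refreshing every second,
# while ./diffusion runs in another terminal.
nvidia-smi --loop=1

# Or a one-shot query of just the interesting fields:
nvidia-smi --query-gpu=name,utilization.gpu,memory.used --format=csv
```

If utilization stays at 0% while the program runs, the kernels are probably not reaching the GPU at all.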
Ahhh, this is opening a can of worms. My laptop (Dell M5000 series) has integrated Intel graphics as well as an Nvidia card (Quadro M1000M). Currently I'm running my OS graphics on the integrated card, and I have no idea how CUDA code fares in this dual-graphics-card-but-running-the-integrated-one configuration. Running `nvidia-smi` in the terminal gives me "command not found"; some brief forum searching brings up this as a potential solution...
I may need to swap the entire system over to Nvidia, which means making friends with all the Nvidia graphics drivers again....
This looks like an error stemming from a mismatch between the `-gencode` flags and the hardware. For your device (Quadro M, Maxwell architecture?), try `-gencode arch=compute_50,code=sm_50`, based on this summary. If that doesn't work, remove the `-gencode arch=compute_xx,code=sm_xx` flag entirely. Then start messing with drivers.
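As a sketch of what that flag change looks like on the command line (the source file name here is illustrative, not the repo's actual Makefile target):

```shell
# Maxwell-class card (e.g. Quadro M1000M): compute capability 5.0
nvcc -gencode arch=compute_50,code=sm_50 -o diffusion cuda_main.cu

# Fallback: omit -gencode entirely and let nvcc pick its default target
nvcc -o diffusion cuda_main.cu
```

The `arch=compute_XX` part selects the virtual architecture the PTX is generated for, and `code=sm_XX` selects the real architecture the binary is built for; a binary built only for `sm_35` will not load correctly on a 5.0 device.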
- `-gencode arch=compute_35,code=sm_35`: yields expected diffusion field (see the top-level README) :smiley:
- no `-gencode` flag at all: yields expected diffusion field :smiley:
- `-gencode arch=compute_35,code=sm_35`: yields all zeros, despite `nvidia-smi` showing the GPU under load :frowning:
- `-gencode arch=compute_20,code=sm_20`: yields expected diffusion field :smiley:
- no `-gencode` flag at all: yields expected diffusion field :smiley:

Nope, neither changing the flags to 50 nor removing them entirely resolved the issue. Boo :(
Boo indeed. This might be a hardware/driver issue. Can you test on another machine? (Not giving up on this machine, just want to know if you can get it running at all.)
...miiiight be able to try doing this in Windows??
Also, here's another stupid question: after reinstallation, the Nvidia card has the open-source nouveau drivers installed. These wouldn't have a chance in hell of working, would they..?
Worth a try, yeah? (re: both Windows and nouveau.) There's also apparently a deep incompatibility between CUDA and GCC > 4.9, so, yeah. Lots of complications.
Is it possible your GPU doesn't support double-precision floats?
... How do I check that?
On Sep 6, 2017 9:52 PM, "Trevor Keller" notifications@github.com wrote:
Is it possible your GPU doesn't support double-precision floats?
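One way to check is `cudaGetDeviceProperties`: double precision has been supported in hardware since compute capability 1.3, so any Maxwell (5.x) part should have it. A minimal sketch, which needs the CUDA toolkit to build (e.g. `nvcc -o devinfo devinfo.cu`):

```c
#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        fprintf(stderr, "No CUDA device found.\n");
        return 1;
    }
    for (int i = 0; i < count; i++) {
        struct cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        /* fp64 arithmetic is available in hardware for compute capability >= 1.3 */
        int fp64 = (prop.major > 1) || (prop.major == 1 && prop.minor >= 3);
        printf("Device %d: %s, compute capability %d.%d, double precision: %s\n",
               i, prop.name, prop.major, prop.minor, fp64 ? "yes" : "no");
    }
    return 0;
}
```

Note that consumer and mobile parts often support fp64 but at a heavily reduced throughput relative to fp32, which affects speed rather than correctness.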
In `common_diffusion/type.h`, change `typedef double fp_t;` to `typedef float fp_t;`, then recompile and run again.

I had the same bug crop up on older hardware. Building the CUDA example without the specific flags worked for me -- committed in 73294de. Does the bug still affect your machine, @amjokisaari?
In gpu/cuda, I run ./diffusion ../params.txt. The code appears to execute: PNGs and CSVs are generated. However, it looks like no time marching is occurring. The attached image is the final one in the sequence; they all look the same. Runlog.csv does have data generated out to 100,000 iterations.
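A quick way to confirm that nothing is evolving is to compare the first and last data rows of the run log (assuming runlog.csv has a header line followed by one row per checkpoint; the exact column layout is not verified here):

```shell
# First data row vs. last data row: if the tracked values are identical,
# the solver logged 100,000 iterations without the field ever changing.
head -2 runlog.csv | tail -1
tail -1 runlog.csv
```

Identical rows would match the symptom in the images: the kernels run (hence the GPU load) but write back zeros or unchanged data.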