stillill closed this 11 months ago
Hi Stillill,
Thank you for your inquiry. The log shows that the error is caused by a compilation problem (pyopencl.cffi_cl.RuntimeError: clBuildProgram failed: BUILD_PROGRAM_FAILURE -). Unfortunately, this is not something I can influence. NVIDIA in particular has unstable OpenCL drivers and compilers.
You could try installing the latest NVIDIA drivers for your system, in the hope that the bug has been resolved in the latest version. On the MDT side, unfortunately, not much can be done.
Best,
Robbert
Hi Robbert,
Thanks for getting back to me about this! No worries if there isn't anything you think you can do here. I've been testing things out more and wanted to add that MDT works fine for me in a Docker container which I created using your Docker.nvidia recipe file. It just doesn't work on a GPU in a Singularity (now Apptainer) container, even when I create the Apptainer image by pulling from my working MDT Docker image. The CPU version of MDT works fine for me in Apptainer, though.
I also can't run the hello world demo.py code provided by PyOpenCL on a GPU in Apptainer, but again that works fine in Docker. I can run demo.py on the CPU using Apptainer.
I ended up posting a message to the Apptainer mailing list to see if anyone had ideas as to why MDT would work fine in Docker but not in Apptainer, and someone is helping me look into this. One question that came up on the mailing list is whether the MDT app attempts to write to the container. This would be a problem since Apptainer containers are not writable.
Thanks again!
Hi Stillill,
About your question here: "One question that came up on the Apptainer mailing list is if the MDT app attempts to write to the container. This would be a problem since Apptainer containers are not writable."
To function correctly, MDT requires a few files in your home directory (config files and some model files). At start-up it will try to write these files to your home directory if they are missing. Perhaps this is what is causing the problems?
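A quick way to rule this out is to check, from inside the container, whether the home directory accepts new files at all. The following is only a minimal sketch using the Python standard library; the helper name `is_writable` is hypothetical and not part of MDT, and it does not check which specific files MDT creates.

```python
import tempfile
from pathlib import Path


def is_writable(directory):
    """Return True if a temporary file can be created in `directory`.

    Hypothetical helper for illustration; it is not part of MDT.
    """
    try:
        # The temporary file is deleted again when the context exits.
        with tempfile.NamedTemporaryFile(dir=directory):
            pass
        return True
    except OSError:
        # Covers read-only file systems, missing directories and
        # permission errors.
        return False


if __name__ == "__main__":
    home = Path.home()
    print(f"{home} writable: {is_writable(home)}")
```

If this reports that the home directory is not writable inside the container, the start-up writes described above would likely fail as well.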
Best,
Robbert
Hello,
Thanks for the info! Someone on the Apptainer mailing list finally figured out the problem with the container. It turns out that the libnvidia-nvvm.so.4 library was not available in the container environment. They suggested adding this library to Apptainer's nvliblist.conf file, and that resolved the error.
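For anyone hitting the same error, one way to confirm whether the library is visible inside the container is to try loading it with Python's `ctypes`. This is a rough sketch rather than an official diagnostic: the helper name `can_load` is hypothetical, and checking `libOpenCL.so.1` as well is an assumption about what a typical OpenCL setup provides.

```python
import ctypes


def can_load(library_name):
    """Return True if the dynamic loader can find and load `library_name`.

    Hypothetical helper for illustration only.
    """
    try:
        ctypes.CDLL(library_name)
        return True
    except OSError:
        # Raised when the library is missing or cannot be loaded.
        return False


if __name__ == "__main__":
    # libnvidia-nvvm.so.4 is the library named in this thread;
    # libOpenCL.so.1 is an assumed additional check.
    for name in ("libnvidia-nvvm.so.4", "libOpenCL.so.1"):
        status = "found" if can_load(name) else "missing"
        print(f"{name}: {status}")
```

Running this both in the working Docker container and in the Apptainer container should show whether the two environments expose different libraries.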
Hello,
I am running MDT out of a Singularity container and am getting a runtime error when running the mdt-model-fit command. Is there any way to figure out what the problem is? Thanks!