Open dulloa21 opened 1 month ago
Try replacing `tf.Simulator.getCUDAConfig()` with `tf.Simulator.cuda_config`.
Also, depending on how many bonds are in your simulation, you may not see performance improvement from GPU acceleration. Typically bonded interactions don't benefit much from GPU acceleration when there aren't many bonds.
Hi, thank you for your help. I used `tf.Simulator.cuda_config` and that works well. However, `tf.Simulator.cuda_config.bonds` fails with an error saying there is no such attribute. Is there another way to access bonds? I do not see it in the attribute list for `cuda_config`. I actually plan on having possibly several hundred bonds, so this acceleration would be helpful.
What is the output of `print(type(tf.Simulator.cuda_config), dir(tf.Simulator.cuda_config))`?
The output is <class 'NoneType'> ['__bool__', '__class__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__']
This result strongly suggests that CUDA support wasn't built. What is the value of `tf.has_cuda`?
The value is 1. Daisy Ulloa
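The two checks so far (the type of `tf.Simulator.cuda_config` and the value of `tf.has_cuda`) can be collected into one small report. This is only a sketch: the helper name `describe_cuda_state` is hypothetical, and the stand-in object below merely mirrors the values reported in this thread so the snippet runs without a Tissue Forge install.

```python
from types import SimpleNamespace


def describe_cuda_state(tf):
    """Summarize the CUDA-related attributes of a Tissue Forge-like module.

    Only attributes mentioned in this thread are inspected:
    tf.has_cuda and tf.Simulator.cuda_config.
    """
    has_cuda = bool(getattr(tf, "has_cuda", False))
    simulator = getattr(tf, "Simulator", None)
    cuda_config = getattr(simulator, "cuda_config", None)

    if not has_cuda:
        verdict = "has_cuda is falsy"
    elif cuda_config is None:
        verdict = "has_cuda is set, but cuda_config is None"
    else:
        verdict = "has_cuda is set and cuda_config is available"

    return {
        "has_cuda": has_cuda,
        "config_type": type(cuda_config).__name__,
        "verdict": verdict,
    }


# Stand-in that reproduces the state reported above (NOT the real library).
reported = SimpleNamespace(has_cuda=1, Simulator=SimpleNamespace(cuda_config=None))
print(describe_cuda_state(reported))
```

With a real installation, the same function could be called on the imported module (`import tissue_forge as tf`) instead of the stand-in.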
Thanks. Can you confirm that the example cell_sorting_cuda.py utilizes the GPU? I'm trying to determine whether this issue is particular to the build environment, Python API, or something else entirely.
Hi. I have run the example code and the simulation itself runs fine. However, there is an error with the benchmark. Running the benchmark gives the following:
`** Benchmarking`
`Sending engine to GPU... -2147467259`
This portion is repeated multiple times before the kernel dies. It does not reach any of the other benchmarks.
Thanks, that's helpful. Seems like there's an issue with the build, but let's confirm with the following. Right after the call to `tf.init`, add `tf.Logger.enableFileLogging('tf_log.txt', tf.Logger.TRACE)` and then repeat the demo. The demo will create a file `tf_log.txt` in the current working directory that contains information about what was executed and (hopefully) what went wrong. Can you complete these steps and upload the contents of the created file?
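The logging step above can be wrapped so that the interesting lines are easy to pull out afterwards. This is a sketch under assumptions: `enable_trace_log` and `error_lines` are hypothetical helper names, the only Tissue Forge calls used are the ones quoted above (`tf.Logger.enableFileLogging` and `tf.Logger.TRACE`), and filtering on the marker `ERROR` assumes the log labels errors that way.

```python
from pathlib import Path


def enable_trace_log(tf, path="tf_log.txt"):
    """Enable TRACE-level file logging; meant to be called right after tf.init."""
    tf.Logger.enableFileLogging(str(path), tf.Logger.TRACE)
    return Path(path)


def error_lines(path):
    """Return the stripped lines of a log file that contain the marker 'ERROR'."""
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    return [line.strip() for line in text.splitlines() if "ERROR" in line]
```

A typical session would call `enable_trace_log(tf)` after `tf.init`, run the demo, and then print `error_lines("tf_log.txt")` to see only the reported failures.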
I have attached the log file below. tf_log (1).txt
Thanks. The following errors were reported:
ERROR: Code: -2147467259, Msg: system has unsupported display driver / cuda driver combination, File: /opt/conda/tissue-forge/source/mdcore/src/tfEngine_cuda.cu, Line: 0, Function: HRESULT TissueForge::cuda::engine_cuda_setdevice(TissueForge::engine*, int), func: HRESULT TissueForge::errSet(HRESULT, const char*, int, const char*, const char*), file:/opt/conda/tissue-forge/source/tfError.cpp,lineno:73
ERROR: Code: -2147467259, Msg: Failed to set device., File: /opt/conda/tissue-forge/source/cuda/tfEngineConfig.cpp, Line: 0, Function: HRESULT TissueForge::cuda::EngineConfig::setDevice(int), func: HRESULT TissueForge::errSet(HRESULT, const char*, int, const char*, const char*), file:/opt/conda/tissue-forge/source/tfError.cpp,lineno:73
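For anyone triaging similar logs, the error lines above follow a regular enough layout to split into fields. The sketch below is an illustration only: the pattern is inferred from the two lines shown here rather than from any documented log format, and `parse_error` is a hypothetical helper. It also shows the code as an unsigned 32-bit value; `0x80004005` is the generic failure code (`E_FAIL`) in the Windows HRESULT convention, and whether Tissue Forge follows that convention exactly is an assumption.

```python
import re

# Pattern inferred from the two error lines quoted in this thread.
ERROR_PATTERN = re.compile(
    r"ERROR: Code: (?P<code>-?\d+), Msg: (?P<msg>.*?), File: (?P<file>\S+), "
    r"Line: (?P<line>\d+), Function: (?P<function>.*?), func:"
)


def parse_error(line):
    """Split one error line into its fields, or return None if it does not match."""
    match = ERROR_PATTERN.search(line)
    if match is None:
        return None
    record = match.groupdict()
    record["code"] = int(record["code"])
    record["line"] = int(record["line"])
    # Reinterpret the signed code as an unsigned 32-bit value for readability.
    unsigned = record["code"] & 0xFFFFFFFF
    record["code_hex"] = f"0x{unsigned:08X}"
    return record


sample = (
    "ERROR: Code: -2147467259, Msg: Failed to set device., "
    "File: /opt/conda/tissue-forge/source/cuda/tfEngineConfig.cpp, Line: 0, "
    "Function: HRESULT TissueForge::cuda::EngineConfig::setDevice(int), "
    "func: HRESULT TissueForge::errSet(HRESULT, const char*, int, const char*, const char*), "
    "file:/opt/conda/tissue-forge/source/tfError.cpp,lineno:73"
)
print(parse_error(sample))
```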
It looks like you may need to update your drivers or find compatible versions. This one is pretty tough for me to help with in general, but one easy fix might be along the following lines: what GPU are you using, and what did you set `CUDAARCHS` to for the build?
There are two possible GPU choices, as Tissue Forge is running on a cluster: NVIDIA L40 (driver version 550.90.07) or NVIDIA A100 (driver version 550.76). Tissue Forge was built with CUDAARCHS set to 80.
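For context on the `CUDAARCHS` value: CMake reads this environment variable as a semicolon-separated list of architectures, where `80` corresponds to compute capability 8.0 (the A100) and the L40 is compute capability 8.9. The sketch below, with the hypothetical helper `cudaarchs_value`, shows how a list of compute capabilities maps to such a string. It assumes the Tissue Forge build passes `CUDAARCHS` through to CMake in the standard way.

```python
def cudaarchs_value(compute_capabilities):
    """Turn compute capabilities such as '8.0' into a CUDAARCHS string such as '80'.

    Duplicates are dropped and the input order is preserved.
    """
    archs = []
    for capability in compute_capabilities:
        major, minor = capability.split(".")
        arch = f"{int(major)}{int(minor)}"
        if arch not in archs:
            archs.append(arch)
    return ";".join(archs)


# A100 only, as in the build described above.
print(cudaarchs_value(["8.0"]))
# A100 and L40 together.
print(cudaarchs_value(["8.0", "8.9"]))
```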
Ok so Tissue Forge was built to target the A100. If the L40 is the default device (device 0), then you may need to target the A100 manually.
Unfortunately, the online docs are populated from the API without CUDA support, so the CUDA interface doesn't have supporting API docs for the Python interface. But here are the API features to try this fix:
- `tf.cuda.getNumDevices` returns the number of GPUs available. Likely for you, it will return "2".
- `tf.cuda.getDeviceName` returns the name string that corresponds to a passed integer device id. The default device has id `0`. Likely the second device (hopefully the A100) has device id `1`.
- `tf.Simulator.cuda_config.bonds` and `tf.Simulator.cuda_config.engine` both have the following methods:
  - `getDevice` returns the integer id of the currently targeted device.
  - `setDevice` sets the currently targeted device to a passed integer id.

My guess is, you can verify the device id of the A100 with `tf.cuda.getDeviceName` and pass it to `setDevice` of the module you want to run on the A100. You can also adjust your build to target both GPUs by adding the compute capability of the L40 (8.9).
Let me know how that goes.
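The device-selection steps described above could be sketched roughly as follows. The helper names `find_device` and `target_device` are hypothetical. The only Tissue Forge calls used are the ones named above (`tf.cuda.getNumDevices`, `tf.cuda.getDeviceName`, and `setDevice` on `cuda_config.engine` and `cuda_config.bonds`), and their exact behavior is assumed from that description since the CUDA interface has no published Python API docs.

```python
def find_device(tf, name_fragment):
    """Return the id of the first device whose name contains name_fragment, else None."""
    for device_id in range(tf.cuda.getNumDevices()):
        name = tf.cuda.getDeviceName(device_id)
        if name_fragment.lower() in name.lower():
            return device_id
    return None


def target_device(tf, name_fragment, modules=("engine", "bonds")):
    """Point the chosen cuda_config modules at the device matching name_fragment."""
    device_id = find_device(tf, name_fragment)
    if device_id is None:
        raise RuntimeError(f"no CUDA device with a name containing {name_fragment!r}")
    config = tf.Simulator.cuda_config
    for module_name in modules:
        # Assumption: each module exposes setDevice(int), as described above.
        getattr(config, module_name).setDevice(device_id)
    return device_id
```

With a working CUDA build, usage would look like `target_device(tf, "A100")`. The `RuntimeError` branch covers the case where no matching device is reported at all.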
Hi, thank you for the help. I ran the `getNumDevices` function and, interestingly enough, it returned 0. The `getDeviceName` function also returned `\x06` when checking the default device.
Ok would it be possible to package and share the build log? If you're uncomfortable with uploading here, direct email would work: timothy (dot) sego [at] medicine (dot) ufl (dot) edu. There are a number of ways you could get this result (especially since you're building on HPC) that might be best determined by reviewing the details of the build.
Hello! I am currently working with a version of Tissue Forge installed with GPU acceleration (`tf.has_cuda` = 1). However, usage data shows that GPU usage is at 0% while a simulation is running. I have tried to run `cuda_config_bonds = tf.Simulator.getCUDAConfig().bonds` from the documentation, but I receive the following error: `'SimulatorInterface' object has no attribute 'getCUDAConfig'`. I was wondering how I could resolve this issue and begin offloading work to the GPU. I am currently running a 100x100 area simulation with about 150 objects in the simulation (excluding bonds). Any help is greatly appreciated. Thank you!!