luxonis / depthai-ml-training

Some Example Neural Models that we've trained along with the training scripts
MIT License

Google Colab MobilenetSSD training does not use GPU #31

Open · dhartness opened this issue 1 year ago

dhartness commented 1 year ago

I attempted to use this notebook, https://github.com/luxonis/depthai-ml-training/blob/master/colab-notebooks/Easy_Object_Detection_With_Custom_Data_Demo_Training.ipynb, to train a model and received the following error:

I tried clearing all outputs, disconnecting and clearing out my session, and running it again, but I got the same result when I reached the training step. Is there a library that should be loaded but is being skipped, or is there a workaround?

```
2022-11-23 20:13:47.476326: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcudart.so.10.0'; dlerror: libcudart.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2022-11-23 20:13:47.476531: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcublas.so.10.0'; dlerror: libcublas.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2022-11-23 20:13:47.476670: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcufft.so.10.0'; dlerror: libcufft.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2022-11-23 20:13:47.476810: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcurand.so.10.0'; dlerror: libcurand.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2022-11-23 20:13:47.476941: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcusolver.so.10.0'; dlerror: libcusolver.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2022-11-23 20:13:47.477072: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcusparse.so.10.0'; dlerror: libcusparse.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2022-11-23 20:13:47.477231: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcudnn.so.7'; dlerror: libcudnn.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2022-11-23 20:13:47.477251: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1641] Cannot dlopen some GPU libraries.
Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform. Skipping registering GPU devices...
```
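To check which of the libraries named in the log are actually missing from the runtime, one option is to try loading each one directly. The sketch below is a hypothetical helper (not part of the notebook) that uses Python's standard `ctypes` module; `ctypes.CDLL` goes through the system dynamic loader, as TensorFlow's `dlopen` calls do, so a name that fails here is likely to fail for TensorFlow too. The library list is copied from the warnings above.

```python
import ctypes

# Shared objects that the warnings above report as missing (copied from the log).
EXPECTED_LIBS = [
    "libcudart.so.10.0",
    "libcublas.so.10.0",
    "libcufft.so.10.0",
    "libcurand.so.10.0",
    "libcusolver.so.10.0",
    "libcusparse.so.10.0",
    "libcudnn.so.7",
]


def probe_libraries(names):
    """Hypothetical helper: map each library name to True if it can be loaded, else False."""
    results = {}
    for name in names:
        try:
            # Ask the dynamic loader to open the library by its exact soname.
            ctypes.CDLL(name)
            results[name] = True
        except OSError:
            # Raised when the loader cannot find or open the shared object.
            results[name] = False
    return results


if __name__ == "__main__":
    for name, ok in probe_libraries(EXPECTED_LIBS).items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```

Running this in a Colab cell would print one line per library; any `MISSING` entry corresponds to one of the `Could not load dynamic library` warnings.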

tersekmatija commented 1 year ago

CC @HonzaCuhel, can you look into this, please? My guess is that some hidden upgrade happened again and some libraries got messed up :)
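The log above is consistent with a version mismatch: TensorFlow asks for sonames pinned to CUDA 10.0 (for example `libcudart.so.10.0`) and cuDNN 7 (`libcudnn.so.7`), so if the runtime provides a different CUDA/cuDNN version, those exact files will not be found. As a small sketch (a hypothetical helper using only the standard `re` module, assuming the warning format shown in the log), the expected versions can be pulled out of the output like this:

```python
import re

# Matches TensorFlow dso_loader warnings in the format shown in the log, e.g.
# "Could not load dynamic library 'libcudart.so.10.0'"
_MISSING_LIB = re.compile(r"Could not load dynamic library '(lib\w+)\.so\.([\d.]+)'")


def expected_versions(log_text):
    """Hypothetical helper: return {library: version} for each library TensorFlow failed to load."""
    return {lib: ver for lib, ver in _MISSING_LIB.findall(log_text)}


sample = (
    "W ... Could not load dynamic library 'libcudart.so.10.0'; dlerror: ... "
    "W ... Could not load dynamic library 'libcudnn.so.7'; dlerror: ..."
)
print(expected_versions(sample))  # {'libcudart': '10.0', 'libcudnn': '7'}
```

Comparing these versions against what the runtime actually ships (for example, the output of `nvidia-smi`) would show whether the installed TensorFlow build and the Colab CUDA stack have drifted apart.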

@dhartness If you don't have a particular reason to stick with the above tutorial, I would suggest checking out our YoloV6n training tutorial in the meantime. It is newer than MobileNet SSD Lite and should give you the best accuracy/speed trade-off on our cameras. The export process should also be easier.

dhartness commented 1 year ago

@tersekmatija Thank you for the pointer on your YoloV6 training notebook! I'll give it a try!