ValueError: No gradients provided for any variable

MahbubaAnnesha commented 9 months ago

Hi,

I've been trying to run the code for a while now, but I can't get past the following ValueError (error message attached below).

I've tried the following TensorFlow containers: nvcr.io/nvidia/tensorflow:22.09-tf2-py3, nvcr.io/nvidia/tensorflow:22.10-tf2-py3 and nvcr.io/nvidia/tensorflow:22.11-tf2-py3. I also pip-installed opencv-python==4.5.1.48 (the closest available version to the one specified). None of these got rid of the error.

I've also attached train.py as a .txt file to show the few changes I've had to make so far.

Any help would be greatly appreciated.

Thanks!

ValueError.txt train.py.txt

lab231 commented 9 months ago

Hi, I don't have access to the containers. Also, the attached train.py is an empty file; please re-share it.

Regards

MahbubaAnnesha commented 9 months ago

Here are the TensorFlow containers I've tried so far, based on the requirements specified in README.md (opencv 4.4.1, tensorflow 2.9.1):

docker pull nvcr.io/nvidia/tensorflow:22.09-tf2-py3
docker pull nvcr.io/nvidia/tensorflow:22.10-tf2-py3
docker pull nvcr.io/nvidia/tensorflow:22.11-tf2-py3
docker pull nvcr.io/nvidia/tensorflow:22.12-tf2-py3

For reference, here's the framework support matrix: https://docs.nvidia.com/deeplearning/frameworks/support-matrix/index.html

These are the remaining commands I used with each container, for example with version 22.10:

docker run --gpus all -it -d -p 9910:9910 -v /home/mahbuba/Documents/Project/:/Documents/Project --name stroomnet2 nvcr.io/nvidia/tensorflow:22.10-tf2-py3

docker exec -it stroomnet2 bash

From the interactive terminal:

apt-get update
pip install jupyter -U && pip install jupyterlab
jupyter lab --ip=0.0.0.0 --port=9910 --allow-root

After opening JupyterLab in the browser (localhost:9910), I ran the following in the JupyterLab console:

git clone https://github.com/lab231/ST-RoomNet.git
pip install opencv-python==4.5.1.48
apt-get update && apt-get install libgl1
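To double-check what actually ended up inside a container, a small standard-library script can compare installed distribution versions against the README requirements quoted in this thread. This is only a sketch: it assumes the distributions are named "tensorflow" and "opencv-python" in the image (an NGC build may use a different name or a local suffix such as "+nv..."), and EXPECTED just restates the versions listed above.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions quoted from README.md in this thread. The distribution names are
# an assumption; adjust them if the container installs TF under another name.
EXPECTED = {"tensorflow": "2.9.1", "opencv-python": "4.4.1"}


def release_parts(ver, n):
    """First n dot-separated release components, ignoring any '+local' suffix."""
    return ver.split("+", 1)[0].split(".")[:n]


def installed_version(name):
    """Installed version string of a distribution, or None if it is absent."""
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def check(expected):
    """Yield (name, wanted, installed, ok) for each expected distribution."""
    for name, wanted in expected.items():
        got = installed_version(name)
        want_parts = wanted.split(".")
        ok = got is not None and release_parts(got, len(want_parts)) == want_parts
        yield name, wanted, got, ok


if __name__ == "__main__":
    for name, wanted, got, ok in check(EXPECTED):
        print(f"{name}: wanted {wanted}, installed {got}, {'OK' if ok else 'MISMATCH'}")
```

With opencv-python==4.5.1.48 installed, the OpenCV row would be expected to report a mismatch; that alone does not prove the version is the cause of the error.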

I got the dataset (training.mat, validation.mat and testing.mat) from this repo: https://github.com/leVirve/lsun-room

Let me know if I'm missing anything. Really appreciate the help!

train.py.txt ValueError.txt

lab231 commented 9 months ago

Thank you for the clarification. I've checked the code, and your code is correct; the problem is on my side. In spatial_transformer.py, line 212, the interp_method parameter of the ProjectiveTransformer class must be changed from interp_method='nearest' to interp_method='bilinear'. 'nearest' cannot be used in training mode because it is not differentiable, whereas 'bilinear' is. I have already updated spatial_transformer.py, so training should now run without problems. Please download spatial_transformer.py again and replace the copy in your folder.

For inference, interp_method='nearest' is preferable for the best results. interp_method='bilinear' also works, but it introduces a few wrong labels along the lines between the walls.
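The differentiability point can be sketched in one dimension with plain Python. The helpers below (sample_nearest, sample_linear, d_dx) are hypothetical stand-ins, not functions from spatial_transformer.py. Nearest lookup is piecewise constant in the sampling coordinate, so its derivative is zero almost everywhere, while linear interpolation (the 1-D analogue of bilinear) has a non-zero slope. In TensorFlow, the rounding and integer cast behind a nearest lookup have no gradient, so the tape typically returns None for the transform parameters; if every trainable variable reaches the loss only through those coordinates, the optimizer receives None for all of them and raises this ValueError. The second half illustrates one plausible reason for the stray inference labels: blending integer labels across a wall boundary can produce an id that belongs to neither wall (the label values here are made up).

```python
import math


def sample_nearest(signal, x):
    """Nearest-neighbour lookup: snap x to the closest index (clamped to the signal)."""
    i = int(math.floor(x + 0.5))
    i = min(max(i, 0), len(signal) - 1)
    return signal[i]


def sample_linear(signal, x):
    """1-D linear interpolation between the two samples around x (bilinear in 1-D)."""
    x = min(max(x, 0.0), len(signal) - 1.0)
    i0 = min(int(math.floor(x)), len(signal) - 2)
    w = x - i0  # fractional distance from the left sample
    return (1.0 - w) * signal[i0] + w * signal[i0 + 1]


def d_dx(sampler, signal, x, h=1e-4):
    """Central finite-difference estimate of d sampler(signal, x) / dx."""
    return (sampler(signal, x + h) - sampler(signal, x - h)) / (2.0 * h)


if __name__ == "__main__":
    intensity = [0.0, 10.0, 20.0, 30.0]
    # Gradient w.r.t. the sampling coordinate (what the transform parameters need).
    print(d_dx(sample_nearest, intensity, 1.3))  # 0.0: nothing to learn from
    print(d_dx(sample_linear, intensity, 1.3))   # ~10.0: slope between samples 1 and 2

    # Hypothetical label map: wall id 1 on the left, wall id 3 on the right.
    labels = [1, 1, 3, 3]
    print(sample_nearest(labels, 1.5))        # 3: always an existing wall id
    print(round(sample_linear(labels, 1.5)))  # 2: an id that belongs to neither wall
```

Swapping the sampler is the only change between the two cases, which mirrors the advice in the comment: the differentiable variant for training, the label-preserving one for inference.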

MahbubaAnnesha commented 9 months ago

Thank you so much! Not getting the ValueError anymore. :)

lab231 / ST-RoomNet

ValueError: No gradients provided for any variable #2