jackd / template_ffd

Code for paper "Learning Free-Form Deformations for 3D Object Reconstruction"

Build the model #11

Closed by howard84527 6 years ago

howard84527 commented 6 years ago

Hi, sorry to bother you again. I ran "python create_paper_params.py" and it created many JSON files in the model/params folder. However, it couldn't be trained. Can you give me more detail about creating the JSON files? Thank you very much.

jackd commented 6 years ago

yeah... there were a lot of parameter sets used in the paper. I'd just start with b_plane or e_plane (b is base version, e gives slightly better results with only slightly more complexity).

See uses of self.params in template_ffd_builder.py. It's usually used with a .get call so defaults are set in the same place as they are used.
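As a rough sketch of that pattern (the class name, keys, and default values here are hypothetical and not taken from template_ffd_builder.py), a builder can read its JSON params and apply each default at the point of use:

```python
import json


class ExampleBuilder(object):
    """Hypothetical stand-in for a builder; not the repo's actual class."""

    def __init__(self, params):
        self.params = params

    @property
    def batch_size(self):
        # The default sits next to the use site, so the JSON file
        # only needs to list values that differ from it.
        return self.params.get('batch_size', 32)

    @property
    def learning_rate(self):
        return self.params.get('learning_rate', 1e-3)


# Stand-in for the contents of one file in model/params (keys are made up).
params = json.loads('{"batch_size": 8}')
builder = ExampleBuilder(params)
print(builder.batch_size)     # key present in the JSON
print(builder.learning_rate)  # key absent, so the default is used
```

With this style an empty JSON object is still a valid parameter set, which is why searching for self.params.get is the quickest way to see which keys exist.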

What specifically prevents training? Errors? Or just no improvement in performance? Double check your inputs are being processed correctly (cd scripts; python vis_inputs.py). If that isn't working, check that the chamfer loss is being calculated correctly.
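One way to sanity-check a chamfer loss is to compare it against a brute-force reference on a couple of tiny point clouds. The sketch below uses one common definition (sum of squared nearest-neighbour distances in both directions); the loss in this repo may be normalized or reduced differently, so treat the function name and the exact form as assumptions.

```python
def chamfer_reference(cloud_a, cloud_b):
    """Brute-force symmetric chamfer distance with squared Euclidean distances.

    Assumed definition for illustration only; O(len(a) * len(b)).
    """
    def sq_dist(p, q):
        return sum((pi - qi) ** 2 for pi, qi in zip(p, q))

    def one_way(src, dst):
        # For every point in src, distance to its nearest neighbour in dst.
        return sum(min(sq_dist(p, q) for q in dst) for p in src)

    return one_way(cloud_a, cloud_b) + one_way(cloud_b, cloud_a)


a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
b = [(0.0, 0.0, 0.0), (1.0, 1.0, 0.0)]
print(chamfer_reference(a, a))  # identical clouds should give zero
print(chamfer_reference(a, b))
```

If the compiled op and a reference like this disagree on small inputs by more than a constant factor, the problem is likely in the op build rather than in the network.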

howard84527 commented 6 years ago

I ran "python train.py b_plane -s 200000" and got the error below:

Traceback (most recent call last):
  File "train.py", line 19, in <module>
    train(args.model_id, max_steps=args.max_steps)
  File "train.py", line 7, in train
    builder.initialize_variables()
  File "/home/howard/3D-CNN/FFD-net/template_ffd/model/template_ffd_builder.py", line 183, in initialize_variables
    features, labels = self.get_train_inputs()
  File "/home/howard/3D-CNN/FFD-net/template_ffd/model/builder.py", line 203, in get_train_inputs
    return self.get_inputs(mode=tf.estimator.ModeKeys.TRAIN)
  File "/home/howard/3D-CNN/FFD-net/template_ffd/model/template_ffd_builder.py", line 402, in get_inputs
    dataset = self.get_dataset(mode)
  File "/home/howard/3D-CNN/FFD-net/template_ffd/model/template_ffd_builder.py", line 398, in get_dataset
    example_ids, shuffle=shuffle, repeat=repeat, batch_size=batch_size)
  File "/home/howard/3D-CNN/FFD-net/template_ffd/model/template_ffd_builder.py", line 149, in get_dataset
    dataset = tf.data.Dataset.from_tensor_slices(
AttributeError: 'module' object has no attribute 'data'

jackd commented 6 years ago

Ah, what version of tensorflow are you using? tf.data was moved from contrib in... 1.4 I think? I'm not sure how close the tf.contrib.data API was to the tf.data API - you could probably do most of it with the contrib version, but if you're able to upgrade your tf version that would probably be easiest.
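A quick way to check before training is to compare the installed version string against that cutoff. This is only a sketch: the helper name is made up, and it assumes 1.4 is the first release with tf.data in core.

```python
def has_core_tf_data(version_string):
    """Return True if the version is 1.4 or later (assumed cutoff for tf.data)."""
    major, minor = version_string.split('.')[:2]
    return (int(major), int(minor)) >= (1, 4)


# With TensorFlow installed, the string would come from tf.__version__.
# Checking hasattr(tf, 'data') directly is an equally reasonable test.
print(has_core_tf_data('1.2.0'))
print(has_core_tf_data('1.4.0'))
```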

howard84527 commented 6 years ago

I'm using TensorFlow 1.2.0. I will try it on tf 1.4. I appreciate your kind assistance. Thank you very much.

howard84527 commented 6 years ago

Hi, I can get it to start training on tf 1.4 now, but I get another error:

tf_nndistance.so: cannot open shared object file: No such file or directory

I also ran ./compile.sh and it showed:

./compile.sh: line 4: /bin/nvcc: No such file or directory
g++: error: tf_nndistance_g.cu.o: No such file or directory

I can't find the "tf_nndistance.so" and "tf_nndistance_g.cu.o" files in the "tf_nearest_neighbour" folder.

jackd commented 6 years ago

That issue is entirely related to building the nearest neighbour ops.

Have you installed CUDA? Training will be incredibly slow if not - I'd say borderline useless, so I'm not particularly inclined to set things up for non-CUDA versions.

If you have installed CUDA, it sounds like an issue with finding nvcc (the CUDA compiler).

The latest version of the repo changed the nvcc line to $CUDA_HOME/bin/nvcc. I'm guessing that's the one you're using, in which case it sounds like $CUDA_HOME isn't set (an empty $CUDA_HOME would explain the /bin/nvcc path in your error). It's normally /usr/local/cuda, which is normally just a link to /usr/local/cuda-X.Y (X.Y being version numbers). If you don't have the link, you can either create it or just point to the specific version of CUDA you want to use.

Either set the variable:

export CUDA_HOME=/usr/local/cuda

or change the line in compile.sh to /usr/local/cuda/bin/nvcc
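To confirm which case applies before re-running compile.sh, a small check along these lines can help. It is a sketch only: the function name is made up, and /usr/local/cuda is just the usual default mentioned above.

```python
import os


def find_nvcc(default_home='/usr/local/cuda'):
    """Return the nvcc path under CUDA_HOME (or the default), else None."""
    # An unset or empty CUDA_HOME falls back to the conventional location.
    cuda_home = os.environ.get('CUDA_HOME') or default_home
    nvcc = os.path.join(cuda_home, 'bin', 'nvcc')
    return nvcc if os.path.isfile(nvcc) else None


path = find_nvcc()
print(path or 'nvcc not found; set CUDA_HOME or edit compile.sh')
```

If this reports that nvcc is missing even though CUDA is installed, look for a versioned directory such as /usr/local/cuda-X.Y and point CUDA_HOME at it.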

Note you'll probably have to build tensorflow from source, in which case you'll probably have to use the --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0" option as explained here.

howard84527 commented 6 years ago

I can run the training now. Thank you so much.