Build error from src/AutoDefRuntime, runtime error from Tensorflow

ericchen321 commented 2 years ago

Hello Lawson,

We're attempting to reproduce results from your 2019 paper (https://www.dgp.toronto.edu/projects/latent-space-dynamics/). While trying to build and run the code, we ran into some errors, and we're wondering if you can offer some insight?

Error from building the main project. While building the main project

# Now build the main project
cd src/AutoDefRuntime
mkdir build && cd build
cmake .. && make -j8

We got this error:

[100%] Built target igl_opengl2
/home/eric/cpsc548/project/AutoDef/src/AutoDefRuntime/src/main.cpp: In instantiation of 'void GPLCTimeStepper<ReducedSpaceType, MatrixType>::step(const Eigen::SparseVector<double, 0, int>&) [with ReducedSpaceType = ReducedSpace<LinearSpaceImpl<Eigen::Matrix<double, -1, -1> >, Eigen::Matrix<double, -1, -1> >; MatrixType = Eigen::Matrix<double, -1, -1>]':
/home/eric/cpsc548/project/AutoDef/src/AutoDefRuntime/src/main.cpp:1435:13:   required from 'run_sim(ReducedSpaceType*, const json&, const boost::filesystem::path&)::<lambda(igl::viewer::Viewer&)> [with ReducedSpaceType = ReducedSpace<LinearSpaceImpl<Eigen::Matrix<double, -1, -1> >, Eigen::Matrix<double, -1, -1> >; MatrixType = Eigen::Matrix<double, -1, -1>]'
/home/eric/cpsc548/project/AutoDef/src/AutoDefRuntime/src/main.cpp:1543:85:   required from 'struct run_sim(ReducedSpaceType*, const json&, const boost::filesystem::path&) [with ReducedSpaceType = ReducedSpace<LinearSpaceImpl<Eigen::Matrix<double, -1, -1> >, Eigen::Matrix<double, -1, -1> >; MatrixType = Eigen::Matrix<double, -1, -1>; json = nlohmann::basic_json<>]::<lambda(class igl::viewer::Viewer&)>'
/home/eric/cpsc548/project/AutoDef/src/AutoDefRuntime/src/main.cpp:1308:30:   required from 'void run_sim(ReducedSpaceType*, const json&, const boost::filesystem::path&) [with ReducedSpaceType = ReducedSpace<LinearSpaceImpl<Eigen::Matrix<double, -1, -1> >, Eigen::Matrix<double, -1, -1> >; MatrixType = Eigen::Matrix<double, -1, -1>; json = nlohmann::basic_json<>]'
/home/eric/cpsc548/project/AutoDef/src/AutoDefRuntime/src/main.cpp:1910:74:   required from here
/home/eric/cpsc548/project/AutoDef/src/AutoDefRuntime/src/main.cpp:797:23: error: no matching function for call to 'LBFGSpp::LBFGSSolver<double>::minimizeWithPreconditioner(GPLCObjective<ReducedSpace<LinearSpaceImpl<Eigen::Matrix<double, -1, -1> >, Eigen::Matrix<double, -1, -1> >, Eigen::Matrix<double, -1, -1> >&, Eigen::VectorXd&, double&, std::conditional<true, Eigen::LDLT<Eigen::Matrix<double, -1, -1>, 1>, Eigen::SimplicialLDLT<Eigen::SparseMatrix<double> > >::type&)'
             niter = m_solver->minimizeWithPreconditioner(*m_gplc_objective, z_param, min_val_res, m_H_solver_pardiso);
In file included from /home/eric/cpsc548/project/AutoDef/src/AutoDefRuntime/src/main.cpp:86:0:
/home/eric/cpsc548/project/AutoDef/extern/GAUSS/ThirdParty/LBFGS++/include/LBFGS.h:199:16: note: candidate: template<class Foo, class Matrix, class Solver> int LBFGSpp::LBFGSSolver<Scalar>::minimizeWithPreconditioner(Foo&, LBFGSpp::LBFGSSolver<Scalar>::Vector&, Scalar&, Matrix&, Solver&) [with Foo = Foo; Matrix = Matrix; Solver = Solver; Scalar = double]
 inline int minimizeWithPreconditioner(Foo& f, Vector& x, Scalar& fx, Matrix &preconditioner, Solver &solver)
            ^~~~~~~~~~~~~~~~~~~~~~~~~~
/home/eric/cpsc548/project/AutoDef/extern/GAUSS/ThirdParty/LBFGS++/include/LBFGS.h:199:16: note:   template argument deduction/substitution failed:
/home/eric/cpsc548/project/AutoDef/src/AutoDefRuntime/src/main.cpp:797:23: note:   candidate expects 5 arguments, 4 provided

The error makes sense: minimizeWithPreconditioner() in extern/GAUSS/ThirdParty/LBFGS++/include/LBFGS.h requires five arguments, but the call to this function from main.cpp provides only four. The missing one seems to be Matrix &preconditioner. I wonder if you had the same issue before and were able to solve it?

Error from training. While running the training script

./scripts/unified_gen_and_train.py configs/X.json models/X

we got this error:

Traceback (most recent call last):
  File "./scripts/unified_gen_and_train.py", line 2, in <module>
    import keras.backend as K
  File "/home/eric/cpsc548/project/AutoDef/extern/anaconda/lib/python3.6/site-packages/keras/__init__.py", line 4, in <module>
    from . import activations
  File "/home/eric/cpsc548/project/AutoDef/extern/anaconda/lib/python3.6/site-packages/keras/activations.py", line 6, in <module>
    from .engine import Layer
  File "/home/eric/cpsc548/project/AutoDef/extern/anaconda/lib/python3.6/site-packages/keras/engine/__init__.py", line 8, in <module>
    from .training import Model
  File "/home/eric/cpsc548/project/AutoDef/extern/anaconda/lib/python3.6/site-packages/keras/engine/training.py", line 25, in <module>
    from .. import callbacks as cbks
  File "/home/eric/cpsc548/project/AutoDef/extern/anaconda/lib/python3.6/site-packages/keras/callbacks.py", line 26, in <module>
    from tensorflow.contrib.tensorboard.plugins import projector
  File "/home/eric/cpsc548/project/AutoDef/extern/anaconda/lib/python3.6/site-packages/tensorflow/contrib/__init__.py", line 35, in <module>
    from tensorflow.contrib import cudnn_rnn
  File "/home/eric/cpsc548/project/AutoDef/extern/anaconda/lib/python3.6/site-packages/tensorflow/contrib/cudnn_rnn/__init__.py", line 34, in <module>
    from tensorflow.contrib.cudnn_rnn.python.layers import *
  File "/home/eric/cpsc548/project/AutoDef/extern/anaconda/lib/python3.6/site-packages/tensorflow/contrib/cudnn_rnn/python/layers/__init__.py", line 23, in <module>
    from tensorflow.contrib.cudnn_rnn.python.layers.cudnn_rnn import *
  File "/home/eric/cpsc548/project/AutoDef/extern/anaconda/lib/python3.6/site-packages/tensorflow/contrib/cudnn_rnn/python/layers/cudnn_rnn.py", line 20, in <module>
    from tensorflow.contrib.cudnn_rnn.python.ops import cudnn_rnn_ops
  File "/home/eric/cpsc548/project/AutoDef/extern/anaconda/lib/python3.6/site-packages/tensorflow/contrib/cudnn_rnn/python/ops/cudnn_rnn_ops.py", line 22, in <module>
    from tensorflow.contrib.rnn.python.ops import lstm_ops
  File "/home/eric/cpsc548/project/AutoDef/extern/anaconda/lib/python3.6/site-packages/tensorflow/contrib/rnn/__init__.py", line 88, in <module>
    from tensorflow.contrib.rnn.python.ops.gru_ops import *
  File "/home/eric/cpsc548/project/AutoDef/extern/anaconda/lib/python3.6/site-packages/tensorflow/contrib/rnn/python/ops/gru_ops.py", line 33, in <module>
    resource_loader.get_path_to_datafile("_gru_ops.so"))
  File "/home/eric/cpsc548/project/AutoDef/extern/anaconda/lib/python3.6/site-packages/tensorflow/contrib/util/loader.py", line 56, in load_op_library
    ret = load_library.load_op_library(path)
  File "/home/eric/cpsc548/project/AutoDef/extern/anaconda/lib/python3.6/site-packages/tensorflow/python/framework/load_library.py", line 56, in load_op_library
    lib_handle = py_tf.TF_LoadLibrary(library_filename)
tensorflow.python.framework.errors_impl.NotFoundError: /home/eric/cpsc548/project/AutoDef/extern/anaconda/lib/python3.6/site-packages/tensorflow/contrib/rnn/python/ops/_gru_ops.so: undefined symbol: _ZN15stream_executor6Stream12ThenBlasGemmENS_4blas9TransposeES2_yyyfRKNS_12DeviceMemoryIfEEiS6_ifPS4_i

We looked it up, and some posts suggest the error is due to the specific version of Tensorflow in use. However, our Tensorflow (https://github.com/ericchen321/tensorflow) was forked from your repo at commit 78cdaf5 on the master branch. So we're wondering if you also ran into this issue before and were able to resolve it?
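
For reference, the undefined symbol in the last line of the traceback is a C++ name mangled under the Itanium ABI, where a nested name is written as `_ZN` followed by length-prefixed components. The sketch below decodes only that leading nested name (it ignores the argument types that follow); the helper name `nested_name` is made up for illustration.

```python
def nested_name(mangled):
    """Decode the leading nested name of an Itanium-ABI mangled symbol.

    Only the `_ZN<len><name><len><name>...` prefix is handled; parameter
    types, templates, and qualifiers after it are ignored.
    """
    if not mangled.startswith("_ZN"):
        raise ValueError("expected a symbol starting with _ZN")
    pos, parts = 3, []
    while pos < len(mangled) and mangled[pos].isdigit():
        end = pos
        while end < len(mangled) and mangled[end].isdigit():
            end += 1  # consume the decimal length prefix
        length = int(mangled[pos:end])
        parts.append(mangled[end:end + length])
        pos = end + length
    return "::".join(parts)


symbol = ("_ZN15stream_executor6Stream12ThenBlasGemmENS_4blas9Transpose"
          "ES2_yyyfRKNS_12DeviceMemoryIfEEiS6_ifPS4_i")
print(nested_name(symbol))  # stream_executor::Stream::ThenBlasGemm
```

The missing function is a method of `stream_executor::Stream`, which belongs to TensorFlow's core rather than to `_gru_ops.so` itself. That would be consistent with the op library and the core library coming from different builds, although nothing in the traceback alone confirms the cause.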

May I ask what CUDA version you used to run the code?

Thanks in advance for looking into this long inquiry. We would appreciate your help very much.

lawsonfulton commented 2 years ago

Hey Eric,

Unfortunately I have not been maintaining this repo since 2019, so it’s possible the dependencies have shifted since then and I didn’t pin some to a fixed version.

As for issue 1., you could try using an identity preconditioner and see if that works.

For 2., I’m not so sure. It sounds like it could be an issue with the tensorflow installation. You might want to try upgrading to a more recent version if possible, though I’m not sure how easy that will be.

And 3., the computer I used for this project is long gone. If the CUDA version is not listed anywhere then I’m sorry I can’t help you.
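
On the identity-preconditioner suggestion for issue 1: a preconditioner M rescales the gradient g by solving M z = g before each step, and with M = I the solve returns g unchanged, so the optimizer behaves as if it had no preconditioner. The Python sketch below illustrates this on plain gradient descent. It is not L-BFGS and does not use the LBFGS++ API; the function names and the toy quadratic are made up for illustration, and how LBFGS.h actually uses its `preconditioner` and `solver` arguments is not shown in this thread.

```python
def preconditioned_descent(grad, x0, solve, step=0.1, iters=200):
    """Gradient descent whose direction is z = M^{-1} g, computed by `solve`."""
    x = list(x0)
    for _ in range(iters):
        z = solve(grad(x))
        x = [xi - step * zi for xi, zi in zip(x, z)]
    return x


def identity_solve(g):
    """Identity preconditioner: M = I, so M z = g has the solution z = g."""
    return list(g)


def toy_grad(x):
    """Gradient of the toy quadratic f(x) = 0.5 * (x0^2 + 10 * x1^2)."""
    return [x[0], 10.0 * x[1]]


x_min = preconditioned_descent(toy_grad, [1.0, 1.0], identity_solve)
print(x_min)  # both entries end up close to 0, the minimizer of f
```

In the C++ code the analogous change would be to pass an identity matrix of the right size as the extra argument, but whether that is enough for the call in main.cpp to compile and converge would have to be tested.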

I wish I could be of more help. There was another group that recreated my results in a paper earlier this year. I think it was this one: https://arxiv.org/abs/2102.11026 You could try reaching out to them for advice?

If you do figure it out—I’d love to know how you get it working so I can update the repo!

By the way—I’m curious what your project is if you don’t mind me asking?

Good luck!

-Lawson

ericchen321 commented 2 years ago

Hi Lawson,

Thank you so much for the quick and detailed reply! And thanks for pointing us to that paper; I'll look into it to see if I can get some clues.

The project I'm working on is the term project for a Master's course in computer graphics at UBC. Our aim is to reproduce results from the paper and implement a VAE/GAN in place of the autoencoder. The prof teaching the course is Dinesh Pai (https://sensorimotor.cs.ubc.ca/pai/), who was David Levin's PhD supervisor.

lawsonfulton commented 2 years ago

Sorry I didn't reply—sounds like cool stuff!

Did you ever manage to get it working?

ericchen321 commented 2 years ago

Hey Lawson, sorry about the late reply from our side as well...

Yes, we finally got it working! Everything runs with Tensorflow 1.8 + CUDA 9.2.

You can take a look at our final project report.
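
Since the working combination turned out to be Tensorflow 1.8 + CUDA 9.2, a small guard along the following lines could catch a mismatched install before training starts. This is only a sketch: the helper names and the idea of comparing just the major.minor prefix are assumptions, not something taken from the AutoDef scripts.

```python
def same_major_minor(installed, pinned):
    """True when two version strings agree on their major.minor prefix."""
    return installed.split(".")[:2] == pinned.split(".")[:2]


def check_tensorflow(pinned="1.8"):
    """Import TensorFlow lazily and compare its version against the pinned one."""
    import tensorflow as tf  # only imported when the check is actually run
    return same_major_minor(tf.__version__, pinned)


print(same_major_minor("1.8.0", "1.8"))   # True
print(same_major_minor("1.12.0", "1.8"))  # False
```

The CUDA side is not covered here; the toolkit version can be read separately, for example from `nvcc --version`.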

lawsonfulton / AutoDef

Build error from src/AutoDefRuntime, runtime error from Tensorflow #6
