nnstreamer-preprocessor / nnstreamer


nnstreamer model environment setup and SqueezeNet training #8

Open yura1h opened 4 years ago

yura1h commented 4 years ago

Goal: test the nnstreamer simple example (mobile_ssd_v2_coco.tflite) and train a SqueezeNet model. Deliverables: an Ubuntu environment for nnstreamer with the build and dependency issues resolved, and a SqueezeNet model that can be loaded into nnstreamer. Due date: by 03/08

yura1h commented 4 years ago

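I'm having trouble training the model because of the error below. I can't tell whether it is a TensorFlow version problem or a cuDNN problem. I've tried changing the installation environment and rerunning, but it still fails, so any advice would be appreciated.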

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "train_squeezenet.py", line 182, in <module>
    train(sq_net,lr_rate,max_iter,out_classes,batch_size,tr_data_files,tr_labels,cv_data_files,cv_labels,log_file)
  File "train_squeezenet.py", line 110, in train
    sess.run([sq_net.v0_opt,sq_net.v0_res_opt,sq_net.v1_opt],feed_dict={sq_net.inputs:batch_images,sq_net.labels:batch_labels,sq_net.lr_rate:lr_rate})
  File "/home/modulabs-04/.local/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 960, in run
    run_metadata_ptr)
  File "/home/modulabs-04/.local/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1183, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/modulabs-04/.local/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1361, in _do_run
    run_metadata)
  File "/home/modulabs-04/.local/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1386, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
  [[node squeezenet_v0/conv1/conv2d/Conv2D (defined at /home/modulabs-04/AIcollege/ondevicemodel/SqueezeNet/squeezenet_model.py:38) ]]

Errors may have originated from an input operation. Input Source operations connected to node squeezenet_v0/conv1/conv2d/Conv2D: Placeholder (defined at /home/modulabs-04/AIcollege/ondevicemodel/SqueezeNet/squeezenet_model.py:51)

Original stack trace for 'squeezenet_v0/conv1/conv2d/Conv2D':
  File "train_squeezenet.py", line 180, in <module>
    sq_net = SqueezeNet(input_shape,out_classes,lr_rate,is_train)
  File "/home/modulabs-04/AIcollege/ondevicemodel/SqueezeNet/squeezenet_model.py", line 53, in __init__
    self.loss_v0,self.loss_v0_res,self.loss_v1 = self.model_loss(self.inputs,self.labels,train)
  File "/home/modulabs-04/AIcollege/ondevicemodel/SqueezeNet/squeezenet_model.py", line 162, in model_loss
    logits_v0 = self.model_arc_v0(inputs,train)
  File "/home/modulabs-04/AIcollege/ondevicemodel/SqueezeNet/squeezenet_model.py", line 60, in model_arc_v0
    conv1 = general_conv(inputs,filters=96,kernel=7,stride=2,padding="SAME",name="conv1",relu=True,weight="Xavier")
  File "/home/modulabs-04/AIcollege/ondevicemodel/SqueezeNet/squeezenet_model.py", line 38, in general_conv
    conv = tf.layers.conv2d(inputs,filters,kernel,stride,padding,kernel_initializer=w_init)
  File "/home/modulabs-04/.local/lib/python3.6/site-packages/tensorflow_core/python/util/deprecation.py", line 324, in new_func
    return func(*args, **kwargs)
  File "/home/modulabs-04/.local/lib/python3.6/site-packages/tensorflow_core/python/layers/convolutional.py", line 424, in conv2d
    return layer.apply(inputs)
  File "/home/modulabs-04/.local/lib/python3.6/site-packages/tensorflow_core/python/util/deprecation.py", line 324, in new_func
    return func(*args, **kwargs)
  File "/home/modulabs-04/.local/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/base_layer.py", line 1672, in apply
    return self.__call__(inputs, *args, **kwargs)
  File "/home/modulabs-04/.local/lib/python3.6/site-packages/tensorflow_core/python/layers/base.py", line 547, in __call__
    outputs = super(Layer, self).__call__(inputs, *args, **kwargs)
  File "/home/modulabs-04/.local/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/base_layer.py", line 778, in __call__
    outputs = call_fn(cast_inputs, *args, **kwargs)
  File "/home/modulabs-04/.local/lib/python3.6/site-packages/tensorflow_core/python/autograph/impl/api.py", line 234, in wrapper
    return converted_call(f, args, kwargs, options=options)
  File "/home/modulabs-04/.local/lib/python3.6/site-packages/tensorflow_core/python/autograph/impl/api.py", line 459, in converted_call
    return _call_unconverted(f, args, kwargs, options)
  File "/home/modulabs-04/.local/lib/python3.6/site-packages/tensorflow_core/python/autograph/impl/api.py", line 330, in _call_unconverted
    return f(*args, **kwargs)
  File "/home/modulabs-04/.local/lib/python3.6/site-packages/tensorflow_core/python/keras/layers/convolutional.py", line 209, in call
    outputs = self._convolution_op(inputs, self.kernel)
  File "/home/modulabs-04/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/nn_ops.py", line 1135, in __call__
    return self.conv_op(inp, filter)
  File "/home/modulabs-04/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/nn_ops.py", line 640, in __call__
    return self.call(inp, filter)
  File "/home/modulabs-04/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/nn_ops.py", line 239, in __call__
    name=self.name)
  File "/home/modulabs-04/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/nn_ops.py", line 2011, in conv2d
    name=name)
  File "/home/modulabs-04/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/gen_nn_ops.py", line 969, in conv2d
    data_format=data_format, dilations=dilations, name=name)
  File "/home/modulabs-04/.local/lib/python3.6/site-packages/tensorflow_core/python/framework/op_def_library.py", line 742, in _apply_op_helper
    attrs=attr_protos, op_def=op_def)
  File "/home/modulabs-04/.local/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 3322, in _create_op_internal
    op_def=op_def)
  File "/home/modulabs-04/.local/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 1756, in __init__
    self._traceback = tf_stack.extract_stack()

hayleyshim commented 4 years ago

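Regarding the CUDA and other environment setup above: have you tried wiping the existing installation, setting the environment up again with the Lambda Stack I mentioned earlier, and training the model on top of that? Reference: Lambda Stack https://lambdalabs.com/lambda-stack-deep-learning-software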

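The TF version and cuDNN problems you mentioned all seem to come down to environment setup, so I think the fastest route is to set everything up in one go with Lambda Stack and train the model on top of that.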
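Before wiping the installation, it may help to record what is currently visible on the machine so the old and new setups can be compared. The sketch below uses only the Python standard library. `collect_env_report` is a name invented for this example, and it assumes `nvidia-smi` and `nvcc` are the tools that would normally be on the PATH of a CUDA machine.

```python
import importlib.util
import shutil
import subprocess
import sys


def collect_env_report():
    """Return a dict describing the Python and GPU tooling visible from this process."""
    report = {
        "python": sys.version.split()[0],
        # find_spec only checks that the package can be located; it does not import it.
        "tensorflow_installed": importlib.util.find_spec("tensorflow") is not None,
    }
    # Assumed tool names; a missing tool is reported rather than treated as an error.
    for tool, args in (("nvidia-smi", []), ("nvcc", ["--version"])):
        path = shutil.which(tool)
        if path is None:
            report[tool] = "not found on PATH"
            continue
        try:
            result = subprocess.run(
                [path, *args],
                stdout=subprocess.PIPE,
                stderr=subprocess.STDOUT,
                universal_newlines=True,
                timeout=30,
            )
            report[tool] = result.stdout.strip()
        except (OSError, subprocess.SubprocessError) as exc:
            report[tool] = "failed to run: {}".format(exc)
    return report


if __name__ == "__main__":
    for key, value in collect_env_report().items():
        print("{}:\n{}\n".format(key, value))
```

Pasting this output into the issue alongside the traceback would make it easier to tell a driver problem from a library version mismatch.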

hayleyshim commented 4 years ago

Errors may have originated from an input operation. Input Source operations connected to node squeezenet_v0/conv1/conv2d/Conv2D:

Searching for similar errors related to this part turns up https://github.com/tensorflow/tensorflow/issues/24650. Could it be a TF version problem?
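One workaround that is often suggested for this exact error message is to stop TensorFlow from reserving all GPU memory at startup, since cuDNN can fail to initialize when no memory is left for it. A minimal sketch, assuming the training script creates its session through the TF1-style API (as the `sess.run` call in the traceback suggests). `enable_memory_growth` and `make_session` are hypothetical helpers, not part of train_squeezenet.py, and this will not help if the installed CUDA and cuDNN versions are themselves mismatched.

```python
def enable_memory_growth(config):
    """Ask TensorFlow to grow GPU memory on demand instead of grabbing it all at once."""
    config.gpu_options.allow_growth = True
    return config


def make_session():
    """Create a TF1-style session with on-demand GPU memory allocation."""
    # Imported lazily so this module can be loaded on machines without TensorFlow.
    import tensorflow as tf

    config = enable_memory_growth(tf.compat.v1.ConfigProto())
    return tf.compat.v1.Session(config=config)
```

In the training script this would replace the plain session constructor, for example `sess = make_session()`.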

ddeokho commented 4 years ago

Screenshot from 2020-03-08 11-16-59

I also struggled quite a bit when I installed this before because I didn't meet the version requirements. To use TF 2.1 or later, I think you need cuDNN 7.6 or later.

https://www.tensorflow.org/install/gpu

Also, I can't find a cuDNN release that supports CUDA 10.2, so I think you will need to downgrade CUDA as well. https://developer.nvidia.com/rdp/cudnn-archive
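The version constraint can be written down as a small lookup. The table in this sketch lists a few rows of the tested GPU build configurations as I recall them from the TensorFlow install documentation, so treat the exact numbers as an assumption and check them against the page linked above. `TESTED_GPU_BUILDS` and `check_gpu_stack` are names invented for this example.

```python
# (CUDA, cuDNN) versions the prebuilt GPU wheels were tested with.
# Assumption: recalled from the TensorFlow install docs; verify before relying on it.
TESTED_GPU_BUILDS = {
    "1.15": ("10.0", "7.4"),
    "2.0": ("10.0", "7.4"),
    "2.1": ("10.1", "7.6"),
}


def _as_tuple(version):
    """Turn '10.1' or '7.6.5' into a tuple of ints so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def check_gpu_stack(tf_version, cuda_version, cudnn_version):
    """Return a list of problems; an empty list means the stack matches the table."""
    key = ".".join(tf_version.split(".")[:2])
    if key not in TESTED_GPU_BUILDS:
        return ["no entry for TensorFlow {} in this table".format(key)]
    want_cuda, want_cudnn = TESTED_GPU_BUILDS[key]
    problems = []
    # Prebuilt wheels are linked against one specific CUDA major.minor release.
    if _as_tuple(cuda_version)[:2] != _as_tuple(want_cuda):
        problems.append(
            "TensorFlow {} expects CUDA {}, found {}".format(key, want_cuda, cuda_version)
        )
    # cuDNN needs the same major version and at least the tested minor version.
    have, want = _as_tuple(cudnn_version), _as_tuple(want_cudnn)
    if have[0] != want[0] or have[:2] < want:
        problems.append(
            "TensorFlow {} expects cuDNN >= {}, found {}".format(key, want_cudnn, cudnn_version)
        )
    return problems


print(check_gpu_stack("2.1.0", "10.2", "7.6.5"))
```

With these inputs the check reports only the CUDA mismatch, which is the situation described in this comment: the cuDNN version is new enough, but CUDA 10.2 is not what the TF 2.1 wheel was built against.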

jwkanggist commented 4 years ago

As I mentioned in another issue, I recommend first hooking up a model that is already trained and published, and checking that the whole pipeline runs end to end.

https://github.com/nnstreamer-preprocessor/nnstreamer/issues/5#issuecomment-596180448

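After that, I recommend building a pipeline that trains the same model, swapping your trained model in, checking that it reaches the same score, and then improving the model from there. @H-YURA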
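The "same score" check can be sketched as a tolerance comparison between the published model's metrics and the retrained model's metrics. The function name, the metric names, and the numbers below are placeholders for illustration, not results from this project.

```python
import math


def find_score_mismatches(reference, retrained, rel_tol=0.01):
    """Return metrics whose retrained value is missing or differs by more than rel_tol."""
    mismatched = {}
    for name, ref_value in reference.items():
        new_value = retrained.get(name)
        if new_value is None or not math.isclose(ref_value, new_value, rel_tol=rel_tol):
            mismatched[name] = (ref_value, new_value)
    return mismatched


# Hypothetical numbers, only to show the call shape.
published = {"top1": 0.575, "top5": 0.803}
retrained = {"top1": 0.571, "top5": 0.700}
print(find_score_mismatches(published, retrained))
```

An empty result means the retrained model reproduces the published scores within the tolerance, which is a reasonable point to start changing the model.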