google / aiyprojects-raspbian

API libraries, samples, and system images for AIY Projects (Voice Kit and Vision Kit)
https://aiyprojects.withgoogle.com/
Apache License 2.0

Retraining mobilenetv2 and compiling for vision bonnet #594

Open crobbiano opened 5 years ago

crobbiano commented 5 years ago

Hi,

I've successfully retrained MobileNetV2 models using a modified version of the retrain.py script provided with TensorFlow for Poets.

When I try to optimize those graphs with optimize_for_inference.py, I get warnings like the following:

W0321 21:40:41.069447 140397580433216 optimize_for_inference_lib.py:244] Didn't find expected Conv2D input to 'MobilenetV2/expanded_conv_16/depthwise/BatchNorm/FusedBatchNorm'

Would the Graph Transform Tool work better for this? If so, are there any examples of using it?

The real problem comes when I try to run the bonnet model compiler. I use the following command line:

./scripts/bonnet_model_compiler.par --frozen_graph_path=tf_files/pbmodels/retrained_graph_mobilenet_v2_1.0_224.pb --output_graph_path=tf_files/binaryprotos.usda_mobilenet_v2_1.0_224.binaryproto --input_tensor_name=input --output_tensor_names=final_result --input_tensor_size=224 --debug
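For anyone trying to reproduce this, the same invocation can be scripted so the compiler output ends up in a file. This is only a minimal sketch: the helper names `build_compiler_command` and `run_compiler` and the log file name are made up for illustration, and the only flags used are the ones from the command line above.

```python
import shlex
import subprocess
from pathlib import Path

# Path to the compiler as used in the command line above.
COMPILER = "./scripts/bonnet_model_compiler.par"


def build_compiler_command(frozen_graph, output_graph, input_tensor="input",
                           output_tensors="final_result", input_size=224,
                           debug=True):
    """Build the argument list (hypothetical helper, not part of the AIY tools)."""
    cmd = [
        COMPILER,
        f"--frozen_graph_path={frozen_graph}",
        f"--output_graph_path={output_graph}",
        f"--input_tensor_name={input_tensor}",
        # Passed through verbatim, exactly as typed on the command line.
        f"--output_tensor_names={output_tensors}",
        f"--input_tensor_size={input_size}",
    ]
    if debug:
        cmd.append("--debug")
    return cmd


def run_compiler(cmd, log_path="compiler.log"):
    """Run the compiler, save stdout and stderr to log_path, return the exit code."""
    result = subprocess.run(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, text=True)
    Path(log_path).write_text(result.stdout)
    return result.returncode


cmd = build_compiler_command(
    "tf_files/pbmodels/retrained_graph_mobilenet_v2_1.0_224.pb",
    "tf_files/binaryprotos.usda_mobilenet_v2_1.0_224.binaryproto",
)
print(shlex.join(cmd))  # shows the command without running it
```

Calling `run_compiler(cmd)` then executes it; a non-zero return code together with the saved log makes the failure easier to attach to a report.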

and get the following crash (a failed check that ends in SIGABRT):

W0321 22:05:42.662239 7104 model_preprocessing_utils.cc:186] Input MobilenetV2/expanded_conv/project/BatchNorm/FusedBatchNorm of operator Conv has values that may be too large: graph missing ReLU?
W0321 22:05:42.662446 7104 model_preprocessing_utils.cc:186] Input MobilenetV2/expanded_conv_1/project/BatchNorm/FusedBatchNorm of operator Conv has values that may be too large: graph missing ReLU?
W0321 22:05:42.662486 7104 model_preprocessing_utils.cc:186] Input MobilenetV2/expanded_conv_2/add of operator Conv has values that may be too large: graph missing ReLU?
W0321 22:05:42.662509 7104 model_preprocessing_utils.cc:186] Input MobilenetV2/expanded_conv_3/project/BatchNorm/FusedBatchNorm of operator Conv has values that may be too large: graph missing ReLU?
W0321 22:05:42.662527 7104 model_preprocessing_utils.cc:186] Input MobilenetV2/expanded_conv_4/add of operator Conv has values that may be too large: graph missing ReLU?
W0321 22:05:42.662545 7104 model_preprocessing_utils.cc:186] Input MobilenetV2/expanded_conv_5/add of operator Conv has values that may be too large: graph missing ReLU?
W0321 22:05:42.662561 7104 model_preprocessing_utils.cc:186] Input MobilenetV2/expanded_conv_6/project/BatchNorm/FusedBatchNorm of operator Conv has values that may be too large: graph missing ReLU?
W0321 22:05:42.662578 7104 model_preprocessing_utils.cc:186] Input MobilenetV2/expanded_conv_7/add of operator Conv has values that may be too large: graph missing ReLU?
W0321 22:05:42.662594 7104 model_preprocessing_utils.cc:186] Input MobilenetV2/expanded_conv_8/add of operator Conv has values that may be too large: graph missing ReLU?
W0321 22:05:42.662610 7104 model_preprocessing_utils.cc:186] Input MobilenetV2/expanded_conv_9/add of operator Conv has values that may be too large: graph missing ReLU?
W0321 22:05:42.662627 7104 model_preprocessing_utils.cc:186] Input MobilenetV2/expanded_conv_10/project/BatchNorm/FusedBatchNorm of operator Conv has values that may be too large: graph missing ReLU?
W0321 22:05:42.662643 7104 model_preprocessing_utils.cc:186] Input MobilenetV2/expanded_conv_11/add of operator Conv has values that may be too large: graph missing ReLU?
W0321 22:05:42.662660 7104 model_preprocessing_utils.cc:186] Input MobilenetV2/expanded_conv_12/add of operator Conv has values that may be too large: graph missing ReLU?
W0321 22:05:42.662681 7104 model_preprocessing_utils.cc:186] Input MobilenetV2/expanded_conv_13/project/BatchNorm/FusedBatchNorm of operator Conv has values that may be too large: graph missing ReLU?
W0321 22:05:42.662697 7104 model_preprocessing_utils.cc:186] Input MobilenetV2/expanded_conv_14/add of operator Conv has values that may be too large: graph missing ReLU?
W0321 22:05:42.662712 7104 model_preprocessing_utils.cc:186] Input MobilenetV2/expanded_conv_15/add of operator Conv has values that may be too large: graph missing ReLU?
W0321 22:05:42.662727 7104 model_preprocessing_utils.cc:186] Input MobilenetV2/expanded_conv_16/project/BatchNorm/FusedBatchNorm of operator Conv has values that may be too large: graph missing ReLU?
W0321 22:05:42.662752 7104 preprocess_mognet_model_for_myriad_main.cc:700] Graph has non-fatal issues: /tmp/tmps0XkUm/temp_graph.bp, errors=1
F0321 22:05:45.083605 7104 allocate_memory_on_myriad.cc:307] Check failed: DefragmentAndAllocate(op_output_tensor_name, signed_output_size_bytes, &output_tensor_offset, &move_ops_added_by_defragmentation, cg_proto) Not enough primary memory!
Check failure stack trace:
@ 0x68ae7f (unknown)
@ 0x68b664 (unknown)
@ 0x68d3e9 (unknown)
@ 0x4883df (unknown)
@ 0x48918f (unknown)
@ 0x406a43 (unknown)
@ 0x4033d9 (unknown)
@ 0x729a01 (unknown)
@ 0x4005a9 (unknown)
SIGABRT received by PID 7104 (TID 7104) from PID 7104;
I0321 22:05:45.084316 7104 process_state.cc:293] RAW: ExecuteFailureCallbacks() safe
I0321 22:05:45.084333 7104 process_state.cc:1175] RAW: FailureSignalHandler(): starting unsafe phase
I0321 22:05:45.084344 7104 coreutil.cc:276] RAW: Attempting to connect to coredump socket @core
I0321 22:05:45.084382 7104 coreutil.cc:279] RAW: Failed to connect to coredump socket @core
I0321 22:05:45.084408 7104 coreutil.cc:208] RAW: Attempting to dump core
I0321 22:05:45.084735 7104 coreutil.cc:244] RAW: WriteCoreDumpWith returns: 0
W0321 22:05:45.084995 7104 process_state.cc:1213] --- CPU registers: ---
W0321 22:05:45.085007 7104 process_state.cc:1213] r8=5c945f19 r9=7fffffffffffffff r10=8 r11=206 r12=7f533d263100
W0321 22:05:45.085015 7104 process_state.cc:1213] r13=7f533e1e0508 r14=1190890 r15=1190950 rdi=1bc0 rsi=1bc0 rbp=7ffd8c404d60
W0321 22:05:45.085024 7104 process_state.cc:1213] rbx=1190890 rdx=6 rax=0 rcx=716f3d rsp=7ffd8c404d60 rip=716f3d efl=206
W0321 22:05:45.085031 7104 process_state.cc:1213] cgf=2b000000000033 err=0 trp=0 msk=fffffffe10000000 cr2=0
I0321 22:05:45.085047 7104 process_state.cc:526] --- Memory map: ---
I0321 22:05:45.085085 7104 process_state.cc:526] 00400000-00937000: /tmp/tmps0XkUm/tool_b.bin
I0321 22:05:45.085103 7104 process_state.cc:526] 7ffd8c518000-7ffd8c51a000: [vdso]
I0321 22:05:45.085108 7104 process_state.cc:526] ffffffffff600000-ffffffffff601000: [vsyscall]
I0321 22:05:45.085120 7104 process_state.cc:297] RAW: ExecuteFailureCallbacks() unsafe
F0321 22:05:45.083605 7104 allocate_memory_on_myriad.cc:307] Check failed: DefragmentAndAllocate(op_output_tensor_name, signed_output_size_bytes, &output_tensor_offset, &move_ops_added_by_defragmentation, cg_proto) Not enough primary memory!
E0321 22:05:45.085134 7104 process_state.cc:679] RAW: Raising signal 6 with default behavior
I0321 22:05:45.085140 7104 process_state.cc:1275] RAW: FailureSignalHandler() exiting
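Since the output above is long, here is a rough sketch of how it could be condensed into the tensors flagged by the "graph missing ReLU?" warning plus the fatal check. It assumes only the log prefix layout visible above (severity letter, date, time, PID, file:line); the names `parse_entries` and `summarize` are hypothetical, and the sample is a two-entry excerpt copied from the log.

```python
import re
from collections import namedtuple

Entry = namedtuple("Entry", "severity source message")

# Prefix as it appears in the log above, e.g.
# "W0321 22:05:42.662486 7104 model_preprocessing_utils.cc:186] "
PREFIX = re.compile(r"\b([IWEF])\d{4} \d{2}:\d{2}:\d{2}\.\d{6}\s+\d+ ([\w.]+:\d+)\] ")
RELU_HINT = re.compile(r"Input (\S+) of operator \w+ has values that may be too large")


def parse_entries(log_text):
    """Split the log at each prefix, so it also works if entries share a line."""
    matches = list(PREFIX.finditer(log_text))
    entries = []
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(log_text)
        # Collapse whitespace so a message wrapped across lines is rejoined.
        message = " ".join(log_text[m.end():end].split())
        entries.append(Entry(m.group(1), m.group(2), message))
    return entries


def summarize(log_text):
    """Return (tensor names flagged by the ReLU warning, fatal entries)."""
    entries = parse_entries(log_text)
    hits = (RELU_HINT.match(e.message) for e in entries)
    flagged = [h.group(1) for h in hits if h]
    fatal = [e for e in entries if e.severity == "F"]
    return flagged, fatal


SAMPLE = (
    "W0321 22:05:42.662486 7104 model_preprocessing_utils.cc:186] Input "
    "MobilenetV2/expanded_conv_2/add of operator Conv has values that may be "
    "too large: graph missing ReLU? "
    "F0321 22:05:45.083605 7104 allocate_memory_on_myriad.cc:307] Check failed: "
    "DefragmentAndAllocate(op_output_tensor_name, signed_output_size_bytes, "
    "&output_tensor_offset, &move_ops_added_by_defragmentation, cg_proto) "
    "Not enough primary \nmemory!"
)

flagged, fatal = summarize(SAMPLE)
print(flagged)
print(fatal[0].source, "->", fatal[0].message)
```

On the full log, the flagged tensors are the project/BatchNorm and add outputs of each expanded_conv block, and the only fatal entry is the "Not enough primary memory!" check in allocate_memory_on_myriad.cc, which is what actually aborts the compiler.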

I saw in issue #402 (https://github.com/google/aiyprojects-raspbian/issues/402#issuecomment-396874585) that MobileNetV2 models have been released for the AIY Vision Bonnet. Are there any instructions for how these were (re)trained, optimized, and compiled for the Vision Bonnet?

Many thanks, Chris