tensorflow / tfjs

A WebGL accelerated JavaScript library for training and deploying ML models.
https://js.tensorflow.org
Apache License 2.0

Error in concat1D: rank of tensors[2] must be the same as the rank of the rest (1) #841

Closed: iacovlev-pavel closed this issue 4 years ago

iacovlev-pavel commented 5 years ago

TensorFlow.js version

tensorflow/tfjs-node-gpu: 0.1.19

tensorflowjs 0.6.4
Dependency versions: keras 2.2.2, tensorflow 1.11.0

Describe the problem or feature request

I want to retrain faster_rcnn_inception_v2_coco with my own data, and run the prediction with tfjs-node-gpu. If I download the model directly and use tfjs-converter it works fine.

If I train it myself (config: pipeline.config.txt) with:

Miniconda3\python.exe models\research\object_detection\model_main.py --pipeline_config_path="pipeline.config" --model_dir="training_output" --num_train_steps=200000 --sample_1_of_n_eval_examples=1 --alsologtostderr

then freeze the model with:

Miniconda3\python.exe models\research\object_detection\export_inference_graph.py --input_type image_tensor --pipeline_config_path "training_output/pipeline.config" --trained_checkpoint_prefix "training_output/model.ckpt-4621" --output_directory "exported_model"

and then convert it with tensorflowjs_converter:

Miniconda3\Scripts\tensorflowjs_converter.exe --input_format=tf_saved_model --output_node_names="detection_boxes,detection_scores,num_detections,detection_classes" --saved_model_tags=serve "exported_model/saved_model" "exported_model/web_model"

I get the following error in Node.js: Error in concat1D: rank of tensors[2] must be the same as the rank of the rest (1) when running the following code:

const model = await tf.loadFrozenModel(modelPath, weightsPath);

const shape = [1, 2560, 1920, 3];
const tensor = tf.fill(shape, 0, 'int32');
await model.executeAsync(
  { image_tensor: tensor },
  ['detection_boxes', 'detection_scores', 'detection_classes', 'num_detections'],
);

As a side note, if I take the trained frozen model and run the prediction in Python, it works.

(node:13376) UnhandledPromiseRejectionWarning: Error: Error in concat1D: rank of tensors[2] must be the same as the rank of the rest (1)
    at Object.assert (D:\Projects\jsblur\node_modules\@tensorflow\tfjs-core\dist\util.js:40:15)
    at D:\Projects\jsblur\node_modules\@tensorflow\tfjs-core\dist\ops\concat_util.js:7:14
    at Array.forEach (<anonymous>)
    at Object.assertParamsConsistent (D:\Projects\jsblur\node_modules\@tensorflow\tfjs-core\dist\ops\concat_util.js:6:12)
    at concat_ (D:\Projects\jsblur\node_modules\@tensorflow\tfjs-core\dist\ops\concat_split.js:36:19)
    at Object.concat (D:\Projects\jsblur\node_modules\@tensorflow\tfjs-core\dist\ops\operation.js:23:29)
    at Object.exports.executeOp (D:\Projects\jsblur\node_modules\@tensorflow\tfjs-converter\dist\src\operations\executors\slice_join_executor.js:10:25)
    at Object.executeOp (D:\Projects\jsblur\node_modules\@tensorflow\tfjs-converter\dist\src\operations\operation_executor.js:47:30)
    at _loop_1 (D:\Projects\jsblur\node_modules\@tensorflow\tfjs-converter\dist\src\executor\graph_executor.js:258:52)
    at GraphExecutor.processStack (D:\Projects\jsblur\node_modules\@tensorflow\tfjs-converter\dist\src\executor\graph_executor.js:282:13)

pyu10055 commented 5 years ago

@iacovlev-pavel It looks like you are freezing the model first. Can you confirm that this step still generates the saved_model format? The tensorflowjs_converter freezes the graph automatically, so you should not need the middle step.

rthadur commented 5 years ago

Automatically closing due to lack of recent activity. Please update the issue when new information becomes available, and we will reopen the issue. Thanks!

sayradley commented 5 years ago

I ran into the same problem. I haven't spent much time debugging, but it looks like concat_ in @tensorflow/tfjs-core/dist/ops/concat_split.js:22 receives an empty tensor.
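
For readers following along, here is a minimal sketch of the kind of rank check that produces this message. It is a simplified standalone re-creation, not the tfjs-core source; the function name findRankMismatch and the sample shapes are made up for illustration. It assumes the offending input is a rank-0 tensor (shape []); a rank-1 tensor with zero elements (shape [0]) would pass this check.

```javascript
// Simplified stand-in for a concat rank check (NOT the tfjs-core implementation).
// Shapes are plain arrays: [] is rank 0 (scalar), [4] is rank 1, [2, 3] is rank 2.
function findRankMismatch(shapes) {
  const rank = shapes[0].length; // the first input sets the expected rank
  for (let i = 0; i < shapes.length; i++) {
    if (shapes[i].length !== rank) {
      return `rank of tensors[${i}] must be the same as the rank of the rest (${rank})`;
    }
  }
  return null; // all ranks agree
}

// Hypothetical inputs: two rank-1 shapes followed by a rank-0 shape.
const message = findRankMismatch([[4], [4], []]);
console.log(message);
```

If this is what happens inside the converted graph, printing the shapes of the inputs to the failing concat node should show which one has the unexpected rank.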

mmmaks2004 commented 5 years ago

I have the same problem. Has anyone found a solution?

vergilijus commented 5 years ago

Same problem. This link might be helpful. TL;DR: the error occurs in the post-processing node. You can follow the instructions here to cut the post-processing node, then do the post-processing manually.
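
To make "do the post-processing manually" concrete, here is a rough sketch in plain JavaScript: take the best class per box, drop low scores, then apply greedy non-max suppression. This is an illustration under assumptions, not the code from the linked instructions: the names iou and postProcess, the [ymin, xmin, ymax, xmax] box layout, the thresholds and the sample data are all made up, and the suppression is class-agnostic.

```javascript
// Intersection over union of two boxes in [ymin, xmin, ymax, xmax] form (assumed layout).
function iou(a, b) {
  const interH = Math.max(0, Math.min(a[2], b[2]) - Math.max(a[0], b[0]));
  const interW = Math.max(0, Math.min(a[3], b[3]) - Math.max(a[1], b[1]));
  const inter = interH * interW;
  const areaA = (a[2] - a[0]) * (a[3] - a[1]);
  const areaB = (b[2] - b[0]) * (b[3] - b[1]);
  const union = areaA + areaB - inter;
  return union > 0 ? inter / union : 0;
}

// boxes[i] is one box; scores[i] is the array of per-class scores for that box.
function postProcess(boxes, scores, { minScore = 0.5, iouThreshold = 0.5, maxDetections = 20 } = {}) {
  // 1. Keep the best class per box and drop low-confidence boxes.
  const candidates = [];
  scores.forEach((classScores, i) => {
    let best = 0;
    for (let c = 1; c < classScores.length; c++) {
      if (classScores[c] > classScores[best]) best = c;
    }
    if (classScores[best] >= minScore) {
      candidates.push({ box: boxes[i], classIndex: best, score: classScores[best] });
    }
  });
  // 2. Greedy non-max suppression, highest score first.
  candidates.sort((a, b) => b.score - a.score);
  const kept = [];
  for (const cand of candidates) {
    if (kept.length >= maxDetections) break;
    if (kept.every((k) => iou(k.box, cand.box) <= iouThreshold)) kept.push(cand);
  }
  return kept;
}

// Made-up data: the first two boxes overlap heavily, the third stands alone.
const boxes = [[0, 0, 10, 10], [1, 1, 10, 10], [20, 20, 30, 30]];
const scores = [[0.1, 0.9], [0.2, 0.8], [0.7, 0.3]];
const detections = postProcess(boxes, scores);
console.log(detections);
```

With real model outputs, the boxes and scores would come from the raw output tensors of the truncated graph (for example via array() or dataSync()), and the thresholds would need tuning for the dataset.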

mmmaks2004 commented 5 years ago

vergilijus, thanks, this solved my problem

amr-elsehemy commented 5 years ago

I'm facing the same problem now. I have a custom-trained model based on MobileNet SSD which works perfectly with Python. I converted it with the tensorflowjs_converter and the conversion succeeds, but when I test it using TensorFlow.js in Node.js and call await model.executeAsync(MY_INPUT), it fails with the same error as above.

I'd appreciate it if someone could explain why this error happens and what the proposed solution is (knowing that I only have the frozen model format).

mmmaks2004 commented 5 years ago

amr-elsehemy, This link might be helpful: https://stackoverflow.com/questions/53675183/error-in-concat1d-rank-of-tensors23-must-be-the-same-as-the-rank-of-the-rest/53848092#53848092

iacovlev-pavel commented 5 years ago

@pyu10055

Using TensorFlow backend.
2019-02-12 10:53:30.911984: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2019-02-12 10:53:31.083098: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties: 
name: GeForce GTX 1080 major: 6 minor: 1 memoryClockRate(GHz): 1.7335
pciBusID: 0000:01:00.0
totalMemory: 8.00GiB freeMemory: 6.60GiB
2019-02-12 10:53:31.089550: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2019-02-12 10:53:31.837913: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-02-12 10:53:31.841518: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988]      0
2019-02-12 10:53:31.844130: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0:   N
2019-02-12 10:53:31.846481: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6363 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0, compute capability: 6.1)
2019-02-12 10:53:48.454597: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:501] Optimization results for grappler item: graph_to_optimize
2019-02-12 10:53:48.458831: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   debug_stripper: Graph size after: 3187 nodes (0), 4011 edges (0), time = 86.593ms.
2019-02-12 10:53:48.462892: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   model_pruner: Graph size after: 2651 nodes (-536), 3475 edges (-536), time = 216.454ms.
2019-02-12 10:53:48.467751: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   constant folding: Graph size after: 2481 nodes (-170), 3279 edges (-196), time = 1518.22095ms.
2019-02-12 10:53:48.471927: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   arithmetic_optimizer: Graph size after: 1896 nodes (-585), 2897 edges (-382), time = 1271.00903ms.
2019-02-12 10:53:48.476301: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   dependency_optimizer: Graph size after: 1833 nodes (-63), 2793 edges (-104), time = 131.237ms.
2019-02-12 10:53:48.480824: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   model_pruner: Graph size after: 1833 nodes (0), 2793 edges (0), time = 102.762ms.
2019-02-12 10:53:48.485069: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   remapper: Graph size after: 2561 nodes (728), 3625 edges (832), time = 409.052ms.
2019-02-12 10:53:48.488797: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   constant folding: Graph size after: 1521 nodes (-1040), 2481 edges (-1144), time = 1958.90601ms.
2019-02-12 10:53:48.493442: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   arithmetic_optimizer: Graph size after: 1583 nodes (62), 2605 edges (124), time = 966.277ms.
2019-02-12 10:53:48.498022: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   dependency_optimizer: Graph size after: 1506 nodes (-77), 2450 edges (-155), time = 120.647ms.
2019-02-12 10:53:48.502621: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   debug_stripper: Graph size after: 1506 nodes (0), 2450 edges (0), time = 96.263ms.
2019-02-12 10:53:48.506421: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   model_pruner: Graph size after: 1506 nodes (0), 2450 edges (0), time = 100.812ms.
2019-02-12 10:53:48.510470: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   constant folding: Graph size after: 1498 nodes (-8), 2442 edges (-8), time = 1017.92499ms.
2019-02-12 10:53:48.515238: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   arithmetic_optimizer: Graph size after: 1498 nodes (0), 2442 edges (0), time = 699.736ms.
2019-02-12 10:53:48.519555: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   dependency_optimizer: Graph size after: 1498 nodes (0), 2442 edges (0), time = 120.128ms.
2019-02-12 10:53:48.523521: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   model_pruner: Graph size after: 1498 nodes (0), 2442 edges (0), time = 100.779ms.
2019-02-12 10:53:48.527581: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   remapper: Graph size after: 1498 nodes (0), 2442 edges (0), time = 99.55ms.
2019-02-12 10:53:48.531593: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   constant folding: Graph size after: 1498 nodes (0), 2442 edges (0), time = 597.137ms.
2019-02-12 10:53:48.535677: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   arithmetic_optimizer: Graph size after: 1498 nodes (0), 2442 edges (0), time = 694.477ms.
2019-02-12 10:53:48.539505: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   dependency_optimizer: Graph size after: 1498 nodes (0), 2442 edges (0), time = 117.965ms.
Writing weight file F:\Projects\pk-next-train\models\faster_rcnn_resnet101_coco_2019_02_01\web_model\tensorflowjs_model.pb...
2019-02-12 10:53:49.392958: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2019-02-12 10:53:49.395246: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-02-12 10:53:49.399047: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988]      0
2019-02-12 10:53:49.401320: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0:   N
2019-02-12 10:53:49.403427: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6363 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0, compute capability: 6.1)

The comments above mention that the post-processing step needs to be removed. How would I do that for the faster_rcnn_resnet101 model, given that https://github.com/tensorflow/tfjs-models/tree/master/coco-ssd#technical-details-for-advanced-users refers to an SSD model?

If I use the model directly from https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md (without training), everything works.

mmmaks2004 commented 5 years ago

iacovlev-pavel, does this run without an error on your model?

tensorflowjs_converter --input_format=tf_saved_model \
    --output_node_names='Postprocessor/ExpandDims_1,Postprocessor/Slice' \
    --saved_model_tags=serve \
    ./saved_model \
    ./web_model

iacovlev-pavel commented 5 years ago

@mmmaks2004 Yes, this is the result: AssertionError: Postprocessor/ExpandDims_1 is not in graph.

mmmaks2004 commented 5 years ago

iacovlev-pavel, did you type the graph node names without quotes?

iacovlev-pavel commented 5 years ago

@mmmaks2004

Here is the command:

tensorflowjs_converter --input_format=tf_saved_model --output_node_names="Postprocessor/ExpandDims_1,Postprocessor/Slice" --saved_model_tags=serve "models\faster_rcnn_resnet101_coco_2019_02_01\saved_model\saved_model" "models\faster_rcnn_resnet101_coco_2019_02_01\web_model"

AssertionError: Postprocessor/ExpandDims_1 is not in graph seems right, since this is not an SSD model but a faster_rcnn. There is no Postprocessor, only SecondStagePostprocessor.

msektrier commented 5 years ago

We have exactly the same problem with a transformed ssdlite mobilenet2 network. model.zip

We trained it with TF 1.13.1 and used the latest TFJS converter, 1.1.2. The conversion is successful, and the signature also looks fine in the saved_model_cli tool.

When we run the model, we receive this error:

[image: screenshot of the error]

Just a note: "output_node_names" no longer exists in the tfjs converter.

nsthorat commented 5 years ago

@pyu10055 can you take a look?

msektrier commented 5 years ago

Can we provide further information to this issue or are there any updates?

pyu10055 commented 5 years ago

@msektrier Sorry for the delay. Can you provide the full model, including the weight files, so that we can reproduce the issue? Thanks.

hsparrow commented 4 years ago

@pyu10055 Hi, I also got the same error, Error: Error in concat1D: rank of tensors[6] must be the same as the rank of the rest (1), when loading the converted model. Here is the configuration:

Version: tfjs-converter@1.3.1, tfjs-core@1.3.1

Conversion: tensorflowjs_converter --input_format=tf_saved_model --saved_model_tags=serve ./saved_model ./web_model

Screenshot: error

Code:

// tf is used below, so it has to be imported too (package name assumed to be @tensorflow/tfjs).
import * as tf from '@tensorflow/tfjs';
import {loadGraphModel} from '@tensorflow/tfjs-converter';

// MODEL_URL, runButton and image are assumed to be defined elsewhere on the page.
const modelPromise = loadGraphModel(MODEL_URL);
runButton.onclick = async () => {
    const model = await modelPromise;        // index.js:44
    console.log('model loaded');
    console.time('predict1');
    const pixels = tf.browser.fromPixels(image);
    // const res1 = await model.executeAsync(pixels.reshape([1, ...pixels.shape]));
    // res1.map(t => t.dataSync());
    // const res2 = await model.executeAsync(pixels.reshape([1, ...pixels.shape]));
    const res2 = await model.executeAsync(
        {'image_tensor': pixels.reshape([1, ...pixels.shape])},
        ['detection_boxes', 'detection_scores', 'detection_classes', 'num_detections']);
    // The results come back in the same order as the requested output names.
    const count = res2[3].dataSync()[0];
    const boxes = res2[0].dataSync();
    const scores = res2[1].dataSync();
    const classes = res2[2].dataSync();
    // ...
}
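
For anyone reusing the snippet above once the model runs, here is a small sketch of how the flat arrays returned by dataSync() could be turned into per-detection objects. It is only an illustration: the helper name toDetections, the [ymin, xmin, ymax, xmax] box order, the 0.5 threshold and the sample values are assumptions, not something taken from the model in this issue.

```javascript
// flatBoxes holds 4 numbers per detection; the [ymin, xmin, ymax, xmax] order is an assumption.
function toDetections(count, flatBoxes, scores, classes, minScore = 0.5) {
  const detections = [];
  // Never read past the shortest of the inputs.
  const n = Math.min(count, scores.length, Math.floor(flatBoxes.length / 4));
  for (let i = 0; i < n; i++) {
    if (scores[i] < minScore) continue; // skip low-confidence detections
    const [ymin, xmin, ymax, xmax] = flatBoxes.slice(i * 4, i * 4 + 4);
    detections.push({ classId: classes[i], score: scores[i], box: { ymin, xmin, ymax, xmax } });
  }
  return detections;
}

// Made-up values standing in for the dataSync() results.
const sample = toDetections(
  2,
  new Float32Array([0.25, 0.125, 0.5, 0.75, 0, 0, 1, 1]),
  new Float32Array([0.9, 0.3]),
  new Float32Array([1, 2])
);
console.log(sample);
```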

I searched a lot, and most people suggest removing the postprocessing steps from the exported model. However, the latest tensorflowjs_converter has deprecated output_node_names. Is there any way to solve this without removing the postprocessing part? Thanks!

pyu10055 commented 4 years ago

@hsparrow did you retrain the ssd model? If so can you share the saved model directory with me? It would be easier to debug the issue, thanks.

hsparrow commented 4 years ago

@pyu10055 Yes! I retrained the SSD model on a new dataset. Since the saved model directory is larger than what GitHub allows, I shared it on Google Drive: model.

Here are some updates: I changed from a JavaScript module import to script tags in the HTML, as follows:

<script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs"> </script>
<script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs-converter"></script>

The error remains the same. However, it now occurs when executing the detection instead of when loading the model:

const modelPromise = tf.loadGraphModel(MODEL_URL);
const model = await modelPromise;
const pixels = await tf.browser.fromPixels(image);
const res = await model.executeAsync(
    {'image_tensor': pixels.reshape([1, ...pixels.shape])},
    ['detection_boxes', 'detection_scores', 'detection_classes', 'num_detections']
);  // error occurs here

pyu10055 commented 4 years ago

@hsparrow Thank you for providing the model. It looks like one of the constant shapes is wrong for the concat1D op, and it should have been converted directly from the TensorFlow saved model. To see whether we have any issues with the conversion, can you provide the TensorFlow saved model as well? Thanks.

hsparrow commented 4 years ago

@pyu10055 Hi, by TensorFlow saved model, do you mean the exported inference graphs or the ones saved during training? Here are the exported inference graphs, and these are the ones saved during training.

hsparrow commented 4 years ago

@pyu10055 Hey, it seems this issue has been fixed. How can I fix my problem? Do I need to re-convert the model?

xusongpei commented 4 years ago

@pyu10055 I have the same issue as @hsparrow and don't know how to solve it. I'm waiting for a fix.

pyu10055 commented 4 years ago

@hsparrow and @xusongpei The issue you are facing should be fixed in the next release of tfjs-converter. You don't need to convert the model again; the fixes are on the JavaScript side.

xusongpei commented 4 years ago

@pyu10055 Thank you for that!