This issue turned out to be quite the rabbit hole. First, the error has nothing to do with float64 values being used anywhere in the model - that's a red herring. The problem occurs because the TensorflowJS converter cannot handle 0-rank string tensors (a.k.a. scalars) at the moment and thus produces an invalid weight manifest for models that contain them.
In short, this is an error in the TensorflowJS converter that I cannot fix in this converter (I could submit a PR to the TFJS team, but I don't know how they handle 0-rank tensors internally; outside the converter, 0-rank tensors are handled just fine).
To be perfectly honest, though, I don't really see a reason to fix this at the moment, since the model in question is a TF model to begin with and isn't even supported by TFJS, which brings me to the second issue.
The only way to even generate the (broken) TFJS graph model is to use the --skip_op_checks flag, which turns off model validation and allows converting models that contain unsupported features (here: HashTableV2 and LookupTableFindV2).
Looking at the error location in the tensorflowjs package reveals that this problem occurs when tensorflowjs is trying to determine the size of a tensor given its shape [1] using numpy.prod().
The documentation reveals that

> The product of an empty array is the neutral element 1:

>>> numpy.prod([])
1.0
This solves the mystery of our TypeError and the mention of numpy.float64 - the shape attribute of the tensor is empty, resulting in a float scalar instead of an integer size. Examining the model.json reveals that there are indeed quite a few tensors with empty shapes:
> grep -o '"shape": \[\],' model.json | wc -l
4358
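The failure is easy to reproduce in isolation (this mirrors what happens inside the converter; it is not the actual converter code):

>>> import numpy as np
>>> size = np.prod([])
>>> size
1.0
>>> range(size)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'numpy.float64' object cannot be interpreted as an integer

This is exactly the TypeError from the subject line.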
The TensorflowJS converter loads the model graph, runs some optimisations on it and finally removes all tensor data (e.g. constants and model weights) before saving it as JSON.
The removed tensor data is stored separately and in chunks. During this process, the converter turns TF tensors into numpy arrays and something interesting happens.
When writing the weights manifest [2], the converter writes the shape of the tensor by using ndarray's shape attribute [3]. So far so good, but why is the shape of the ndarray empty? Let's see what numpy's documentation [4] has to say about it:
> In general, numerical data arranged in an array-like structure in Python can be converted to arrays through the use of the array() function. [...] A simple way to find out if the object can be converted to a numpy array using array() is simply to try it interactively and see if it works! (The Python Way).
So let's do exactly that:
>>> import numpy as np
>>> a = np.array([])
>>> a.shape
(0,)
Hm, close, but no cigar. Passing an empty array doesn't work, since it generates a 1-d array of size zero, which is different from an empty shape. What about numerical data that is not arranged in an array-like structure in Python, like, say, a scalar value?
>>> a = np.array(42)
>>> a.shape
()
>>> a.size
1
>>> a.item(-1)
42
Bingo! That's it - if we pass a scalar (i.e. a rank-0 tensor) into numpy, the resulting object has an empty shape attached to it while still holding the given value. This means that models which contain rank-0 tensors won't convert properly to TFJS; the conversion generates broken output that cannot be read by the TFJS Python library (though it's handled correctly in TFJS itself and could be used in the browser or with Node.js).
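For reference, this is what such a rank-0 string tensor looks like in TensorFlow (a minimal illustration, not taken from the model in question):

>>> import tensorflow as tf
>>> t = tf.constant("some lookup-table key")
>>> t.shape
TensorShape([])
>>> t.numpy()
b'some lookup-table key'

Tensors like these produce the empty "shape": [] entries counted above.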
Given a model that's actually supported by TFJS 😉, the converter would work correctly if we changed the offending line [5] as follows:
for _ in range(int(size)):
Why not avoid writing empty shape attributes in the weight manifest in the first place? Well, technically any change there would be wrong: a shape of (1,) would describe a rank-1 tensor, not a scalar, and (0,) wouldn't help either, since the tensor isn't empty. So an empty shape is the correct way to express this; it just needs to be handled correctly by the TFJS converter. Interestingly enough, numeric scalars are decoded correctly and the problem only occurs with strings.
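To illustrate why the two cases differ, here is a simplified sketch of the two decoding paths (reconstructed from the observed behaviour, not copied from the tensorflowjs source; the length-prefix format for strings is an assumption):

import numpy as np

def decode_numeric(data, shape, dtype):
    # reshape(()) is perfectly valid for a single element, so rank-0
    # numeric tensors decode without any special handling
    return np.frombuffer(data, dtype=dtype).reshape(shape)

def decode_strings(data, shape):
    # np.prod(()) returns the float 1.0, so without the int() cast
    # from above, range() raises the TypeError we saw
    size = np.prod(shape)
    strings, offset = [], 0
    for _ in range(int(size)):
        # assumption: each string is stored with a 4-byte little-endian
        # length prefix (the exact on-disk format may differ)
        length = int.from_bytes(data[offset:offset + 4], 'little')
        offset += 4
        strings.append(data[offset:offset + length])
        offset += length
    return strings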
This is where our little journey ends, since applying the fix only opens another trapdoor:
...
File "███████████████/tensorflowjs/read_weights.py", line 186, in decode_weights
raise NotImplementedError('Unsupported data type: %s' % dtype)
NotImplementedError: Unsupported data type: bool
Turns out, while TFJS supports bool, their Python converter doesn't. Now, while that would be a worthwhile fix, the model in question uses a uint64 output, which is not supported by TFJS...
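If you want to check a frozen graph for such unsupported dtypes before even attempting a conversion, a small helper along these lines does the job (my own sketch, not part of either converter; the set of flagged types is just the two offenders from above):

import tensorflow as tf
from tensorflow.core.framework import graph_pb2

# just the two offenders from above - actual TFJS support may differ
UNSUPPORTED = {'bool', 'uint64'}

def find_unsupported_dtypes(pb_path):
    """Return (node, op, attribute, dtype) tuples for unsupported types."""
    graph_def = graph_pb2.GraphDef()
    with open(pb_path, 'rb') as f:
        graph_def.ParseFromString(f.read())
    hits = []
    for node in graph_def.node:
        for attr_name, attr in node.attr.items():
            if attr.HasField('type'):
                dtype = tf.as_dtype(attr.type).name
                if dtype in UNSUPPORTED:
                    hits.append((node.name, node.op, attr_name, dtype))
    return hits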
Great analysis!
Btw, originally I did have issues with the openimages-ssd-mobilenet-v2 model in TFJS (after converting it to tfjs_graph_model).
Those issues are solved in https://github.com/tensorflow/tfjs/issues/3823 (committed to master, not yet in a release)
And the model works perfectly (and accurately) in TFJS - I cannot understand how if uint64 is not supported in TFJS.
> And the model works perfectly (and accurately) in TFJS - I cannot understand how if uint64 is not supported in TFJS.
Well, you've said it yourself:
> Those issues are solved in tensorflow/tfjs#3823 (committed to master, not yet in a release)
I only test things from the point of view of a user who installed this utility via pip. Anything that hasn't been released just yet, I cannot test or consider. This means that even if I fix things on my end, anyone who does a pip install -U tfjs_graph_converter won't benefit from those fixes as long as the TFJS team hasn't released their changes.
To confirm this, I compared the current git version to the latest release (v2.3.0 as of the time of writing) and found that the converter has indeed been updated. So until the tensorflowjs team drops another release, there's little I can do.
The previous issue with the model was due to the model pushing a weight tensor into a TensorArray which then got disposed; the fix was to keep track of it instead of disposing it.
Regarding uint64, my guess is that it's just a choice and the model values don't really need it - it gets clipped and still works just fine.
But I totally understand this is outside the scope of what can be done here.
I have a somewhat "similar" problem trying to convert the toxicity model: https://tfhub.dev/google/openimages_v4/ssd/mobilenet_v2/1 I wonder if this is related:

tfjs_graph_converter tftox model.pb
TensorFlow.js Graph Model Converter

Graph model: tftox
Output: model.pb
Target format: tf_frozen_model

Converting....
Error: Input 1 of node module_apply_default/Encoder_en/KonaTransformer/ClipToMaxLength/Less was passed int32 from Const_56:0 incompatible with expected int64.
@elivasquezhdz I opened a new issue for this problem. I assume you meant this model: toxicity.
The problem here is that indeed the TFJS model holds all its intXX tensors converted to int32, while the actual processing nodes in the graph still expect int64 input. This is, strictly speaking, an error with TFJS, as they simply adjust the data type during import.
TF, however, doesn't do any such magic tricks behind the scenes and processes graph nodes and their type annotations as-is. I'll consider this to be just another quirk and will add a work-around.
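For the curious, the idea behind such a work-around looks roughly like this (a hypothetical sketch, not the actual patch):

import numpy as np
import tensorflow as tf
from tensorflow.core.framework import types_pb2

def widen_const_to_int64(graph_def, const_name):
    # Hypothetical helper: rewrite an int32 Const node to int64 so that
    # downstream ops expecting int64 input accept it again.
    for node in graph_def.node:
        if node.name == const_name and node.op == 'Const':
            values = tf.make_ndarray(node.attr['value'].tensor).astype(np.int64)
            node.attr['dtype'].type = types_pb2.DT_INT64
            node.attr['value'].tensor.CopyFrom(
                tf.make_tensor_proto(values, dtype=tf.int64))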
Yes! That's the one I meant, thanks for opening the issue and letting me know!
@elivasquezhdz The toxicity-model issue is fixed now. I also ported the TFJS inference code to python for testing the results. If you're interested in that let me know and I'll add a repo for it.
Can you post a link to your patch? I'm having a similar issue where a model expects uint8 and TFJS passes int32, and I'd like to verify before opening another issue.
@vladmandic Maybe I should clarify this a little. I only ported the code for using the toxicity-model from TypeScript to Python in order to verify that the converted model still works as intended.
I didn't touch any of the TFJS innards, just ported the model specific code to test it out.
Thanks for the clarification.
I've created an issue on TFJS to track the uint8 stuff - basically, TFJS maps any int value to int32 during load, and if the model is strict, it will later fail on execution.
I fixed the Python converter in TFJS and the changes got merged into the master branch. The next TFJS release will contain the fix.
As the subject line says, tfjs_graph_converter fails when converting a model with numpy.float64.
Model in question is https://tfhub.dev/google/openimages_v4/ssd/mobilenet_v2/1