Closed: analog-cbarber closed this issue 5 years ago
Another issue is that MXNet does not yet fully support int64:
>>> import mxnet as mx
>>> import numpy as np
>>> mx.ndarray.zeros((3,4), dtype=np.int64)
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "/Users/cbarber/ws/bmxnet/python/mxnet/ndarray/utils.py", line 67, in zeros
    return _zeros_ndarray(shape, ctx, dtype, **kwargs)
  File "/Users/cbarber/ws/bmxnet/python/mxnet/ndarray/ndarray.py", line 3387, in zeros
    return _internal._zeros(shape=shape, ctx=ctx, dtype=dtype, **kwargs)
  File "<string>", line 34, in _zeros
  File "/Users/cbarber/ws/bmxnet/python/mxnet/_ctypes/ndarray.py", line 92, in _imperative_invoke
    ctypes.byref(out_stypes)))
  File "/Users/cbarber/ws/bmxnet/python/mxnet/base.py", line 146, in check_call
    raise MXNetError(py_str(_LIB.MXGetLastError()))
MXNetError: Invalid Input: 'int64', valid values are: {'float16', 'float32', 'float64', 'int32', 'uint8'}, in operator _zeros(name="", dtype="int64", ctx="cpu(0)", shape="(3, 4)")
`zeros` is called during setup of deferred initialization for gluon Parameters, so this would hamper the use of this feature.
The zeros/int64 problem is now MXNet issue #9536. I am afraid that until it is fixed, it probably will not be feasible to support binarized weight parameters in the gluon interface if they use 'int64'.
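Since the error message lists 'int32' among the valid dtypes, one conceivable stopgap would be to carry each 64-bit packed word as a pair of 32-bit values. The sketch below only illustrates that reinterpretation in plain Python. The helper names `split_word64` and `join_word64` are hypothetical, nothing here is part of MXNet or BMXNet, and the fixed little-endian layout is an assumption.

```python
import struct

def split_word64(word):
    """Reinterpret one unsigned 64-bit word as two signed 32-bit values.

    Hypothetical helper: the low half comes first (explicit little-endian),
    so the result does not depend on the host byte order.
    """
    return struct.unpack("<2i", struct.pack("<Q", word))

def join_word64(low, high):
    """Inverse of split_word64: rebuild the unsigned 64-bit word."""
    return struct.unpack("<Q", struct.pack("<2i", low, high))[0]

# A word with the top bit and two low bits set.
word = 0x8000000000000005
low, high = split_word64(word)

# The round trip must reproduce the original word exactly.
restored = join_word64(low, high)
```

Whether the operators could consume weights laid out this way is a separate question that this sketch does not address.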
Currently, `QFullyConnected` and `QConvolution`, when using the binary weight format (`binarized_weights_only=True`), store weights packed into machine words. There are two issues with this:
1. It is difficult to share the same weights between architectures with different word sizes, because the operator demands that the size of the weights be expressed in units of machine words. There seems to be no reason that a model that works on a 64-bit machine should not work without modification on a 32-bit machine.
2. The format inherently assumes a particular byte ordering, so weights will not load correctly when moving between a big-endian and a little-endian machine. This may only matter when loading weights from disk.
It seems bad to have a weight format that is not portable across machine architectures.
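To make the two concerns concrete, here is a minimal sketch in plain Python of how the serialized bytes of packed binary weights depend on word size and byte order. The helper names (`pack_bits`, `serialize`) and the least-significant-bit-first packing order are assumptions chosen for illustration. They are not taken from the BMXNet source, whose actual bit layout may differ.

```python
import struct

def pack_bits(bits, word_bits):
    """Pack 0/1 values into unsigned words of word_bits bits each.

    Assumed layout: the first bit of each group goes into the least
    significant position of its word.
    """
    words = []
    for start in range(0, len(bits), word_bits):
        word = 0
        for offset, bit in enumerate(bits[start:start + word_bits]):
            word |= (bit & 1) << offset
        words.append(word)
    return words

def serialize(words, word_bits, byte_order):
    """Write words as raw bytes; byte_order is '<' (little) or '>' (big)."""
    code = {32: "I", 64: "Q"}[word_bits]
    return struct.pack(f"{byte_order}{len(words)}{code}", *words)

# 64 binary weights: a few set bits at the start and one at the very end.
bits = [1, 0, 1, 1] + [0] * 59 + [1]

le64 = serialize(pack_bits(bits, 64), 64, "<")  # 64-bit little-endian host
be64 = serialize(pack_bits(bits, 64), 64, ">")  # 64-bit big-endian host
le32 = serialize(pack_bits(bits, 32), 32, "<")  # 32-bit little-endian host
be32 = serialize(pack_bits(bits, 32), 32, ">")  # 32-bit big-endian host

# Bytes written by a little-endian host, read back as a big-endian word.
original = pack_bits(bits, 64)[0]
misread = struct.unpack(">Q", le64)[0]
```

Under the assumed bit order, `le64` and `le32` come out byte-for-byte identical, while `be64` and `be32` differ from them and from each other, and `misread` does not equal `original`. That suggests a format pinned to one byte order, or packed byte by byte, could be read on any of these machines.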
Alternatives would be: