huggingface / tflite-android-transformers

DistilBERT / GPT-2 for on-device inference thanks to TensorFlow Lite with Android demo apps
Apache License 2.0

How to do FP16 quantization on GPT-2 XL? #4

Open Archelunch opened 4 years ago

Archelunch commented 4 years ago

How could I fix this error? `ValueError: Message tensorflow.GraphDef exceeds maximum protobuf size of 2GB: 6234365906`

Pierrci commented 4 years ago

Hi @Archelunch, what is the script you use? At which step are you getting this error?

Archelunch commented 4 years ago
```python
import tensorflow as tf

# `sess` is the active tf.Session holding the GPT-2 XL graph
output_graph = "frozen_graph.pb"
output_graph_def = tf.graph_util.convert_variables_to_constants(
    sess,
    tf.get_default_graph().as_graph_def(),
    ["sample_sequence_2/while/Exit_3"]
)

# serialize and dump the output graph to the filesystem
with tf.gfile.GFile(output_graph, "wb") as f:
    f.write(output_graph_def.SerializeToString())
```

```
ValueError                                Traceback (most recent call last)
<ipython-input-69-6930509752c3> in <module>
      8     # serialize and dump the output graph to the filesystem
      9 with tf.gfile.GFile(output_graph, "wb") as f:
---> 10     f.write(output_graph_def.SerializeToString())

ValueError: Message tensorflow.GraphDef exceeds maximum protobuf size of 2GB: 6233583551
```
Pierrci commented 4 years ago

Are you working with a big model? It seems you're hitting the 2GB size limit of protobuf, which TensorFlow uses internally to serialize the graph. This isn't related to quantization; it appears to be a more general TensorFlow limitation.
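As a side note, one way to avoid serializing a frozen `GraphDef` altogether is to convert directly from a SavedModel with the TF2 `TFLiteConverter`, which also covers the FP16 post-training quantization the issue title asks about. The sketch below assumes a hypothetical `saved_model_dir` export of the model; even with this route, a model as large as GPT-2 XL may still run into converter size limits.

```python
# Minimal sketch (not from this thread) of FP16 post-training quantization
# via the TF2 TFLiteConverter. Converting from a SavedModel skips the
# frozen-GraphDef step that triggered the 2GB protobuf error above.
# "saved_model_dir" and "model_fp16.tflite" are placeholder paths.
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")

# Enable the default optimization set, then restrict weight types to
# float16 to request FP16 quantization of the weights.
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]

tflite_model = converter.convert()
with open("model_fp16.tflite", "wb") as f:
    f.write(tflite_model)
```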

Mockapapella commented 4 years ago

Is there any way to loop through the .pb file in chunks, similar to this stackoverflow question/answer, so as to cut down on the amount of information loaded into memory?
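For reference, a generic chunked-read pattern in Python looks like the sketch below (names are hypothetical, not from the thread). Note, though, that protobuf must materialize the entire `GraphDef` message to parse it, so chunked file I/O by itself does not work around the 2GB message limit.

```python
# Hypothetical sketch: stream a large file in fixed-size pieces so only one
# chunk is held in memory at a time. This reduces peak memory for raw I/O,
# but protobuf parsing of a GraphDef still needs the whole message.
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MiB per read

def iter_chunks(path, chunk_size=CHUNK_SIZE):
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            yield chunk

# Usage example: count the bytes in the frozen graph without loading it whole.
total = sum(len(chunk) for chunk in iter_chunks("frozen_graph.pb"))
print(f"read {total} bytes")
```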