huggingface / tflite-android-transformers

DistilBERT / GPT-2 for on-device inference thanks to TensorFlow Lite with Android demo apps
Apache License 2.0

How to do FP16 quantization on GPT-2 XL? #4

Open Archelunch opened 4 years ago

Archelunch commented 4 years ago

How could I fix this error? `ValueError: Message tensorflow.GraphDef exceeds maximum protobuf size of 2GB: 6234365906`

Pierrci commented 4 years ago

Hi @Archelunch, what is the script you use? At which step are you getting this error?

Archelunch commented 4 years ago
```python
import tensorflow as tf

# `sess` is the active tf.Session holding the GPT-2 XL graph
output_graph = "frozen_graph.pb"
output_graph_def = tf.graph_util.convert_variables_to_constants(
    sess,
    tf.get_default_graph().as_graph_def(),
    ["sample_sequence_2/while/Exit_3"]
)

# serialize and dump the output graph to the filesystem
with tf.gfile.GFile(output_graph, "wb") as f:
    f.write(output_graph_def.SerializeToString())
```

```
ValueError                                Traceback (most recent call last)
<ipython-input-69-6930509752c3> in <module>
      8     # serialize and dump the output graph to the filesystem
      9 with tf.gfile.GFile(output_graph, "wb") as f:
---> 10     f.write(output_graph_def.SerializeToString())

ValueError: Message tensorflow.GraphDef exceeds maximum protobuf size of 2GB: 6233583551
```
Pierrci commented 4 years ago

Are you working with a big model? It seems you're hitting the 2GB size limit of protobuf, which TensorFlow uses internally to serialize the graph. This isn't related to quantization; it appears to be a more general TensorFlow limitation.
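As a side note, one way to avoid serializing a frozen `GraphDef` altogether is to convert directly from a SavedModel with the TF2 `TFLiteConverter`, which also covers the FP16 post-training quantization the issue title asks about. The sketch below assumes a hypothetical `saved_model_dir` export of the model; even with this route, a model as large as GPT-2 XL may still run into converter size limits.

```python
# Minimal sketch (not from this thread) of FP16 post-training quantization
# via the TF2 TFLiteConverter. Converting from a SavedModel skips the
# frozen-GraphDef step that triggered the 2GB protobuf error above.
# "saved_model_dir" and "model_fp16.tflite" are placeholder paths.
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")

# Enable the default optimization set, then restrict weight types to
# float16 to request FP16 quantization of the weights.
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]

tflite_model = converter.convert()
with open("model_fp16.tflite", "wb") as f:
    f.write(tflite_model)
```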

Mockapapella commented 4 years ago

Is there any way to loop through the .pb file in chunks, similar to this stackoverflow question/answer, so as to cut down on the amount of information loaded into memory?
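For reference, a generic chunked-read pattern in Python looks like the sketch below (names are hypothetical, not from the thread). Note, though, that protobuf must materialize the entire `GraphDef` message to parse it, so chunked file I/O by itself does not work around the 2GB message limit.

```python
# Hypothetical sketch: stream a large file in fixed-size pieces so only one
# chunk is held in memory at a time. This reduces peak memory for raw I/O,
# but protobuf parsing of a GraphDef still needs the whole message.
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MiB per read

def iter_chunks(path, chunk_size=CHUNK_SIZE):
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            yield chunk

# Usage example: count the bytes in the frozen graph without loading it whole.
total = sum(len(chunk) for chunk in iter_chunks("frozen_graph.pb"))
print(f"read {total} bytes")
```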