Closed: pablogranolabar closed this issue 3 years ago
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Should be addressed.
Taking a look at the pytorch_model.bin saved in the microsoft/DialoGPT-small repository, one can see it's made up of float16 weights. When loading the model into GPT2Model and saving it, the weights are saved as float32, resulting in the large size increase.
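You can confirm this yourself by loading both checkpoints with torch.load and comparing tensor dtypes; a quick sketch (checkpoint paths are illustrative):

```python
import torch

# Compare the checkpoint from the Hub with the one written by save_pretrained.
# Paths are illustrative; point them at your local copies.
for path in ["hub/DialoGPT-small/pytorch_model.bin",
             "exported/DialoGPT-small/pytorch_model.bin"]:
    state_dict = torch.load(path, map_location="cpu")
    dtypes = {t.dtype for t in state_dict.values()}
    size_mb = sum(t.numel() * t.element_size() for t in state_dict.values()) / 1e6
    print(f"{path}: dtypes={dtypes}, ~{size_mb:.0f} MB of tensor data")
```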
If you want to keep the model in half precision, add the following line after initializing your model:
model.half()
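For example, a minimal export sketch that keeps the checkpoint in half precision might look like this (AutoModelForCausalLM and the output directory are illustrative choices):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "microsoft/DialoGPT-small"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Cast the weights back to float16 so the exported checkpoint stays
# roughly the same size as the one hosted on the Hub.
model.half()

model.save_pretrained("dialogpt-small-fp16")
tokenizer.save_pretrained("dialogpt-small-fp16")
```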
Having a weird issue with DialoGPT-large model deployment. With PyTorch 1.8.0 and Transformers 4.3.3, using model.save_pretrained and tokenizer.save_pretrained, the exported pytorch_model.bin is almost twice the size of the one in the model card repo and results in OOM on a reasonably equipped machine, whereas the standard transformers download process works fine on the same machine (I am building a CI pipeline to containerize the model, hence the requirement for a pre-populated model).
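Roughly, the export step in the pipeline looks like this (the model class and output directory shown here are illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "microsoft/DialoGPT-large"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# The pytorch_model.bin written here comes out almost twice the size
# of the one in the model card repo.
model.save_pretrained("model_dir")
tokenizer.save_pretrained("model_dir")
```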
When I download the model card files directly, however, I'm getting the following errors:
So what would be causing the large file-size difference between the save_pretrained output and the model card repo? And any ideas why the directly downloaded model card files aren't working in this example?
Thanks in advance