I quantized the model to int8, and generation with the quantized checkpoint then failed with this error:
```
ubuntu@ip-172-31-19-240:~/gpt-fast$ python quantize.py --checkpoint_path checkpoints/$MODEL_REPO/model.pth --mode int8
Loading model ...
/opt/conda/lib/python3.10/site-packages/torch/_utils.py:831: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  return self.fget.__get__(instance, owner)()
Quantizing model weights for int8 weight-only symmetric per-channel quantization
Writing quantized weights to checkpoints/openlm-research/open_llama_7b/model_int8.pth
Quantization complete took 24.35 seconds
```
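For context, the scheme named in that log ("int8 weight-only symmetric per-channel quantization") can be sketched in a few lines. This is my own illustration, not gpt-fast's actual quantize.py, and the function name is made up: each output channel (weight row) gets one scale derived symmetrically from its max absolute value.

```python
import torch

def quantize_int8_per_channel(w: torch.Tensor):
    """Symmetric per-channel int8 weight quantization (illustrative sketch)."""
    # One scale per output channel, symmetric about zero: scale = max|w| / 127.
    max_abs = w.abs().amax(dim=1, keepdim=True).clamp(min=1e-8)
    scales = max_abs / 127.0
    # Round to the nearest int8 value and clamp to the symmetric range.
    q = torch.round(w / scales).clamp(-127, 127).to(torch.int8)
    return q, scales.squeeze(1)

# At inference, the weights are dequantized (or the scaling is fused into
# the matmul): w_approx = q.float() * scales.unsqueeze(1)
```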
```
ubuntu@ip-172-31-19-240:~/gpt-fast$ python generate.py --compile --checkpoint_path checkpoints/$MODEL_REPO/model_int8.pth
Traceback (most recent call last):
  File "/home/ubuntu/gpt-fast/generate.py", line 18, in <module>
    torch._inductor.config.fx_graph_cache = True # Experimental feature to reduce compilation times, will be on by default in future
  File "/opt/conda/lib/python3.10/site-packages/torch/_dynamo/config_utils.py", line 72, in __setattr__
    raise AttributeError(f"{self.__name__}.{name} does not exist")
AttributeError: torch._inductor.config.fx_graph_cache does not exist
```
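From the traceback, the installed PyTorch's inductor config simply does not expose an `fx_graph_cache` knob, which suggests the build predates that experimental flag (I believe gpt-fast targets recent PyTorch nightlies). As a quick local workaround I could guard the assignment in generate.py, something like this sketch (assuming `hasattr` behaves normally on the config module):

```python
# Sketch of a local patch idea, not an official fix: only set the flag
# when the installed PyTorch actually exposes it.
import torch._inductor.config as inductor_config

if hasattr(inductor_config, "fx_graph_cache"):
    # Experimental feature to reduce compilation times on newer builds.
    inductor_config.fx_graph_cache = True
else:
    print("fx_graph_cache not available in this PyTorch build; skipping")
```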
System:
```
Welcome to Ubuntu 20.04.6 LTS (GNU/Linux 5.15.0-1049-aws x86_64)
Please note that Amazon EC2 P2 Instance is not supported on current DLAMI.
```