Support Hugging Face Transformers?

A few days ago, the model was published on Hugging Face, which means there is no longer any need to submit a form: https://huggingface.co/decapoda-research/llama-65b-hf. A Hugging Face Transformers implementation already exists at https://github.com/zphang/transformers/tree/llama_push, and it may be merged into the official package in the near future. I just wonder whether it would be time-consuming to support, or switch to, this pipeline...
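For reference, here is a rough sketch of what loading through that pipeline might look like. It assumes a `transformers` build that includes LLaMA support and that the generic `AutoTokenizer` / `AutoModelForCausalLM` entry points can resolve this checkpoint; I have not verified either, and the helper names `load_llama` and `generate` are made up for illustration.

```python
# Sketch only: assumes a transformers build with LLaMA support and that the
# checkpoint's config is compatible with the Auto* classes (not verified).
MODEL_ID = "decapoda-research/llama-65b-hf"  # repo name taken from the link above


def load_llama(model_id: str = MODEL_ID):
    """Hypothetical helper: load tokenizer and model via the Auto* classes."""
    # Imported lazily so this file can be read or imported without transformers.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    return tokenizer, model


def generate(prompt: str, max_new_tokens: int = 32) -> str:
    """Hypothetical helper: run one generation and return the decoded text."""
    tokenizer, model = load_llama()
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate("Hello")` would download the full checkpoint, so the 65B model is only practical on a machine with enough memory; a smaller checkpoint id could be passed to `load_llama` instead.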

Another point is quantization. I found this repo and ran the benchmarks there, and the results are quite interesting. I am not an expert, so take this just as a reference: https://github.com/qwopqwop200/GPTQ-for-LLaMa.

juncongmoo / chatllama

Issue #3