When I run this, I can see it loads the model into RAM, but it seems to be using only one thread. The output is a wall of various 'decoder.layers.xx.bias' entries, followed by: "You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference."
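For context, transformers prints that warning when some weights could not be loaded from the checkpoint and had to be randomly initialized, usually because the model class (and its head) doesn't match what the checkpoint was saved with. Here is a minimal sketch of a load that keeps all pretrained weights, assuming the "6.7b model" is facebook/opt-6.7b and that half precision is acceptable (both are assumptions on my part):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumption: the checkpoint is facebook/opt-6.7b; swap in the one you actually use.
model_name = "facebook/opt-6.7b"

tokenizer = AutoTokenizer.from_pretrained(model_name)

# Loading through the causal-LM class that matches the checkpoint keeps every
# pretrained weight, so nothing is reported as newly initialized.
# float16 roughly halves the RAM footprint compared to the default float32.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
)

# Quick sanity check that generation works.
inputs = tokenizer("Hello, my name is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```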
OK, I was able to get it to work properly with the 6.7b model. I don't think I need the torch.cuda.empty_cache() call.
Also, it does seem to be using multi-threading.
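If you want to check or control that directly, PyTorch exposes the CPU intra-op thread count; a small sketch (the count of 8 is just an illustrative value):

```python
import torch

# See how many intra-op threads PyTorch is currently using on CPU.
print(torch.get_num_threads())

# Optionally pin it to a specific count (8 here is arbitrary).
torch.set_num_threads(8)
```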