Closed lucasjinreal closed 1 year ago
There is a way, it is open-source! :) I decided to keep this output style because I sometimes run inference on the 65B model and want to watch the generation as it happens, so I can terminate it, since it goes very slowly.
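For illustration, here is a minimal Python sketch of the two approaches; this is not this repo's actual API, and `generate_tokens` is a hypothetical stand-in for whatever per-token generator the inference loop exposes:

```python
def generate_tokens(prompt):
    # Stand-in for the real model: yields decoded tokens one at a time.
    for word in ["The", " answer", " is", " 42", "."]:
        yield word

def stream(prompt):
    # Print each token as it arrives, so a slow 65B run can be watched
    # and interrupted early with Ctrl-C.
    for tok in generate_tokens(prompt):
        print(tok, end="", flush=True)
    print()

def answer_only(prompt):
    # Buffer the whole generation and print a single final string instead.
    print("".join(generate_tokens(prompt)))

stream("What is the answer?")
answer_only("What is the answer?")
```

Buffering is a one-line change, but you lose the ability to see progress or abort a long generation partway through.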
@randaller how much CPU memory is needed for 30B inference? I adapted this to run inference on a GPU with 16 GB of memory; it seems fast, but the prompt outputs aren't really good.
> how much CPU memory is needed for 30B inference?
@jinfagang About 70 gigabytes of RAM are needed for the 30B model. It should work with any amount of RAM, just very slowly, as the system will lean heavily on the swap file.
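As a back-of-the-envelope check of that figure, assuming fp16 weights (2 bytes per parameter), the weights alone account for most of it, with the rest going to activations, the KV cache, and runtime overhead:

```python
params = 30e9          # 30B parameters
bytes_per_param = 2    # fp16
weights_gb = params * bytes_per_param / 1e9
print(f"weights alone: ~{weights_gb:.0f} GB")  # ~60 GB, before runtime overhead
```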
Currently it outputs all inference steps and results; is there a way to print only the answer?