@PierpaoloSorbellini The inference section is tagged as WIP. Do we have any basic inference code available in chatllama to load the actor_rl model and run a few queries?
Description
Currently, to run inference with a trained model, the user has to write a small Python script by hand that loads the checkpoint or saved model produced after training, following the way the library serializes it; a minimal example is sketched below.
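For concreteness, such a hand-written script might look like the following sketch. It assumes the actor was fine-tuned from a Hugging Face causal LM and that training saved a plain PyTorch state_dict; the model name, checkpoint path, and checkpoint layout are illustrative assumptions, not chatllama's confirmed API.

```python
# Minimal inference sketch -- the model name, checkpoint path, and checkpoint
# layout below are assumptions for illustration, not chatllama's actual API.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "facebook/opt-1.3b"               # hypothetical base model of the actor
CHECKPOINT_PATH = "./checkpoints/actor_rl.pt"  # hypothetical path written by training

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

# Load the fine-tuned weights; assumes torch.save() stored either a bare
# state_dict or a dict wrapping one under a "model" key.
checkpoint = torch.load(CHECKPOINT_PATH, map_location="cpu")
model.load_state_dict(checkpoint.get("model", checkpoint))
model.eval()

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

# Run a single query against the loaded actor.
prompt = "Human: What is RLHF?\n\nAssistant:"
inputs = tokenizer(prompt, return_tensors="pt").to(device)
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=128, do_sample=True, top_p=0.9)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```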
Moreover, many optimizations could be integrated to speed up inference, such as:
TODO
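The concrete list above is still TODO, but as a hedged illustration of the kind of speed-ups typically meant here, half-precision weights and autograd-free generation are common first steps (the model name is the same assumption as in the sketch above):

```python
# Illustrative speed-ups only; the issue's actual optimization list is TODO.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "facebook/opt-1.3b"  # same hypothetical base model as above

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
# fp16 weights halve memory traffic, a common first inference optimization.
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, torch_dtype=torch.float16)
model.to("cuda").eval()

inputs = tokenizer("Human: Hello!\n\nAssistant:", return_tensors="pt").to("cuda")
# inference_mode() skips autograd bookkeeping entirely, a bit faster than no_grad().
with torch.inference_mode():
    output_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```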