SinanAkkoyun opened this issue 1 year ago
Hi @SinanAkkoyun, thanks for raising the issue! We are not familiar with Transformer Engine and do not have access to H100 GPUs at the moment. We will let you know once we have investigated further.
Hi @WoosukKwon, any update on this issue? FP8 peak FLOPS on H100 is theoretically 2x that of FP16, and H100s can quite easily be rented from RunPod, Azure, AWS, and the like.
Hi @WoosukKwon, I also want to know when vLLM will support FP8 on H100 (H800). FP8 is up to 2x faster than FP16.
Any update on FP8, @WoosukKwon?
Second this, would be great to have.
Hi! Is adding FP8 Transformer Engine (H100) inference speedup planned? If not, could you please give me an outline of what needs to be done so that I can work on it myself?
Thank you!
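
For anyone who wants to experiment in the meantime, here is a minimal sketch of what FP8 execution via NVIDIA Transformer Engine looks like, assuming an H100-class (Hopper) GPU and the `transformer_engine` PyTorch package. This is illustrative only, not vLLM code; the layer shapes are hypothetical:

```python
# Minimal FP8 sketch with NVIDIA Transformer Engine (requires a Hopper GPU).
# Not vLLM code; shapes and the scaling recipe here are illustrative.
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# Delayed-scaling recipe; HYBRID uses E4M3 for forward and E5M2 for backward.
fp8_recipe = recipe.DelayedScaling(margin=0, fp8_format=recipe.Format.HYBRID)

# TE's drop-in replacement for torch.nn.Linear with FP8 support.
linear = te.Linear(4096, 4096, bias=True).cuda()
x = torch.randn(8, 4096, device="cuda")

# Matmuls inside this context run in FP8 on supported hardware.
with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    y = linear(x)
```

Presumably the bulk of an actual integration would be routing vLLM's linear and attention layers through FP8-capable kernels like these and managing the per-tensor scaling state across decoding steps, which is why this is more than a drop-in change.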