intel / xFasterTransformer

Apache License 2.0
374 stars 65 forks

[Fix] Reduce convert memory usage. #297

Closed marvin-Yu closed 6 months ago

marvin-Yu commented 6 months ago

The current Qwen-72B model conversion process consumes approximately 282 GB of memory, which far exceeds the memory available on the machines the client currently uses. This PR modifies the conversion method to reduce memory usage to around 20 to 30 GB.
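
The description above does not say how the conversion method was changed. As a rough illustration only, one common way to cut peak memory in a weight converter is to process the checkpoint one shard and one tensor at a time instead of loading the whole model first. The sketch below assumes the weights are stored as `.npz` shards and written out as raw `.bin` files; the function name `convert_shard_by_shard` and the file layout are hypothetical and are not taken from xFasterTransformer.

```python
import os
import tempfile

import numpy as np


def convert_shard_by_shard(shard_paths, out_dir, dtype=np.float32):
    """Hypothetical streaming converter (not the PR's actual code).

    Reads one tensor at a time from each shard and writes it to disk
    before touching the next one, so peak memory is bounded by the
    largest single tensor rather than by the whole model.
    """
    os.makedirs(out_dir, exist_ok=True)
    written = []
    for path in shard_paths:
        # np.load on an .npz archive is lazy: an array is only read
        # from disk when it is indexed by name.
        with np.load(path) as shard:
            for name in shard.files:
                tensor = shard[name].astype(dtype)
                out_path = os.path.join(out_dir, name + ".bin")
                tensor.tofile(out_path)
                written.append(out_path)
                # Drop the reference so the buffer can be freed
                # before the next tensor is loaded.
                del tensor
    return written


if __name__ == "__main__":
    # Tiny demo with two made-up shards standing in for a checkpoint.
    work = tempfile.mkdtemp()
    first = os.path.join(work, "shard0.npz")
    second = os.path.join(work, "shard1.npz")
    np.savez(first, wte=np.arange(8, dtype=np.float16))
    np.savez(second, ln_f=np.ones((2, 3), dtype=np.float16))
    for p in convert_shard_by_shard([first, second], os.path.join(work, "out")):
        print(p, os.path.getsize(p), "bytes")
```

With this pattern the memory cost scales with the largest tensor in any shard, which is why a streaming approach can land far below the footprint of a load-everything conversion. Whether the PR uses this exact strategy is an assumption.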