What kind of change does this PR introduce? (Bug fix, feature, docs update, ...)
feature
What is the current behavior? (You can also link to an open issue here)
When we warm up a model, we load it onto the CPU first and then onto CUDA. This can cause memory problems because loading a model onto CUDA first loads it into host memory and then moves it to the GPU. At that moment the model occupies host memory twice (the CPU copy plus the copy being staged for CUDA), which leads to high peak memory usage.
What is the new behavior (if this is a feature change)?
We load the model onto CUDA first and onto the CPU afterwards, so the two host-memory copies no longer coexist. This reduces peak memory usage.
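The effect of the load order on peak host memory can be illustrated with a toy model. This is a simplified sketch, not the code changed in this PR. It assumes that loading onto CUDA stages one full temporary copy in host memory that is freed after the transfer, and that a CPU instance keeps its copy resident. `peak_host_memory` is a hypothetical helper, and `model_size` is an arbitrary unit.

```python
# Toy model of host (CPU RAM) usage during warm-up.
# Assumptions (hypothetical, for illustration only):
#   - loading onto "cuda" stages a full temporary copy in host memory,
#     which is released once the transfer to the GPU completes;
#   - loading onto "cpu" keeps a full copy resident in host memory.

def peak_host_memory(order, model_size):
    """Return the peak host memory (in units of model_size) for a load order."""
    resident = 0  # host memory held by already-loaded CPU instances
    peak = 0
    for device in order:
        if device == "cpu":
            # The CPU copy stays resident for the rest of the warm-up.
            resident += model_size
            peak = max(peak, resident)
        elif device == "cuda":
            # The staging copy coexists with any CPU copy already resident.
            peak = max(peak, resident + model_size)
            # The staging copy is freed after the transfer; resident is unchanged.
        else:
            raise ValueError(f"unknown device: {device}")
    return peak


old_order = peak_host_memory(["cpu", "cuda"], model_size=10)
new_order = peak_host_memory(["cuda", "cpu"], model_size=10)
print(old_order, new_order)  # 20 10
```

Under these assumptions, loading CPU first peaks at two model copies, while loading CUDA first peaks at one, because the staging copy is already gone before the CPU copy is created.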
Does this PR introduce a breaking change? (What changes might users need to make in their application due to this PR?)
no
Have unit tests been run against this PR? (Has there also been any additional testing?)
no
Related Python client changes (link commit/PR here)
no
Related documentation changes (link commit/PR here)
no
Other information:
no
Please check if the PR fulfills these requirements
[ ] The commit message follows our guidelines
[ ] Tests for the changes have been added (for bug fixes/features)
[ ] Docs have been added / updated (for bug fixes / features)