microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

[Web] How to free webgpu gpu mem in onnxruntime web #21574

Open · soyoapp opened this issue 1 month ago

soyoapp commented 1 month ago

Describe the issue

I use onnxruntime-web with the following code:

import { Tensor } from 'onnxruntime-web/webgpu'; // WebGPU-enabled bundle

/**
 * Run one inference and release the session afterwards.
 * @param model model path; the session is created inside this function instead of
 *              being passed in, so it can be released once inference finishes,
 *              automatically freeing GPU memory and preventing memory overflow
 * @param inputTensor input tensor for the model's first input
 */
export async function infer2(model: string, inputTensor: Tensor) {
  const session = await newSession(model); // local helper, sketched below
  const feeds: Record<string, Tensor> = {};
  feeds[session.inputNames[0]] = inputTensor;
  const results = await session.run(feeds);
  const tensor = results[session.outputNames[0]];
  await session.release(); // free gpu mem
  return tensor;
}

/**
 * Load the ONNX model and perform inference.
 * @param model model path; the session is created inside infer2 instead of being
 *              passed in, so GPU memory is freed automatically after inference
 * @param input input data as an Ndarray
 * @returns {Promise<Ndarray>} the output data and shape as an Ndarray
 */
export const infer = async (model: string, input: Ndarray) => {
  const inputTensor = ndarrayToTensor(input); // local helper, sketched below
  const outTensor = await infer2(model, inputTensor);
  const na = new Ndarray(Array.from(outTensor.data as Float32Array), outTensor.dims as number[]);
  inputTensor.dispose();
  outTensor.dispose();
  return na;
};
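
(`newSession`, `Ndarray`, and `ndarrayToTensor` are small local helpers; minimal sketches of what they might look like, assuming `newSession` just wraps `ort.InferenceSession.create` with the WebGPU execution provider and `Ndarray` is a plain (data, dims) container:)

import { InferenceSession, Tensor } from 'onnxruntime-web/webgpu';

// Create a fresh session using the WebGPU execution provider.
async function newSession(model: string): Promise<InferenceSession> {
  return InferenceSession.create(model, { executionProviders: ['webgpu'] });
}

// Plain (data, dims) container for passing arrays around.
class Ndarray {
  constructor(public data: number[], public dims: number[]) {}
}

// Convert an Ndarray to a float32 onnxruntime-web Tensor.
function ndarrayToTensor(nd: Ndarray): Tensor {
  return new Tensor('float32', Float32Array.from(nd.data), nd.dims);
}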

And the following is my test code:

  let input = await imgToNdarray(t);
  let out = await infer(model, input)
  let imgDataUrl = outToImgDataUrl(out)
  testReact(<img src={imgDataUrl}/>)

But after inference, nvidia-smi shows the GPU memory is still in use; only refreshing or closing the browser tab frees it.

To reproduce

Just run the code above.

Urgency

No response

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.17.3

Execution Provider

'webgpu' (WebGPU)

Env

Microsoft Edge 127.0.2651.74 (Official build) (64-bit)
Revision: dbf5b0aa014c4e70e3d5e2d73248e21264f82957
Chromium version: 127.0.6533.73
Operating system: Linux
JavaScript: V8 12.7.18.6
User agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36 Edg/127.0.0.0
Command-line: /usr/bin/microsoft-edge --disable-web-security --password-store=basic --user-data-dir=/home/roroco/.config/JetBrains/WebStorm2023.2/edge-user-data --remote-debugging-port=39765 --no-default-browser-check --flag-switches-begin --enable-unsafe-webgpu --enable-features=Vulkan --flag-switches-end about:blank

guschmue commented 1 month ago

Yes, we keep a cache of GPU buffers for popular sizes. We could maybe free that cache when the last session gets disposed, assuming that an app would not do create/close/create sequences.
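
In the meantime, the pattern the cache is designed for is a single long-lived session that is reused across inferences, rather than create/run/release per call; a rough sketch of that pattern, assuming a simple per-model cache (helper names here are illustrative):

import { InferenceSession, Tensor } from 'onnxruntime-web/webgpu';

// Keep one long-lived session per model so the cached GPU buffers are reused
// across calls instead of being orphaned by repeated create/release cycles.
const sessionCache = new Map<string, Promise<InferenceSession>>();

function getSession(model: string): Promise<InferenceSession> {
  let s = sessionCache.get(model);
  if (!s) {
    s = InferenceSession.create(model, { executionProviders: ['webgpu'] });
    sessionCache.set(model, s);
  }
  return s;
}

// Hypothetical helper: run one inference against the cached session.
export async function inferReusing(model: string, inputTensor: Tensor) {
  const session = await getSession(model);
  const results = await session.run({ [session.inputNames[0]]: inputTensor });
  return results[session.outputNames[0]];
}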

github-actions[bot] commented 1 week ago

This issue has been automatically marked as stale due to inactivity and will be closed in 30 days if no further activity occurs. If further support is needed, please provide an update and/or more details.