intel / webml-polyfill

Deprecated, the Web Neural Network Polyfill project has been moved to https://github.com/webmachinelearning/webnn-polyfill
Apache License 2.0

[tool] Solution for collecting memory consumption by WebML API #522

Open BruceDai opened 5 years ago

BruceDai commented 5 years ago

This issue is for discussing how to collect the memory consumed by the WebML API. QA first collected memory consumption with the Image Classification example as follows:

consumption = GPU Memory Value2 ( after launched Image Classification example by specified model ) 
              -  GPU Memory Value1 ( after launched empty Tab ) 

This difference may also include memory used for HTML rendering, so QA's method was not accurate.

@fujunwei suggested developing an automation toolkit that runs only the CTS test cases, 100 (or more) times, one op at a time, to check memory consumption (leaks).

### junwei's recommendation
consumption of op = GPU Memory Value1 ( memory after running CTS test cases 100 times ) 
                   - GPU Memory Value2 ( initial memory before running CTS test cases ) 

@fujunwei Please help review my description of your solution. @huningxin @ibelem Please also take a look, thanks.

fujunwei commented 5 years ago

100 (or more) times

Running 100 times may take too long; would 20 times be enough?

consumption of op = GPU Memory Value1 ( memory after running CTS test cases 100 times )
                   - GPU Memory Value2 ( initial memory before running CTS test cases )

consumption of op = GPU Memory Value1 ( memory after each run )
                   - GPU Memory Value2 ( memory after the first run )

There is a memory leak if the value increases on every run.
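
The check described above could be sketched as follows. This is only an illustration: the function names, the sample values, and the `tolerance_mb` parameter are hypothetical and not part of any existing toolkit, and the memory readings are assumed to be collected by some other means after each run.

```python
from typing import List, Sequence


def deltas_from_baseline(samples_mb: Sequence[float]) -> List[float]:
    """Return each later sample minus the sample taken after the first run."""
    if len(samples_mb) < 2:
        raise ValueError("need at least two samples")
    baseline = samples_mb[0]
    return [sample - baseline for sample in samples_mb[1:]]


def looks_like_leak(samples_mb: Sequence[float], tolerance_mb: float = 0.0) -> bool:
    """True if memory grows by more than tolerance_mb on every consecutive run."""
    if len(samples_mb) < 2:
        raise ValueError("need at least two samples")
    pairs = zip(samples_mb, samples_mb[1:])
    return all(later - earlier > tolerance_mb for earlier, later in pairs)


if __name__ == "__main__":
    # Hypothetical GPU memory readings (MB), one taken after each run of an op.
    samples = [120.0, 121.5, 123.0, 124.5]
    print(deltas_from_baseline(samples))
    print(looks_like_leak(samples))
```

A non-zero `tolerance_mb` may help ignore small fluctuations between runs; what threshold is reasonable would need to be decided from real measurements.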

fujunwei commented 5 years ago

Maybe LeakSanitizer can help developers find where the memory leaks.

fujunwei commented 5 years ago

CPU memory (shared), GPU memory (MPS? ) -> Mapping to what kind of hardware

The Memory Footprint in Task Manager reports Private Memory Footprint as described in consistent memory metrics.

The definition of Private Memory Footprint is:

So it does not include file-backed memory such as third-party libraries, but shared memory is attributed accurately rather than over-counted. For example, GPU memory allocated by the Browser process is counted only in the Browser process, not in the GPU process.

Do we lose memory consumption for native backend calls? -> System memory tools

It's too hard to estimate System memory as mentioned here:

The platonic ideal for this measurement includes memory used by both the kernel, and other system services on behalf of the process in question. Realistically, we have a hard time estimating the memory footprint contained in non-Chrome processes.

fujunwei commented 5 years ago

So my proposal is: tested memory = memory footprint + GPU memory, summed over the Browser, GPU and Renderer processes.

BruceDai commented 5 years ago

Currently we refer to "Estimating the private memory footprint of Chrome on Linux" and use the memory compute script it provides to collect data on the Linux platform. For the Windows and macOS platforms, we are also trying this script.
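
For reference, a minimal sketch of what such a computation might look like on Linux. It assumes the private memory footprint is approximated as `RssAnon + VmSwap` from `/proc/<pid>/status`. That formula is an assumption here, not something confirmed in this thread, and this is not the provided script itself. The function name and the sample text are hypothetical.

```python
import re


def private_footprint_kb(status_text: str) -> int:
    """Approximate private footprint (KB) from the text of /proc/<pid>/status.

    Assumption: footprint = RssAnon + VmSwap. Missing fields count as 0.
    """
    fields = {}
    for line in status_text.splitlines():
        # Match lines such as "RssAnon:      150000 kB".
        match = re.match(r"^(\w+):\s+(\d+)\s+kB$", line.strip())
        if match:
            fields[match.group(1)] = int(match.group(2))
    return fields.get("RssAnon", 0) + fields.get("VmSwap", 0)


if __name__ == "__main__":
    # Made-up excerpt in the /proc/<pid>/status format, not measured data.
    sample = (
        "Name:\tchrome\n"
        "VmRSS:\t  200000 kB\n"
        "RssAnon:\t  150000 kB\n"
        "RssFile:\t   50000 kB\n"
        "VmSwap:\t    1000 kB\n"
    )
    print(private_footprint_kb(sample))
```

On a real system the input would be read with `open(f"/proc/{pid}/status").read()` for each Chrome process. This approach does not report GPU memory.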

BruceDai commented 5 years ago

Here are some methods we tried for collecting data.

macOS:

  1. Method1: Chrome Task Manager, but almost all models report the same GPU memory for MPS.
  2. Method2: Activity Monitor, only reports CPU memory usage.
  3. Method3: vmmap -v -interleaved , only reports CPU memory usage.
  4. Method4: refer to Estimating the private memory footprint of Chrome on macOS, and use the script private_memory_footprint to get the private memory footprint in Fast / Slow modes.

Linux:

  1. Method1: Chrome Task Manager, same as macOS.
  2. Method2: System Monitor, only reports CPU memory usage.
  3. Method3: top or ‘cat /proc/meminfo’, can’t get GPU memory usage.
  4. Method4: memory_calc_linux.py , can’t get GPU memory usage.
  5. Method5: intel_gpu_top, can’t get GPU memory usage.
  6. Method6: refer to Estimating the private memory footprint of Chrome on Linux, and use the provided memory compute script.

Windows:

  1. Method1: Chrome Task Manager, same as macOS.
  2. Method2: Resource Monitor, only reports CPU memory usage.
  3. Method3: ‘tasklist|findstr chrome’, the result is the same as ‘Working Set(KB)’ in Resource Monitor.