openvinotoolkit / openvino

OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference
https://docs.openvino.ai
Apache License 2.0

[Good First Issue] [ARM]: Implement CPU plugin just-in-time emitter for LogicalNot operation #24445

Open eshoguli opened 2 months ago

eshoguli commented 2 months ago

Context

JIT emitters are part of the code generation feature (a.k.a. the tensor compiler), which automatically produces highly efficient, optimized binary code for fused subgraphs. Each emitter implements a specific operation from the low-level OpenVINO dialect.

Prerequisites

An ARM CPU based platform is recommended for development (e.g. Mac, Raspberry Pi, etc.). Emulators (e.g. QEMU) are still an option, but they are less convenient, especially for the final performance evaluation.

What needs to be done?

Before implementing the emitter, please modify the tests to make sure that the developed functionality is covered:

Tests

Tests are disabled in the default build, so make sure to add -DENABLE_TESTS=ON to the cmake command.

GoogleTest is used for testing. The CPU functional test target is ov_cpu_func_tests. You can use two GoogleTest filters for element-wise and activation operations:

Example Pull Requests

Resources

Contact points

@eshoguli, @dmitry-gorokhov

Ticket

CVS-137699

jvr0123 commented 2 months ago

.take

github-actions[bot] commented 2 months ago

Thank you for looking into this issue! Please let us know if you have any questions or require any help.

jvr0123 commented 2 months ago

I'm currently having some issues compiling OpenVINO on my Raspberry Pi 4. I have followed the build guide, but I have also been unable to get the release version to compile: the terminal freezes at about 30%, after a series of c++: fatal error: Killed signal terminated program cc1plus messages. Here is the command I'm currently trying:

cmake -DCMAKE_BUILD_TYPE=Debug -DTHREADING=SEQ -DENABLE_OV_ONNX_FRONTEND=OFF -DENABLE_OV_PADDLE_FRONTEND=OFF -DENABLE_OV_TF_FRONTEND=OFF -DENABLE_OV_TF_LITE_FRONTEND=OFF -DENABLE_OV_PYTORCH_FRONTEND=OFF -DENABLE_OV_IR_FRONTEND=OFF -DENABLE_TESTS=ON -DENABLE_CLANG_FORMAT=OFF -DENABLE_FASTER_BUILD=ON -DENABLE_SYSTEM_FLATBUFFERS=OFF -DARM_COMPUTE_SCONS_JOBS=$(nproc --all) .. && cmake --build . --parallel

I have llvm and clang installed, as well as the other dependencies in install_build_dependencies.sh. Is this a common issue or am I doing something wrong?

eshoguli commented 2 months ago

CPU overheating? Possible solutions:

  1. Build with cmake --build . --parallel 2 to prevent CPU overheating.
  2. Use qemu: https://www.qemu.org/docs/master/system/target-arm.html. Note that you need aarch64 (not 32-bit).
  3. Cross compilation: https://forums.raspberrypi.com/viewtopic.php?t=343710

jvr0123 commented 2 months ago

I actually just figured it out: the Raspberry Pi I'm using only has 4 GB of RAM, and the build process very quickly ate all of it. Adding 8 GB of swap fixed the issue. CPU temperature isn't a huge issue; it stays around 52 °C for most of the build. I don't know how much RAM the RPi Docker image gets, but if this issue is reproducible, do you think it might be worth updating the RPi build instructions with minimum RAM requirements?

jvr0123 commented 2 months ago

Should the supported precision be element::f32 or element::boolean? The documentation for the operation specifies that the input should only be boolean, but the issue mentions fp32.

eshoguli commented 2 months ago

You should support ov::element::f32. This means the emitter accepts float values and returns a boolean. To speed up computation, the boolean return value has the same size as a float (4 bytes), but only the least significant bit is used.
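
To make the expected behavior concrete, here is a minimal scalar sketch of one possible reading of the comment above: each f32 input produces a 4-byte output lane that holds 0 or 1, so only the lowest bit carries information. The names logical_not_lane and logical_not_ref are hypothetical and are not part of OpenVINO. The integer 0/1 output encoding is an assumption made for illustration; the actual emitter may encode the result differently, so check the existing emitters and reference implementation before relying on it.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical scalar reference for LogicalNot on f32 input; not OpenVINO code.
// Assumption: the result occupies a 4-byte lane (same width as float) and
// holds 0 or 1, so only the least significant bit is meaningful.
static_assert(sizeof(std::uint32_t) == sizeof(float), "lane width must match f32");

inline std::uint32_t logical_not_lane(float x) {
    // Zero (including -0.0f) is treated as false, so NOT yields 1.
    // Any other value, including NaN, is treated as true, so NOT yields 0.
    return x == 0.0f ? 1u : 0u;
}

// Applies the scalar rule element by element, as a vectorized emitter
// would do across SIMD lanes.
inline std::vector<std::uint32_t> logical_not_ref(const std::vector<float>& src) {
    std::vector<std::uint32_t> dst(src.size());
    for (std::size_t i = 0; i < src.size(); ++i) {
        dst[i] = logical_not_lane(src[i]);
    }
    return dst;
}
```

A scalar reference like this can be handy for checking a JIT emitter's output lane by lane on small inputs before running the full functional tests.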