nnstreamer / nnstreamer

Neural Network (NN) Streamer, Stream Processing Paradigm for Neural Network Apps/Devices.
https://nnstreamer.ai
GNU Lesser General Public License v2.1

Does tensor_filter framework=pytorch support nvidia GPU? #4481

Open liuhao-97 opened 3 weeks ago

liuhao-97 commented 3 weeks ago

Hi team,

I am running tensor_filter with framework=pytorch on a Jetson AGX Orin (which contains a GPU), and I generated the PyTorch model with PyTorch 1.3.1. The command I am running is as follows:

```
gst-launch-1.0 filesrc location=rgb.jpg ! jpegdec ! videoconvert ! videoscale ! video/x-raw, format=RGB, width=100, height=100 ! tensor_converter ! tensor_transform mode=transpose option=1:2:0:3 ! tensor_filter framework=pytorch model=simple_dnn.torchscript.pt input=32:32:3:1 inputtype=float32 inputname=input output=32:32:4:1 outputtype=float32 ! tensor_sink name=tensor_sink
```

I got the output below, which shows the GPU is not used. I also checked the GPU utilization to confirm that the GPU stays idle.

I also checked the previous issue https://github.com/nnstreamer/nnstreamer/issues/3543, which says tflite doesn't support NVIDIA GPUs. What about nnstreamer-pytorch? Does it support NVIDIA GPUs, or did I do something wrong?

Thanks!

```
** Message: 11:27:39.846: gpu = 0, accl = cpu
Setting pipeline to PAUSED ...
Pipeline is PREROLLING ...
Pipeline is PREROLLED ...
Setting pipeline to PLAYING ...
New clock: GstSystemClock
Got EOS from element "pipeline0".
Execution ended after 0:00:00.000192769
Setting pipeline to NULL ...
Freeing pipeline ...
```

myungjoo commented 3 weeks ago

Yes. As long as the PyTorch installed on your machine and linked to nnstreamer supports NVIDIA GPUs, you can force GPU use by adding a property to tensor_filter: accelerator=true:gpu.
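For example, the pipeline from the original report would become (same model and dimensions as above; only the accelerator property is new):

```
gst-launch-1.0 filesrc location=rgb.jpg ! jpegdec ! videoconvert ! videoscale ! \
  video/x-raw,format=RGB,width=100,height=100 ! tensor_converter ! \
  tensor_transform mode=transpose option=1:2:0:3 ! \
  tensor_filter framework=pytorch model=simple_dnn.torchscript.pt \
    input=32:32:3:1 inputtype=float32 inputname=input \
    output=32:32:4:1 outputtype=float32 accelerator=true:gpu ! \
  tensor_sink name=tensor_sink
```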

Check the nnstreamer.ini file you are using, too:

```
...
[pytorch]
enable_use_gpu=TRUE
```

And tf-lite these days supports GPUs in general; you can enable GPU delegation with tflite.
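A minimal sketch of that, assuming a model.tflite and a tensorflow-lite build that ships the GPU delegate (whether true:gpu actually engages the delegate depends on how your tflite subplugin was built):

```
... ! tensor_filter framework=tensorflow-lite model=model.tflite accelerator=true:gpu ! ...
```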

liuhao-97 commented 3 weeks ago

Thanks for your response! When I add accelerator=true:gpu to tensor_filter, it gives me the error below. I think this happens because of a PyTorch version mismatch: the PyTorch version on my Jetson AGX is 1.11.0, while nnstreamer-pytorch supports PyTorch 1.10.2, if I am correct. May I ask how to make nnstreamer-pytorch support PyTorch 1.11.0?

Thanks!

```
** Message: 16:43:59.641: gpu = 1, accl = gpu

(gst-launch-1.0:2732): CRITICAL : 16:43:59.669: Exception while loading the model:
PyTorch is not linked with support for cuda devices
Exception raised from getDeviceGuardImpl at /build/pytorch-53jnGq/pytorch-1.10.2/c10/core/impl/DeviceGuardImplInterface.h:318 (most recent call first):
frame #0:  c10::Error::Error(...)  (/lib/libc10.so)
frame #1:  c10::detail::torchCheckFail(...)  (/lib/libc10.so)
frame #3:  at::native::to(...)  (/lib/libtorch_cpu.so)
frame #6:  torch::jit::Unpickler::readInstruction()  (/lib/libtorch_cpu.so)
frame #7:  torch::jit::Unpickler::run()  (/lib/libtorch_cpu.so)
frame #9:  torch::jit::readArchiveAndTensors(...)  (/lib/libtorch_cpu.so)
frame #12: torch::jit::load(...)  (/lib/libtorch_cpu.so)
frame #15: TorchCore::loadModel()  (/usr/lib/nnstreamer/filters/libnnstreamer_filter_pytorch.so)
frame #16: TorchCore::init(_GstTensorFilterProperties const*)  (/usr/lib/nnstreamer/filters/libnnstreamer_filter_pytorch.so)
frame #18: gst_tensor_filter_common_open_fw  (/lib/aarch64-linux-gnu/libnnstreamer-single.so)
frame #19: gst_tensor_filter_load_tensor_info  (/lib/aarch64-linux-gnu/libnnstreamer-single.so)
frame #24: gst_pad_query  (/lib/aarch64-linux-gnu/libgstreamer-1.0.so.0)
frame #30: gst_parse_launch_full  (/lib/aarch64-linux-gnu/libgstreamer-1.0.so.0)
frame #33: __libc_start_main  (/lib/aarch64-linux-gnu/libc.so.6)
... (remaining frames elided; their template arguments were garbled in the paste)

** (gst-launch-1.0:2491): CRITICAL **: 17:10:32.371: Failed to load model
failed to initialize the object: PyTorch
ERROR: Pipeline doesn't want to pause.
Setting pipeline to NULL ...
Freeing pipeline ...
```
myungjoo commented 3 weeks ago

The error message says what's wrong: `CRITICAL : 16:43:59.669: Exception while loading the model: PyTorch is not linked with support for cuda devices`

Your PyTorch is not built for CUDA. You need to install a PyTorch that is CUDA-enabled.

liuhao-97 commented 3 weeks ago

Thanks for your response!

I have installed the PyTorch 1.11.0 CUDA build on my Jetson AGX Orin. I have checked torch.cuda.is_available() and it returns true. But I think nnstreamer-pytorch only supports PyTorch 1.10.2.

Besides, I also double-checked libtorch by building the example-app.cpp shown below, and it works fine.

After installing nnstreamer-pytorch, I checked /lib: only libtorch.so, libtorch_cpu.so, libtorch_global_deps.so, and libtorch_python.so exist there; there is no libtorch_gpu.so.

I symlinked the libtorch_cpu.so used by /usr/lib/nnstreamer/filters/libnnstreamer_filter_pytorch.so to /usr/local/lib/python3.8/dist-packages/torch/lib/libtorch_cpu.so (libtorch_gpu.so exists under that path), and it core dumps.

So I think it is because of the PyTorch version difference (Jetson offers a docker image with PyTorch 1.11.0, while nnstreamer-pytorch requires PyTorch 1.10.2). Neither NVIDIA nor PyTorch provides a PyTorch 1.10.2 .whl for aarch64.

I also saw this question about re-building nnstreamer-pytorch: https://lists.lfaidata.foundation/g/nnstreamer-technical-discuss/topic/which_version_of_pytorch_is/87447736

May I ask whether it is possible to build nnstreamer-pytorch against PyTorch 1.11.0 myself on the Jetson? Could you please explain how to do it? Sorry to disturb you so much.

Thanks!

(The example app below is adapted from https://discuss.pytorch.org/t/error-pytorch-is-not-linked-with-support-for-cuda-devices/103807.)

```
#include <torch/script.h> // One-stop header.
#include <torch/torch.h>  // More explicit include for full Torch functionality
#include <iostream>
#include <memory>

int main()
{
    try
    {
        // Check if CUDA is available and the number of CUDA devices
        bool isCudaAvailable = torch::cuda::is_available();
        std::size_t devicesCount = torch::cuda::device_count();
        std::cout << "CUDA devices count - " << devicesCount << '\n';
        std::cout << (isCudaAvailable ? "CUDA available" : "CUDA NOT available") << '\n';

        // Check if cuDNN is available
        bool isCudnnAvailable = torch::cuda::cudnn_is_available();
        std::cout << (isCudnnAvailable ? "CUDNN available" : "CUDNN NOT available") << '\n';

        if (isCudaAvailable)
        {
            // Deserialize and move the model to GPU
            torch::jit::script::Module module = torch::jit::load("../simple_dnn_pt110.torchscript.pt");
            module.to(torch::kCUDA);
            module.eval();

            // Create a tensor on the GPU and run it through the model
            auto input = torch::randn({1, 3, 32, 32}, torch::device(torch::kCUDA));
            auto output = module.forward({input});
        }
    }
    catch (const c10::Error &e)
    {
        std::cerr << "torch error - \n"
                  << e.what() << '\n';
        return -1;
    }
    return 0;
}
```

This is the CMakeLists.txt:

```
cmake_minimum_required(VERSION 3.0 FATAL_ERROR)
project(custom_ops)

find_package(Torch REQUIRED)

add_executable(example-app example-app.cpp)
target_link_libraries(example-app "${TORCH_LIBRARIES}")
set_property(TARGET example-app PROPERTY CXX_STANDARD 14)
```
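For reference, a typical way to configure this against the libtorch inside the pip wheel is to point CMAKE_PREFIX_PATH at torch.utils.cmake_prefix_path (a sketch; the exact paths depend on your install):

```
mkdir build && cd build
cmake -DCMAKE_PREFIX_PATH="$(python3 -c 'import torch; print(torch.utils.cmake_prefix_path)')" ..
cmake --build . && ./example-app
```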

The result is:

```
CUDA devices count - 1
CUDA available
CUDNN available
```
myungjoo commented 2 weeks ago

> Thanks for your response!
>
> I have installed the PyTorch 1.11.0 CUDA build on my Jetson AGX Orin. I have checked torch.cuda.is_available() and it returns true. But I think nnstreamer-pytorch only supports PyTorch 1.10.2.

Unless the PyTorch API has broken backward compatibility, 1.11.0 should also be supported.

> Besides, I also double-checked libtorch by building the example-app.cpp shown above, and it works fine.
>
> After installing nnstreamer-pytorch, I checked /lib: only libtorch.so, libtorch_cpu.so, libtorch_global_deps.so, and libtorch_python.so exist there; there is no libtorch_gpu.so.
>
> I symlinked the libtorch_cpu.so used by /usr/lib/nnstreamer/filters/libnnstreamer_filter_pytorch.so to /usr/local/lib/python3.8/dist-packages/torch/lib/libtorch_cpu.so (libtorch_gpu.so exists under that path), and it core dumps.

If this "link" is a filesystem link (ln), not a toolchain link (the -l/-L toolchain options), then yes, it won't work.

You may need to compile libnnstreamer_filter_pytorch.so with your own PyTorch available and linked by the toolchain. In other words, when you compile libnnstreamer_filter_pytorch.so, gcc ... -ltorch should link against the PyTorch you intend to use.

If you want to see which .so files a given libnnstreamer_filter_pytorch.so is linked against, use ldd.
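For example, with the paths from this thread:

```
# Show which libtorch the nnstreamer pytorch filter is actually bound to.
ldd /usr/lib/nnstreamer/filters/libnnstreamer_filter_pytorch.so | grep -i torch
# If this resolves to the CPU-only /lib/libtorch_cpu.so instead of the CUDA-enabled
# libraries under /usr/local/lib/python3.8/dist-packages/torch/lib/, the filter
# needs to be rebuilt against the latter.
```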

> So I think it is because of the PyTorch version difference (Jetson offers a docker image with PyTorch 1.11.0, while nnstreamer-pytorch requires PyTorch 1.10.2). Neither NVIDIA nor PyTorch provides a PyTorch 1.10.2 .whl for aarch64.
>
> I also saw this question about re-building nnstreamer-pytorch: https://lists.lfaidata.foundation/g/nnstreamer-technical-discuss/topic/which_version_of_pytorch_is/87447736
>
> May I ask whether it is possible to build nnstreamer-pytorch against PyTorch 1.11.0 myself on the Jetson? Could you please explain how to do it?

  1. Build PyTorch on your machine with all the options you want.
  2. Install that PyTorch and make sure the resulting libtorch.so is discoverable by compilers. You may add a symlink from the default lib directory, or create a pytorch.pc (pkg-config file) with the appropriate configuration and install it into your pkgconfig directory. (Refer to the pkg-config manual for this.)
  3. Build nnstreamer on your machine with pytorch enabled. (A sketch of steps 2 and 3 follows below.)
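A minimal sketch of steps 2 and 3, assuming you reuse the CUDA-enabled libtorch from the Jetson wheel instead of building PyTorch from source. The pytorch.pc contents and the pytorch-support option name are assumptions; check the meson_options.txt of your nnstreamer version:

```
# Step 2: expose the wheel's libtorch to pkg-config.
# (Writing to /usr/lib/pkgconfig and /etc requires root.)
TORCH=/usr/local/lib/python3.8/dist-packages/torch
cat > /usr/lib/pkgconfig/pytorch.pc <<EOF
libdir=${TORCH}/lib
includedir=${TORCH}/include

Name: pytorch
Description: libtorch from the NVIDIA Jetson PyTorch wheel
Version: 1.11.0
Libs: -L\${libdir} -ltorch -ltorch_cpu -lc10
Cflags: -I\${includedir} -I\${includedir}/torch/csrc/api/include
EOF

# Let the dynamic linker find the wheel's libraries at runtime.
echo "${TORCH}/lib" > /etc/ld.so.conf.d/torch.conf && ldconfig

# Step 3: build nnstreamer with the pytorch filter enabled.
git clone https://github.com/nnstreamer/nnstreamer.git && cd nnstreamer
meson setup build -Dpytorch-support=enabled
ninja -C build && ninja -C build install
```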
