serizba / cppflow

Run TensorFlow models in C++ without installation and without Bazel
https://serizba.github.io/cppflow/
MIT License
785 stars · 178 forks

cannot force GPU utilization #179

Closed sph001 closed 2 years ago

sph001 commented 2 years ago

I have the script below, which I am using to process 1080p video frames with a pretrained ResNet-50 model. While I expected this to be slow, I am only getting about 0.4 fps, which is much slower than I expected. I have also noticed that the code will not utilize my GPU, and I cannot find a mechanism to force it to do so.

#include <opencv2/opencv.hpp>
#include "cppflow/cppflow.h"

std::vector<uchar> flatten(const cv::Mat* img){
    cv::Mat con;
    cv::cvtColor(*img, con, cv::COLOR_RGB2BGR);
    std::vector<uchar> flattened(img->rows * img->cols * img->channels());
    cv::Mat flat = con.reshape(1, con.rows * con.cols * con.channels());
    auto data = flat.ptr<uchar>();
    flattened.assign(data, data + flat.total() * flat.channels());
    return flattened;
}

int main(int argc, char** argv){
    // serialized ConfigProto: gpu_options { per_process_gpu_memory_fraction: 0.4 allow_growth: true }
    std::vector<uint8_t> config{0x32,0xb,0x9,0x9a,0x99,0x99,0x99,0x99,0x99,0xd9,0x3f,0x20,0x1};
    TFE_ContextOptions* options = TFE_NewContextOptions();
    TFE_ContextOptionsSetConfig(options, config.data(), config.size(), cppflow::context::get_status());
    cppflow::get_global_context() = cppflow::context(options);
    cppflow::model model("/tmp/model/snapshot-100000.pb", cppflow::model::FROZEN_GRAPH);
    cppflow::tensor input;
    cv::VideoCapture capture("/tmp/model/camera_1.avi");
    int count = 0;
    while (count++ < 20){
        cv::Mat frame;
        capture >> frame;
        if (frame.empty())
            break;
        auto flat = flatten(&frame);
        input = cppflow::tensor(flat, {1080, 1920, 3});
        //std::cout << input.device(); returns CPU:0
        input = cppflow::cast(input, TF_UINT8, TF_FLOAT);
        input = cppflow::expand_dims(input, 0);
        auto output = model({{"Placeholder:0", input}},{"concat_1:0"});
    }
    return 0;
}
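For reference, the 13-byte config blob above is a serialized ConfigProto. Assuming it encodes gpu_options.per_process_gpu_memory_fraction = 0.4 and gpu_options.allow_growth = true (the usual source of this byte sequence), it can be decoded by hand from the protobuf wire format. A minimal self-contained sketch, with no TensorFlow dependency:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <vector>

struct GpuConfig {
    double memory_fraction = 0.0;
    bool allow_growth = false;
};

// Hand-decode the serialized ConfigProto used above.
// Outer tag 0x32 = field 6 (gpu_options), wire type 2 (length-delimited).
GpuConfig decode_gpu_config(const std::vector<uint8_t>& buf) {
    GpuConfig out;
    size_t i = 0;
    if (buf.size() < 2 || buf[i++] != 0x32) return out;  // not a gpu_options submessage
    size_t len = buf[i++];            // payload length (single-byte varint here)
    size_t end = i + len;
    while (i < end && end <= buf.size()) {
        uint8_t tag = buf[i++];
        if (tag == 0x09) {            // field 1, wire type 1: 64-bit double
            std::memcpy(&out.memory_fraction, buf.data() + i, 8);  // assumes little-endian host
            i += 8;
        } else if (tag == 0x20) {     // field 4, wire type 0: varint (bool)
            out.allow_growth = (buf[i++] != 0);
        } else {
            break;                    // unknown field: stop (good enough for this blob)
        }
    }
    return out;
}
```

Decoding the exact bytes from the snippet yields memory_fraction = 0.4 and allow_growth = true, i.e. "cap GPU memory at 40% and grow allocations on demand".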

the only output I get from tensorflow is "2022-03-03 13:53:14.560098: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags."

have I configured something improperly to cause this or is this a limitation?

serizba commented 2 years ago

Hi @sph001

Did you download the GPU version from the TensorFlow C API site?

sph001 commented 2 years ago

I did, but is there a way to validate that I have the right version? (Unfortunately the system might have several versions installed by other users.)

serizba commented 2 years ago

When running the GPU version you should get some log output about the found devices, or a warning about any problem (if one exists). As you are not getting any such messages, I would assume that the linker is using another libtensorflow.

How are you compiling your program?

sph001 commented 2 years ago

I'm not getting any output from the log about found devices.

Using CMake (it's a ROS package, so it has a few other things in it):

cmake_minimum_required(VERSION 3.0.2)
project(osrl_capture)
add_compile_options(-std=c++17)
set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -O0")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -O0")

find_package(catkin REQUIRED COMPONENTS
  roscpp
  std_msgs
  cv_bridge
  image_transport
)

find_package(OpenCV 4.2.0 REQUIRED)
find_library(TENSORFLOW_LIB tensorflow)
catkin_package(
  INCLUDE_DIRS include
  LIBRARIES osrl_localization
        decklinkSDK
  CATKIN_DEPENDS roscpp
        std_msgs
        cv_bridge
        image_transport
#  DEPENDS system_lib
)

include_directories(include)
include_directories(include/DeckLinkSDK)
include_directories(include/cppflow)
include_directories(
  ${catkin_INCLUDE_DIRS}
  ${OpenCV_INCLUDE_DIRS}
)
add_executable(osrl_capture src/run.cpp
        src/DeckLinkDevice.cpp
        include/decklinkSDK/Linux/platform.cpp
        include/decklinkSDK/DeckLinkAPIDispatch.cpp)
target_link_libraries(osrl_capture ${catkin_LIBRARIES} ${OpenCV_LIBS} ${TENSORFLOW_LIB})

serizba commented 2 years ago

I am not an expert on CMake, but maybe you can try with

find_library(TENSORFLOW_LIB tensorflow HINTS /path/to/mydir/lib)

And specify the location of your desired libtensorflow.

sph001 commented 2 years ago

I have dumped out the linked library and it points to:

/usr/local/lib/libtensorflow.so

which is indeed where I installed it (and I verified that only the GPU-capable version is there).

serizba commented 2 years ago

I see.

So, the only log you are getting from TensorFlow is the one about the oneAPI Deep Neural Network Library (oneDNN)? No other messages from TensorFlow?

Does nvidia-smi work correctly and show the available devices?

You can try to specify the device to use with:

CUDA_VISIBLE_DEVICES=0 ./my_executable

sph001 commented 2 years ago

I only get the oneDNN message.

nvidia-smi gives the normal feedback (just for reference: NVIDIA-SMI 510.47.03 Driver Version: 510.47.03 CUDA Version: 11.6)

I have already exported CUDA_VISIBLE_DEVICES=0; adding it to the run command anyway made no difference.

serizba commented 2 years ago

I am running out of ideas. I would carefully re-check that you are linking to the correct libtensorflow. Perhaps try running the examples and manually setting the library paths so there is no way of opening the wrong library.

Sorry. Anyway, this looks like an issue with libtensorflow and not with cppflow.

sph001 commented 2 years ago

no worries. thank you very much for your help.

serizba commented 2 years ago

If you find the problem write it here, in case someone encounters the same issue :)

sph001 commented 2 years ago

Sorry to have wasted your time. After going through TF_DeviceListName and noticing the GPU wasn't showing up, I completely removed libtensorflow and reinstalled it, and now the GPU is being detected.

My guess is that someone else attempted to set up TensorFlow after I had installed mine, and since there is no easy way to distinguish between the two installations, I just assumed it was the way I left it.
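For anyone else debugging this: a quick way to check which devices the linked libtensorflow actually sees is to enumerate them through the C API. A rough sketch using TFE_ContextListDevices from the TensorFlow eager C API (it assumes cppflow's global context is already initialized and that cppflow exposes it via cppflow::context::get_context(); adjust for your setup):

#include <iostream>
#include <tensorflow/c/c_api.h>
#include <tensorflow/c/eager/c_api.h>
#include "cppflow/cppflow.h"

// Print the devices TensorFlow detected; a GPU build that finds the card
// should report something like "/device:GPU:0" in addition to the CPU.
void print_devices() {
    TF_Status* status = TF_NewStatus();
    TFE_Context* ctx = cppflow::context::get_context();
    TF_DeviceList* devices = TFE_ContextListDevices(ctx, status);
    int n = TF_DeviceListCount(devices);
    for (int i = 0; i < n; ++i) {
        std::cout << TF_DeviceListName(devices, i, status) << "\n";
    }
    TF_DeleteDeviceList(devices);
    TF_DeleteStatus(status);
}

If no GPU device shows up here, the problem is in the libtensorflow build or its CUDA setup, not in cppflow.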

sph001 commented 2 years ago

closed