scanner-research / scanner

Efficient video analysis at scale
https://scanner-research.github.io/
Apache License 2.0

Making sure to use GPU in custom TF ops #228

Closed: orm011 closed this issue 5 years ago

orm011 commented 5 years ago

Hi there, I'm excited about using Scanner. Thanks for all the work.

I tried both the example in scanner/examples/apps and the scannertools object-detection pipeline (both use a TensorFlow op), but nvidia-smi showed no GPU usage while my CPUs were fully loaded. I ran it on a longer video to make sure I wasn't just imagining things.

Looking into it further, I checked the KernelConfig inside the custom op, and its config.devices field contains only a single CPU device. I looked around quite a bit but found no documentation on how to set up the config used by these custom ops. On the other hand, db.has_gpu() does return True. Any pointers on how to make sure the GPU is used?
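For illustration, here is a minimal Python sketch of the check described above. `DeviceType`, `DeviceHandle`, and `KernelConfig` are hypothetical stand-ins, not Scanner's real classes; only the `config.devices` field name comes from this issue. The sketch inspects the devices assigned to a kernel and derives a TensorFlow-style device string (such as `/gpu:0`) that a TF op could pin its graph to.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class DeviceType(Enum):
    # Hypothetical stand-in for Scanner's device enum.
    CPU = "cpu"
    GPU = "gpu"


@dataclass
class DeviceHandle:
    # Hypothetical: one device assigned to a kernel instance.
    type: DeviceType
    id: int


@dataclass
class KernelConfig:
    # Hypothetical: mirrors only the config.devices field mentioned in the issue.
    devices: List[DeviceHandle] = field(default_factory=list)


def uses_gpu(config: KernelConfig) -> bool:
    """Return True if any device assigned to the kernel is a GPU."""
    return any(d.type is DeviceType.GPU for d in config.devices)


def tf_device_string(config: KernelConfig) -> str:
    """Build a TensorFlow-style device string for the first assigned device."""
    if not config.devices:
        raise ValueError("kernel was not assigned any device")
    first = config.devices[0]
    return f"/{first.type.value}:{first.id}"
```

With the configuration reported here (a single CPU device), `uses_gpu` returns False and `tf_device_string` returns `/cpu:0`, which is consistent with full CPU load and an idle GPU.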

fpoms commented 5 years ago

Hey @orm011, thanks!

I've updated the example in scanner/examples/apps to default to using GPUs for the operation if available. Thanks for catching that.

If you pull the newest code and rebuild, it should be fixed.
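The change described above amounts to a "prefer the GPU when one exists" default. Here is a minimal sketch of that pattern, assuming the availability flag comes from `db.has_gpu()` (the method mentioned earlier in this thread). `DeviceType` and `default_device` are hypothetical names; this is not the code that was actually pushed.

```python
from enum import Enum
from typing import Optional


class DeviceType(Enum):
    # Hypothetical stand-in for Scanner's device enum.
    CPU = "cpu"
    GPU = "gpu"


def default_device(has_gpu: bool, requested: Optional[DeviceType] = None) -> DeviceType:
    """Choose the device an op should run on.

    An explicit request wins; otherwise prefer the GPU when one is available.
    """
    if requested is not None:
        if requested is DeviceType.GPU and not has_gpu:
            raise RuntimeError("GPU requested but none is available")
        return requested
    return DeviceType.GPU if has_gpu else DeviceType.CPU
```

In a script this would be called as `default_device(db.has_gpu())`; passing `DeviceType.CPU` explicitly keeps the old CPU-only behavior, which can be handy for debugging.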

orm011 commented 5 years ago

Thank you. I pulled and tried to rebuild, but unexpectedly ran into the following error:

[ 19%] Building CXX object scanner/engine/CMakeFiles/engine.dir/evaluate_worker.cpp.o
/opt/scanner/scanner/engine/evaluate_worker.cpp: In member function ‘void scanner::internal::PreEvaluateWorker::feed(scanner::internal::EvalWorkEntry&, bool)’:
/opt/scanner/scanner/engine/evaluate_worker.cpp:181:13: error: ‘Result’ is not a member of ‘hwang’
             hwang::Result result = inplace_decoders_[media_col_idx]->initialize(

I'm not sure why, since the name is defined in that header and the header is included:

root@claustrophobia:/opt/scanner# grep hwang/common.h /opt/scanner/scanner/engine/evaluate_worker.cpp
#include "hwang/common.h"
root@claustrophobia:/opt/scanner# grep Result thirdparty/install/include/hwang/common.h  
struct Result {
  Result() : ok(true) {}
  Result(bool _ok, const std::string &_message) : ok(_ok), message(_message) {}
    Result res__ = expr__;                                                     \
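One possible explanation, which this thread does not confirm, is that the compiler resolves `hwang/common.h` to a different (older) copy than the one grepped above, for example one installed elsewhere on the include path. Below is a minimal Python sketch for checking that hypothesis. It walks a list of include roots and reports, for each copy of `hwang/common.h` it finds, whether that copy defines `struct Result`. The function name and the roots you pass in are assumptions; substitute the `-I` directories your build actually uses.

```python
from pathlib import Path
from typing import Dict, Iterable


def find_header_copies(
    roots: Iterable[str], header: str = "common.h", parent: str = "hwang"
) -> Dict[str, bool]:
    """Map every <parent>/<header> found under the roots to whether it defines `struct Result`."""
    results: Dict[str, bool] = {}
    for root in roots:
        base = Path(root)
        if not base.is_dir():
            # Skip include roots that do not exist on this machine.
            continue
        for path in base.rglob(header):
            # Only consider headers that live directly in a directory named `parent`.
            if path.parent.name != parent:
                continue
            text = path.read_text(errors="replace")
            results[str(path)] = "struct Result" in text
    return results
```

For example, `find_header_copies(["/opt/scanner/thirdparty/install/include", "/usr/local/include"])` (the second root is only a guess at another common install location). More than one result, or any copy mapped to `False`, would point to a stale header shadowing the one under thirdparty/install.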

The environment is the cuda-9.1 Docker image:

root@claustrophobia:/opt/scanner# git status
HEAD detached at b1a5dd9

This may be a separate issue. Otherwise, I'd be just as happy with advice on how to pick up your changes to the scannerpy part.

fpoms commented 5 years ago

Which image version are you using? gpu-9.1-cudnn7-v0.2.22?

I'd suggest pulling gpu-9.1-cudnn7-latest instead. It should already contain the fix I pushed, so you won't need to rebuild Scanner.

orm011 commented 5 years ago

Ah, that was handy. I can confirm that the custom ObjDetect op in the examples now uses the available GPUs (after tweaking the number of pipelines).