scanner-research / scanner

Efficient video analysis at scale
https://scanner-research.github.io/
Apache License 2.0

Openpose with GPU #221

Closed PushyamiKaveti closed 5 years ago

PushyamiKaveti commented 5 years ago

Hi, I am trying to use OpenPose with Scanner. Although I can run OpenPose and extract poses, the GPUs do not seem to be used to their full potential. On the other hand, if I set the device type to CPU, it is reasonably faster. However, the poses are not being drawn, i.e. the drawpose() function is not called; I added print statements inside drawpose() to check this. Is there a way to specify the number of GPUs to use? Am I doing anything wrong here?

Thanks, Pushyami

fpoms commented 5 years ago

Hi,

Can you be a bit more specific about what you mean by "the GPUs are not being used to the full potential"? For example:

What do you mean by 'the poses are not being drawn'? Does the output video not show poses drawn on the video? It's possible there are no poses being detected and thus there is nothing to draw.

By default, Scanner will use all the GPUs on your machine. You can specify the number of GPUs to use by setting the pipeline_instances variable on line 42 in the pose_detection app.

Alex

PushyamiKaveti commented 5 years ago

Hi,

GPU details and utilization: I used the nvidia-smi command to see how much GPU memory is being used. I have two TITAN Xp GPUs, each with 12 GiB of memory. Only 3 GB on each GPU is in use when I run OpenPose. Given that the CPU version of OpenPose runs much faster, I was wondering whether the GPU version has some issue.

As for pose detection on the CPU: I have a custom-built Puget computer with 56 CPUs. Pose detection on the same video runs much faster with the CPU version. However, I don't see poses being drawn on the images, and the detected poses stream is empty.

Below is the code for the computation graph; the posedraw() function is the same as given in the example.

# NOTE: rng, sampler and pose_args are not defined in this snippet.
frame = db.sources.FrameColumn()

# Crop each frame to the region of interest.
crop_frame_fn = db.ops.crop_fn(frame=rng, x_start=640, x_end=1280, y_start=0, y_end=360)

# Run OpenPose on the CPU.
poses_out = db.ops.OpenPose(frame=crop_frame_fn, device=DeviceType.CPU, args=pose_args)

# Draw the detected poses onto the cropped frames.
drawn_frame = db.ops.PoseDraw(frame=crop_frame_fn, frame_poses=poses_out)

sampled_frames = sampler(drawn_frame)
output = db.sinks.Column(columns={'frame': sampled_frames, 'pose': poses_out})

Hope the info helps.

Thanks, Pushyami

willcrichton commented 5 years ago

Hi, it's possible that the CPU implementation of OpenPose isn't working properly. We haven't tested it heavily, since they only recently adopted CPU support. I suspect it seems fast since it's not actually doing anything.

With respect to GPU utilization: I wouldn't use GPU memory as the metric for utilization. There are plenty of networks like OpenPose that only need 3-4 GB of memory. You should look at the % compute utilization (also shown in nvidia-smi), and get a sense of what the average utilization is. The closer to 100, the better.
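
As a rough illustration of measuring average compute utilization rather than memory, here is a minimal sketch that polls nvidia-smi through its `--query-gpu` CSV interface and averages the readings per GPU. The function names (`parse_utilization`, `query_utilization`, `average_utilization`, `monitor`), the sample count, and the polling interval are all hypothetical choices for this example and are not part of Scanner or OpenPose. It assumes nvidia-smi is on the PATH and reports a numeric utilization for every GPU.

```python
import subprocess
import time


def parse_utilization(csv_text):
    """Parse CSV output of `nvidia-smi --query-gpu=index,utilization.gpu`.

    Returns a dict mapping GPU index to compute utilization in percent.
    """
    result = {}
    for line in csv_text.strip().splitlines():
        index, util = (field.strip() for field in line.split(","))
        result[int(index)] = float(util)
    return result


def query_utilization():
    """Ask nvidia-smi for the current compute utilization of every GPU."""
    out = subprocess.check_output(
        [
            "nvidia-smi",
            "--query-gpu=index,utilization.gpu",
            "--format=csv,noheader,nounits",
        ],
        text=True,
    )
    return parse_utilization(out)


def average_utilization(samples):
    """Average a list of per-GPU utilization dicts, one dict per sample."""
    per_gpu = {}
    for sample in samples:
        for gpu, util in sample.items():
            per_gpu.setdefault(gpu, []).append(util)
    return {gpu: sum(vals) / len(vals) for gpu, vals in per_gpu.items()}


def monitor(num_samples=30, interval_s=1.0):
    """Poll nvidia-smi while a job runs and return average utilization per GPU."""
    samples = []
    for _ in range(num_samples):
        samples.append(query_utilization())
        time.sleep(interval_s)
    return average_utilization(samples)
```

Calling `monitor()` in a separate terminal or process while the pose detection job is running gives one average per GPU. An average far below 100 would suggest the GPUs are waiting on something else, such as video decoding or data loading, whereas low memory use alone says little.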

willcrichton commented 5 years ago

@PushyamiKaveti I'm going to close this for now, but feel free to reopen it if you think the perf is still an issue.