xinghaochen / Pose-REN

Code for "Pose Guided Structured Region Ensemble Network for Cascaded Hand Pose Estimation", Neurocomputing 2020

What are the icvl_center.txt and msra_center.txt files? #10

Closed · MLsmaller closed this issue 5 years ago

MLsmaller commented 5 years ago

Are they predicted by the model, or are they parameters provided by the camera? Thank you very much.

MLsmaller commented 5 years ago

I have been looking at your project for the past two days, but I encountered some problems. I am using a Kinect V2 camera, and I want to test the depth images captured by this camera with the model you have trained. Which files should I run, and where should I change the code? Thank you for your guidance.

l-j-oneil commented 5 years ago

Hi,

I am not the author of this code; I have just been experimenting with training a similar model, so please note I may not be 100% correct.

Firstly, I believe the files ICVL_center.txt and msra_center.txt just hold pre-calculated values for the centre point of each hand in the respective datasets' test images. Two functions, "get_center" and "get_center_fast", can be found in the utils folder and will generate such values: https://github.com/xinghaochen/Pose-REN/blob/master/src/utils/util.py

The src/demo folder holds example code to run this on an Intel RealSense camera: https://github.com/xinghaochen/Pose-REN/blob/master/src/demo/realsense_realtime_demo_librealsense2.py

In order to get this up and running on a Kinect V2, you will need to rewrite the function "read_frame_from_device()" to call the respective Python API for the Kinect camera. Please note that the input images to the network are 96x96 in size and normalised to [-1, 1].
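For illustration only, a rough sketch of such a replacement using the pylibfreenect2 bindings might look like the code below. This is my own assumption about which binding to use, it is not part of the Pose-REN repo, and I have not tested it against this code, so treat it as a starting point rather than a working implementation.

```python
# Rough sketch only: read depth frames from a Kinect V2 via pylibfreenect2
# (an assumed choice of binding; the OpenNI route mentioned below also works).
import numpy as np
from pylibfreenect2 import Freenect2, SyncMultiFrameListener, FrameType

def init_kinect_v2():
    fn = Freenect2()
    if fn.enumerateDevices() == 0:
        raise RuntimeError('No Kinect V2 device connected')
    device = fn.openDevice(fn.getDeviceSerialNumber(0))
    listener = SyncMultiFrameListener(FrameType.Ir | FrameType.Depth)
    device.setIrAndDepthFrameListener(listener)
    device.start()
    return device, listener

def read_frame_from_device(listener):
    # Kinect V2 depth frames are 512x424 float32 maps, already in millimetres,
    # so no extra depth_scale factor should be needed here.
    frames = listener.waitForNewFrame()
    depth = frames["depth"].asarray().copy()
    listener.release(frames)
    return depth
```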

I believe the author of DeepPrior++ tested their network using a Kinect and the OpenNI driver: https://github.com/moberweger/deep-prior-pp/blob/master/src/util/cameradevice.py

Hopefully this helps with your efforts.

xinghaochen commented 5 years ago

@l-j-oneil Thank you very much! Your comments are really helpful.

@MLsmaller I think the comments from @l-j-oneil should address most of your concerns. Here are some additional comments that you may find helpful:

  1. The centers are simply obtained by computing the centroid of the pixels that fall within a predefined depth range (see the sketch after this list for the basic idea). You can use https://github.com/guohengkai/region-ensemble-network/blob/master/evaluation/get_centers.py to obtain centers for the ICVL, NYU and MSRA datasets.
  2. To run the hand pose estimator with a Kinect V2, you also have to revise the intrinsic parameters accordingly. (365.456, 365.456, 254.878, 205.395) is probably suitable for most Kinect V2 cameras, but you can also retrieve the parameters from your Kinect V2 using the official SDK.
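For point 1, a simplified sketch of the idea is shown below. It is not the exact code in get_centers.py or util.get_center, which handle more corner cases; the default depth range values are placeholders.

```python
import numpy as np

def naive_hand_center(depth, lower=1, upper=650):
    """Centroid (u, v, d) of the depth pixels falling inside [lower, upper] mm.

    Simplified illustration of the center computation described above; the
    real get_centers.py / util.get_center code may differ in details.
    """
    mask = np.logical_and(depth > lower, depth < upper)
    if not mask.any():
        return None                      # no hand pixels in the given depth range
    vs, us = np.nonzero(mask)            # row (v) and column (u) indices
    ds = depth[mask]                     # depth values of the hand pixels
    return np.array([us.mean(), vs.mean(), ds.mean()], dtype=np.float32)
```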
MLsmaller commented 5 years ago


Thank you for your reply. If I only want to run this model to test some images rather than a real-time demo, which functions should I run? Thanks again.

xinghaochen commented 5 years ago

@MLsmaller You can start with realsense_realtime_demo_librealsense2.py and read the depth images from local files instead of from the camera.

MLsmaller commented 5 years ago


Thank you for your reply; I will try this tomorrow. I'm not very experienced in this field, so I may have some questions for you in the future, and I hope you can give me your advice. Thanks again. By the way, I have your awesome-hand-pose-estimation project in my collection, and it has helped me a lot.

MLsmaller commented 5 years ago

> @MLsmaller You can start with realsense_realtime_demo_librealsense2.py and read the depth images from local files instead of from the camera.

I know what you mean. I'll try it tomorrow. Thank you

MLsmaller commented 5 years ago

> @MLsmaller You can start with realsense_realtime_demo_librealsense2.py and read the depth images from local files instead of from the camera.

I initially thought the /src/testing/run_images.py file was for testing images, but I didn't understand that file very well. By the way, is the /src/testing/predict.py file used to train on a new dataset, or for something else?

MLsmaller commented 5 years ago

> @MLsmaller You can start with realsense_realtime_demo_librealsense2.py and read the depth images from local files instead of from the camera.

Hi, I tried your suggestion; this is the file I changed based on the realtime demo file. However, the result I got is not accurate. Am I still missing some step in the data processing part?

Here is the .py file:

```python
#-*- coding:utf-8 -*-
import logging
logging.basicConfig(level=logging.INFO)
import numpy as np
import cv2
#import pyrealsense2 as rs
import os
import sys
BASE_DIR = os.path.dirname(os.path.abspath(__file__))
ROOT_DIR = os.path.dirname(BASE_DIR)
sys.path.append(ROOT_DIR) # config
sys.path.append(os.path.join(ROOT_DIR, 'utils')) # utils
sys.path.append(os.path.join(ROOT_DIR, 'libs')) # libs
from model_pose_ren import ModelPoseREN
import util
from util import get_center_fast as get_center

def init_device():
    # Configure depth streams
    pipeline = rs.pipeline()
    config = rs.config()
    config.enable_stream(rs.stream.depth, 640, 480, rs.format.z16, 30)
    print 'config'
    # Start streaming
    profile = pipeline.start(config)
    depth_sensor = profile.get_device().first_depth_sensor()
    depth_scale = depth_sensor.get_depth_scale()
    print "Depth Scale is: " , depth_scale
    return pipeline, depth_scale

def stop_device(pipeline):
    pipeline.stop()

def read_frame_from_device(pipeline, depth_scale):
    frames = pipeline.wait_for_frames()
    depth_frame = frames.get_depth_frame()
    #if not depth_frame:
    #    return None
    # Convert images to numpy arrays
    depth_image = np.asarray(depth_frame.get_data(), dtype=np.float32)
    depth = depth_image * depth_scale * 1000
    return depth

def show_results(img, results, cropped_image, dataset):
    img = np.minimum(img, 1500)
    img = (img - img.min()) / (img.max() - img.min())
    img = np.uint8(img*255)
    # draw cropped image
    img[:96, :96] = (cropped_image+1)*255/2
    img = cv2.cvtColor(img, cv2.COLOR_GRAY2BGR)
    cv2.rectangle(img, (0, 0), (96, 96), (255, 0, 0), thickness=2)
    img_show = util.draw_pose(dataset, img, results)
    return img_show

def main():
    # intrinsic parameters of Kinect V2
    #fx, fy, ux, uy = 365.456, 365.456, 254.878, 205.395  #kinect v2
    fx, fy, ux, uy = 463.889, 463.889, 320, 240
    # paramters
    dataset = 'icvl'
    if len(sys.argv) == 2:
        dataset = sys.argv[1]

    lower_ = 1
    upper_ = 650     

    # init realsense
    #pipeline, depth_scale = init_device()
    # init hand pose estimation model
    hand_model = ModelPoseREN(dataset,
        lambda img: get_center(img, lower=lower_, upper=upper_),
        param=(fx, fy, ux, uy), use_gpu=True)
    # for msra dataset, use the weights for first split
    if dataset == 'msra':
        hand_model.reset_model(dataset, test_id = 0)
    # realtime hand pose estimation loop
    #depth = read_frame_from_device(pipeline, depth_scale)
    icvl_path="/home/data/cy/ICVL/test/Depth/test_seq_1/image_0050.png"
    # preprocessing depth
    depth=cv2.imread(icvl_path,2)

    depth=np.asarray(depth,np.float32)
    depth[depth == 0] = depth.max()
    print(depth)
    # training samples are left hands in icvl dataset,
    # right hands in nyu dataset and msra dataset,
    # for this demo you should use your right hand
    if dataset == 'icvl':
        depth = depth[:, ::-1]  # flip
    # get hand pose
    results, cropped_image = hand_model.detect_image(depth)
    img_show = show_results(depth, results, cropped_image, dataset)
    cv2.imwrite('./test.png', img_show)
    #stop_device(pipeline)

if __name__ == '__main__':
    main()
```

I read the image from the ICVL dataset (a depth image) instead of from the camera; the detection result for this image is shown below.

[image: detection result]

By the way, I don't know what `depth = depth_image * depth_scale * 1000` means in the function read_frame_from_device(). What does that line do, and what does the parameter depth_scale mean? I'm wondering whether that is the problem. Sincerely hoping for your reply.

xinghaochen commented 5 years ago

Hi,

  1. depth_scale is a parameter related to the RealSense SR300 camera. Since you are not reading depth images from the camera, you don't need this parameter.
  2. There are several problems in your code when dealing with the ICVL dataset:
     - `fx, fy, ux, uy = 463.889, 463.889, 320, 240`: these parameters are for Kinect V2. Since you are using depth images from the ICVL dataset, you should use (240.99, 240.96, 160, 120), which are the parameters of the camera used to capture the ICVL dataset.
     - `if dataset == 'icvl': depth = depth[:, ::-1]  # flip`: you don't have to flip depth images coming from the ICVL dataset. In fact, we flip the depth images from the RealSense camera when using ICVL pre-trained models only because the images in the ICVL dataset are flipped.
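Applied to the script above, a minimal sketch of these two fixes for a single ICVL test image could look like the following. The image path is a placeholder, and this is only an illustration built from the code already posted in this thread, not an official script.

```python
# Minimal sketch illustrating the two fixes above for one ICVL test image.
import cv2
import numpy as np
from model_pose_ren import ModelPoseREN
from util import get_center_fast as get_center

fx, fy, ux, uy = 240.99, 240.96, 160, 120   # ICVL camera intrinsics
lower_, upper_ = 1, 650                     # depth range (mm) used to segment the hand

hand_model = ModelPoseREN('icvl',
    lambda img: get_center(img, lower=lower_, upper=upper_),
    param=(fx, fy, ux, uy), use_gpu=True)

depth = cv2.imread('/path/to/ICVL/test/Depth/test_seq_1/image_0050.png', 2)
depth = np.asarray(depth, np.float32)
depth[depth == 0] = depth.max()   # treat missing depth as background
# Note: no horizontal flip here, since the image already comes from ICVL.
results, cropped_image = hand_model.detect_image(depth)
```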

MLsmaller commented 5 years ago

Hi,


Thank you. After adopting your suggestion, the ICVL dataset is detected successfully. Why doesn't it work for the NYU dataset? This is my code:

```python
def main():
    # intrinsic parameters of Kinect V2
    #fx, fy, ux, uy =  365.456, 365.456, 254.878, 205.395  #kinect v2
    #fx, fy, ux, uy = 240.99, 240.96, 160, 120

    # paramters
    dataset = 'icvl'
    fx, fy, ux, uy = util.get_param(dataset)
    if len(sys.argv) == 2:
        dataset = sys.argv[1]
        print("the model of data is {}".format(sys.argv[1]))

    lower_ = 1
    upper_ = 650     # within the 0-650 mm depth range

    # init realsense
    #pipeline, depth_scale = init_device()
    # init hand pose estimation model
    hand_model = ModelPoseREN(dataset,
        lambda img: get_center(img, lower=lower_, upper=upper_),
        param=(fx, fy, ux, uy), use_gpu=True)
    # for msra dataset, use the weights for first split
    if dataset == 'msra':
        hand_model.reset_model(dataset, test_id = 0)
    # realtime hand pose estimation loop
    #depth = read_frame_from_device(pipeline, depth_scale)
    icvl_path="/home/data/cy/ICVL/test/Depth/test_seq_1/image_0500.png"
    nyu_path="/home/data/cy/NYU/dataset/test/depth_1_0000001.png"
    print(dataset)
    # preprocessing depth
    #depth=cv2.imread(icvl_path,2)

    #depth=np.asarray(depth,np.float32)
    #depth[depth == 0] = depth.max()
    depth=util.load_image(dataset, nyu_path, is_flip=False)
    print(depth,type(depth))
    # training samples are left hands in icvl dataset,
    # right hands in nyu dataset and msra dataset,
    # for this demo you should use your right hand
    #if dataset == 'nyu':
        #depth = depth[:, ::-1]  # flip
    # get hand pose
    results, cropped_image = hand_model.detect_image(depth)
    img_show = show_results(depth, results, cropped_image, dataset)
    cv2.imwrite('./test.png', img_show)
    #stop_device(pipeline)

if __name__ == '__main__':
    main()
```

This is the warning:

[image: warning message]

And the result does not draw the hand keypoints.

[image: detection result without hand keypoints]

Sorry to bother you again.

xinghaochen commented 5 years ago

That's because the hand is not properly segmented. Cropping the hand region from the original depth image differs slightly between datasets. If you are dealing with the predefined datasets (ICVL, NYU, MSRA15), I suggest using predict.py to get the predicted results.

MLsmaller commented 5 years ago

Now I know what you mean; I was just testing the NYU dataset. My goal is to test depth images obtained from a Kinect V2. Could you please tell me whether I can directly read the depth images I have saved from the Kinect, just as I did when testing the ICVL dataset, and whether I have to choose the NYU model (because that dataset was captured with a Kinect camera)? What confuses me is that I don't know what changes need to be made to the code when testing a depth image obtained from the Kinect, because the depth maps I captured look very different from the depth images in the dataset: the depth image I saved is very dark and my hand is not clear.

xinghaochen commented 5 years ago

First of all, you can use any of the pre-trained models (ICVL, NYU, MSRA or HANDS17); see some examples here.

If you want to predict hand pose for depth images captured from Kinect, all you have to do is capture a depth image from the camera (perhaps save it to local disk) and preprocess it before feeding it into the Pose-REN model.

As for the preprocessing, you can use the code from realsense_realtime_demo_librealsense2.py. Again, you need to change the intrinsic parameters to those of the Kinect, and you may also have to change the depth threshold, since our code simply uses a naive depth-thresholding algorithm to segment the hand.

The depth image you saved most likely looks black because it is stored in 16-bit format, and common image viewers display such images as nearly black.
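As a side note, one way to sanity-check such a 16-bit depth image is to stretch it to 8 bits just for viewing. The sketch below is a generic OpenCV snippet, not part of this repo, and the file name is a placeholder.

```python
# Generic sketch: load a 16-bit depth PNG saved from the Kinect and convert it
# to an 8-bit image so it is viewable; the file name is a placeholder.
import cv2
import numpy as np

depth = cv2.imread('kinect_depth.png', cv2.IMREAD_ANYDEPTH).astype(np.float32)

# Clip to the working range and stretch to [0, 255] for display only.
clipped = np.clip(depth, 0, 1500)
vis = np.uint8(255 * (clipped - clipped.min()) / (clipped.max() - clipped.min() + 1e-6))
cv2.imwrite('kinect_depth_vis.png', vis)
```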

MLsmaller commented 5 years ago


Thank you very much for your reply. Now I can detect the depth images I got from the Kinect, but I don't know how to write a Python Kinect API for real-time detection; the code you provided is for the RealSense SR300 camera. Could you please provide some guidance? Is there any open-source project that does real-time detection with a Kinect camera, written in C++ or Python? By the way, your awesome-hand-pose-estimation project has been a great help to me. Are there any open-source projects listed there that do real-time detection with a Kinect? My idea is similar to the open-source project "Sphere Meshes for Real-Time Hand Modeling and Tracking": by detecting and saving two hands in real time, or a video of two hands, I want to generate a relatively robust hand model. However, that project requires the wrist to be wrapped in blue and only handles a single hand. Which open-source project would be most helpful for my project? Sincerely hoping for your answer again.

xinghaochen commented 5 years ago

If you want to use C++, here is a demo for using Pose-REN in C++.

If you want to use Python, as @l-j-oneil mentioned in this issue, you can take a look at DeepPrior++. They also provide a heuristic method for detecting hands, which is better than the naive depth thresholding method we used.

> I believe the author of DeepPrior++ tested their network using a Kinect and the OpenNI driver: https://github.com/moberweger/deep-prior-pp/blob/master/src/util/cameradevice.py