ultralytics / yolov3

YOLOv3 in PyTorch > ONNX > CoreML > TFLite
https://docs.ultralytics.com
GNU Affero General Public License v3.0

Intel RealSense: LoadWebcam() instead of LoadStreams() #692

Closed yonglizhong closed 4 years ago

yonglizhong commented 4 years ago

Hi, because I'm using a RealSense camera, I had to use the pyrealsense2 package to read the RGB stream data instead of cv2.VideoCapture(). My question is how to use the LoadWebcam() class in detect.py instead of the LoadStreams() class. Thanks

glenn-jocher commented 4 years ago

@yonglizhong you can try swapping them in detect.py. See the readme for inference options.
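
For reference, the swap would look roughly like this in detect.py (a sketch against the datasets API at the time; check your local utils/datasets.py for the exact LoadWebcam signature):

from utils.datasets import LoadWebcam  # alongside LoadStreams, LoadImages

if webcam:
    view_img = True
    # dataset = LoadStreams(source, img_size=img_size)  # original line
    dataset = LoadWebcam(img_size=img_size)  # local camera 0 by default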

yonglizhong commented 4 years ago

Hi, thanks for your swift reply. I have attempted it with minor success and am still trying to figure out this problem when the inference window shows up. Have you experienced this result before? [Screenshot from 2019-12-10 08-45-11]

yonglizhong commented 4 years ago

It works fine with LoadStreams(). I will of course update you further if I manage to solve this issue in LoadWebcam().

glenn-jocher commented 4 years ago

Great! I'll close this issue for now as the original issue appears to have been resolved, and/or no activity has been seen for some time. Feel free to comment if this is not the case.

GilbertTjahjono commented 4 years ago

Hi, actually I'm currently having similar problems with the Intel RealSense D435. I'm new to programming and my background is engineering...

I want to stream both the RGB and depth frames using the pyrealsense2 package. I tried to modify utils.datasets by setting up the pipeline for the RealSense, and I also replaced cv2.VideoCapture with pipe.start(cfg). Below is my latest code to stream frames from the RealSense:

""" def LoadRealSense(): # for inference

img_size = 416
half = False  # half precision fp16 images

# Setup
pipe = rs.pipeline()
cfg = rs.config()
cfg.enable_stream(rs.stream.depth, 640, 480, rs.format.z16, 30)
cfg.enable_stream(rs.stream.color, 640, 480, rs.format.bgr8, 30)

# Start streaming
profile = pipe.start(cfg)

# Read frame
while True:
    frames = pipe.wait_for_frames()
    depth_frame = frames.get_depth_frame()
    color_frame = frames.get_color_frame()
    if not depth_frame or not color_frame:
        continue
    img0 = np.asanyarray(color_frame.get_data())

    # Create alignment primitive with color as its target stream:
    align = rs.align(rs.stream.color)
    frameset = align.process(frames)

    # Update color and depth frames:
    aligned_depth_frame = frameset.get_depth_frame()
    colorizer = rs.colorizer()
    colorized_depth = np.asanyarray(colorizer.colorize(aligned_depth_frame).get_data())

    break

# Print
img_path = 'realsense_rgb.jpg'
depth_path = 'realsense_depth.jpg'

# Padded resize
img = letterbox(img0, new_shape=img_size)[0]

# Convert
img = img[:, :, ::-1].transpose(2, 0, 1)  # BGR to RGB, to 3x416x416
img = np.ascontiguousarray(img, dtype=np.float16 if half else np.float32)  # uint8 to fp16/fp32
img /= 255.0  # 0 - 255 to 0.0 - 1.0

return img_path, depth_path, colorized_depth, img, img0, None

""" And this was the result after I run detect.py:

[screenshot of the error]

Thanks in advance!

GilbertTjahjono commented 4 years ago

Pardon me, my question is: is there another way to stream frames from the RealSense, or is the only way to modify datasets.py by introducing a new class or functions?

glenn-jocher commented 4 years ago

@GilbertTjahjono this is beyond the scope of the repo, which only supports RGB and grayscale training and inference, but of course you should be able to modify the code to suit your needs. In any case, your error is not in a file supplied by this repo but in realsense.py.

Ultralytics offers professional consulting as well, from simple expert advice up to delivery of fully customized, end-to-end production AI solutions for our clients.

If you'd like to discuss your requirements in greater detail, please contact Glenn Jocher at glenn.jocher@ultralytics.com or visit us at https://contact.ultralytics.com. Alternatively, we will leave this issue open, and you are free to seek any available community support here.

yonglizhong commented 4 years ago

Hi @glenn-jocher, do you think it's possible to reopen this issue? I have attempted to use the LoadWebcam() class again and still receive the same issue in my cv2 window display. Thanks. [Screenshot from 2020-02-05 13-27-33]

yonglizhong commented 4 years ago

@GilbertTjahjono have you had any success working with the RealSense camera? I'm currently attempting to access the camera stream via ROS.

yonglizhong commented 4 years ago

Hi @glenn-jocher, I've just switched to a simple USB camera to test LoadWebcam() and still get this result. Do you have an idea what's going on? Could it be the way the letterbox() function works on the webcam stream itself? Thanks. [Screenshot from 2020-02-06 09-41-41]

yonglizhong commented 4 years ago

As advised, I have made only a very minor change to detect.py, just calling the LoadWebcam() class instead. [Screenshot from 2020-02-06 09-45-24]

glenn-jocher commented 4 years ago

@yonglizhong try an IP or RTSP cam to see if the issue is isolated to your webcam. See https://github.com/ultralytics/yolov3#inference
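
For example, per the inference section of the README (flags may differ slightly between repo versions, and the RTSP address below is just a placeholder):

python3 detect.py --source 0                      # local webcam
python3 detect.py --source rtsp://192.168.1.64/1  # RTSP camera (example address)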

GilbertTjahjono commented 4 years ago

@yonglizhong, actually my bachelor thesis project is still ongoing, so I haven't created a proper GitHub repository for documentation purposes. However, I recently succeeded in streaming RGB and depth data from the Intel RealSense D435.

[example screenshot]

I developed an extra class in utils.datasets (thanks to @glenn-jocher's repository) named LoadRealSense2 to stream RGB and depth frames. The class is as follows (the method name kedalaman is Indonesian for "depth"); just ask me if you need further explanation:

class LoadRealSense2:  # Stream from Intel RealSense D435

    def __init__(self, width=640, height=480, fps=30):

        # Variables for setup
        self.width = width
        self.height = height
        self.fps = fps
        self.imgs = [None]
        self.depths = [None]
        self.img_size = 416
        self.half = False

        # Setup
        self.pipe = rs.pipeline()
        self.cfg = rs.config()
        self.cfg.enable_stream(rs.stream.depth, self.width, self.height, rs.format.z16, self.fps)
        self.cfg.enable_stream(rs.stream.color, self.width, self.height, rs.format.bgr8, self.fps)

        # Start streaming
        self.profile = self.pipe.start(self.cfg)
        self.path = rs.pipeline_profile()
        print(self.path)

        print("streaming at w = " + str(self.width) + " h = " + str(self.height) + " fps = " + str(self.fps))

    def update(self):

        while True:
            # Wait for frames and get the data
            self.frames = self.pipe.wait_for_frames()
            self.depth_frame = self.frames.get_depth_frame()
            self.color_frame = self.frames.get_color_frame()

            # Wait until RGB and depth frames are synchronised
            if not self.depth_frame or not self.color_frame:
                continue
            # Get RGB data and convert it to a numpy array
            img0 = np.asanyarray(self.color_frame.get_data())
            #print("initial image shape: " + str(np.shape(img0)))

            # Aligned + colorized depth -> for display purposes only
            # (already converted to a numpy array in colorizing())
            depth0 = self.colorizing(self.aligned(self.frames))

            # Aligned depth -> for depth calculation
            # (already converted to a numpy array in kedalaman())
            distance0 = self.kedalaman(self.frames)

            # Get depth_scale
            depth_scale = self.scale(self.profile)

            # Expand image dims to 4D (so it can pass through the letterbox function)
            self.imgs = np.expand_dims(img0, axis=0)
            #print("expanded img shape: " + str(np.shape(self.imgs)))

            # The depth frame doesn't need this, since it won't be fed into YOLO
            self.depths = depth0
            self.distance = distance0
            break

        #print("initial depth shape: " + str(np.shape(self.depths)))

        s = np.stack([letterbox(x, new_shape=self.img_size)[0].shape for x in self.imgs], 0)  # inference shapes
        #print("s shape: " + str(np.shape(s)))

        self.rect = np.unique(s, axis=0).shape[0] == 1
        #print("rect: " + str(self.rect))

        if not self.rect:
            print('WARNING: Different stream shapes detected. For optimal performance supply similarly-shaped streams.')

        time.sleep(0.01)  # wait time
        return self.rect, depth_scale

    def scale(self, profile):
        depth_scale = profile.get_device().first_depth_sensor().get_depth_scale()
        return depth_scale

    def kedalaman(self, frames):
        self.align = rs.align(rs.stream.color)
        frames = self.align.process(frames)
        aligned_depth_frame = frames.get_depth_frame()
        depth_real = np.asanyarray(aligned_depth_frame.get_data())
        return depth_real

    def aligned(self, frames):
        self.align = rs.align(rs.stream.color)
        frames = self.align.process(frames)
        aligned_depth_frame = frames.get_depth_frame()
        return aligned_depth_frame

    def colorizing(self, aligned_depth_frame):
        self.colorizer = rs.colorizer()
        colorized_depth = np.asanyarray(self.colorizer.colorize(aligned_depth_frame).get_data())
        return(colorized_depth)

    def __iter__(self):
        self.count = -1
        return self

    def __next__(self):
        self.count += 1
        self.rect, depth_scale = self.update()
        img0 = self.imgs.copy()
        depth = self.depths.copy()
        distance = self.distance.copy()
        if cv2.waitKey(1) == ord('q'):  # q to quit
            cv2.destroyAllWindows()
            raise StopIteration

        img_path = 'realsense.jpg'

        # Letterbox
        img = [letterbox(x, new_shape=self.img_size, auto=self.rect, interp=cv2.INTER_LINEAR)[0] for x in img0]
        #print("letterboxed img shape: " + str(np.shape(img)))

        # Stack
        img = np.stack(img, 0)
        #print("stacked img shape: " + str(np.shape(img)))

        # Convert Image
        img = img[:, :, :, ::-1].transpose(0, 3, 1, 2)  # BGR to RGB, to 3x416x416
        #print("RGB img shape: " + str(np.shape(img)))
        img = np.ascontiguousarray(img, dtype=np.float16 if self.half else np.float32)  # uint8 to fp16/fp32
        img /= 255.0  # 0 - 255 to 0.0 - 1.0
        #print("final img shape: " + str(np.shape(img)))

        # Return img_path, depth (colorized), distance (raw aligned depth), depth_scale, img, img0, vid_cap (None)
        return str(img_path), depth, distance, depth_scale, img, img0, None

    def __len__(self):
        return 0  # 1E12 frames = 32 streams at 30 FPS for 30 years
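
For anyone wiring this into detect.py, a minimal usage sketch (assuming the class above is added to utils/datasets.py) might look like:

from utils.datasets import LoadRealSense2

dataset = LoadRealSense2(width=640, height=480, fps=30)
for img_path, depth, distance, depth_scale, img, img0, vid_cap in dataset:
    # img: letterboxed, normalized RGB array ready for the model
    # img0: raw BGR frame(s); depth: colorized aligned depth for display
    # distance: raw aligned depth (multiply by depth_scale for meters)
    pass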

yonglizhong commented 4 years ago

@GilbertTjahjono Hey neighbor! I'm from Malaysia! Anyway, thanks to your code I was able to adapt it into a ROS class. Here's the class I modified from yours; it works by subscribing to the ROS topic:

class LoadRealSense:  # Stream from Intel RealSense D435 via ROS

    def __init__(self, width=640, height=480, fps=30):

        # Variables for setup
        self.width = width
        self.height = height
        self.fps = fps
        self.imgs = np.zeros((480, 640, 3))
        self.img0 = np.zeros((480, 640, 3))
        self.img_size = 416
        self.half = False
        self.updated_image = False
        self.bridge = CvBridge()

        self.color_sub = rospy.Subscriber("/camera/color/image_raw", Image, self.callback)

        print("streaming at w = " + str(self.width) + " h = " + str(self.height) + " fps = " + str(self.fps))

    def callback(self, data):

        try:
            self.img0 = self.bridge.imgmsg_to_cv2(data, "bgr8")
            # print(type(self.color_image))
            self.updated_image = True
        except CvBridgeError as e:
            print(e)

    def __iter__(self):
        self.count = -1
        return self

    def __next__(self):
        self.count += 1
        # self.rect = self.update()
        # while not self.updated_image:
        self.imgs = np.expand_dims(self.img0, axis=0)
        self.updated_image = True
        # print("expanded img shape: " + str(np.shape(self.imgs)))

        s = np.stack([letterbox(x, new_shape=self.img_size)[0].shape for x in self.imgs], 0)  # inference shapes

        self.rect = np.unique(s, axis=0).shape[0] == 1

        if not self.rect:
            print('WARNING: Different stream shapes detected. For optimal performance supply similarly-shaped streams.')

        time.sleep(0.01)  # wait time
        img0 = self.imgs.copy()

        if cv2.waitKey(1) == ord('q'):  # q to quit
            cv2.destroyAllWindows()
            raise StopIteration

        img_path = 'realsense.jpg'

        # Letterbox
        img = [letterbox(x, new_shape=self.img_size, interp=cv2.INTER_LINEAR)[0] for x in img0]

        # Stack
        img = np.stack(img, 0)

        # Convert Image
        img = img[:, :, :, ::-1].transpose(0, 3, 1, 2)  # BGR to RGB, to 3x416x416
        img = np.ascontiguousarray(img, dtype=np.float16 if self.half else np.float32)  # uint8 to fp16/fp32
        img /= 255.0  # 0 - 255 to 0.0 - 1.0

        # Return img_path, img, img0, vid_cap (None)
        return str(img_path), img, img0, None

    def __len__(self):
        return 0  # 1E12 frames = 32 streams at 30 FPS for 30 years

Sammysedo commented 4 years ago

@yonglizhong

        # Return depth, depth0, img, img0
        return str(img_path), depth, distance, depth_scale, img, img0, None

Could you provide an example of how you performed the distance calculation in detect.py using these variables? I'm working on a similar project with the same camera.

yonglizhong commented 4 years ago

@Sammysedo Hi Sammy, apologies for the late reply. Unfortunately, I no longer have the camera with me to test it out. My main priority was to obtain the color stream anyway, with a separate script handling depth. Maybe ask @GilbertTjahjono? The code was, after all, written by him. Sorry I couldn't be more helpful.

Sammysedo commented 4 years ago

@yonglizhong I have resolved my issue. What I needed were 3D coordinates with respect to the camera's optical frame, which I was able to obtain using the rs.rs2_deproject_pixel_to_point() function:

depth_profile = rs.video_stream_profile(self.profile.get_stream(rs.stream.depth))
depth_intrin = depth_profile.get_intrinsics()

dist = rs.depth_frame.get_distance(depth, int(cx), int(cy))

metric_coord = rs.rs2_deproject_pixel_to_point(depth_intrin, [cx, cy], dist)
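
For context, cx and cy here are the pixel coordinates you want a 3D point for; in this kind of pipeline they would typically be the centre of a YOLO bounding box. A small sketch under that assumption (xyxy being one detection in colour-image pixel coordinates, aligned_depth_frame the aligned depth frame from earlier):

x1, y1, x2, y2 = xyxy                  # one detection box
cx, cy = (x1 + x2) / 2, (y1 + y2) / 2  # box centre in pixels

dist = aligned_depth_frame.get_distance(int(cx), int(cy))  # depth in meters at the centre pixel
x, y, z = rs.rs2_deproject_pixel_to_point(depth_intrin, [cx, cy], dist)  # camera-frame coordinates, meters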

GilbertTjahjono commented 4 years ago

@Sammysedo @yonglizhong Sorry for the late reply.

Yes, my code is similar to what @Sammysedo has written above. I further process the 3D coordinates with a Kalman filter and return additional states: 3D velocity and acceleration.

GilbertTjahjono commented 4 years ago

I forgot where I found the reference link... I think it is somewhere in Intel RealSense's example code. Here's my code for calculating the distance:

# Calculating depth
def calcdepth(xmin, ymin, xmax, ymax, depth, depth_scale):
    # Crop depth data
    depth = depth[ymin:ymax, xmin:xmax].astype(float)
    # Apply the device depth scale to convert to meters
    depth = depth * depth_scale
    dist = cv2.mean(depth)[0]
    #print(str(dist) + " meters")
    return dist

It calculates the distance by averaging the values in a cropped depth frame. The cropped region is obtained by 'copying' the bounding box from the RGB frame onto the aligned depth frame.

However, this method includes some background pixels in the calculation, since the object's shape does not completely fill the bounding box. Hence, the calculated distance is not accurate.
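
One possible mitigation, sketched below as a suggestion rather than something from this thread, is to take the median of the nonzero depth values in the crop, which is less sensitive to background and invalid pixels than the mean:

import numpy as np

def calcdepth_median(xmin, ymin, xmax, ymax, depth, depth_scale):
    # Crop the aligned depth map to the bounding box and convert to meters
    crop = depth[ymin:ymax, xmin:xmax].astype(float) * depth_scale
    valid = crop[crop > 0]  # zero depth = no reading; exclude from the statistic
    return float(np.median(valid)) if valid.size else 0.0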

joelbudu commented 4 years ago

@GilbertTjahjono very good work here. Do you mind sharing your repository for streaming RGB and depth data from the RealSense camera? How you're feeding the stream to the YOLO model would also be a good reference, as there's not much documentation on it.

GilbertTjahjono commented 4 years ago

Hi @joelbudu. I'm planning to make it public next week since it is still a mess right now... I've been busy preparing for my final thesis defense next week and haven't had time to clean up my code.

I'll let you know as soon as possible! Cheers!

joelbudu commented 4 years ago

Thanks @GilbertTjahjono. I actually got it working yesterday for YOLOv5, so I might upload a fork when I get the chance. I assume you'll have a much improved version at this point, so I'll be waiting for your update. Cheers!

AinaraLopez commented 3 years ago

@GilbertTjahjono I have tested your code and it keeps giving me the error you were getting previously. Have you changed some function in detect.py?

GilbertTjahjono commented 3 years ago

@AinaraLopez have you changed this part on detect.py?

[screenshot of the detect.py dataloader section]

You should change it according to the name of your streaming function, or perhaps add a new parser option to stream data from the Intel RealSense D435.
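
For example, a hypothetical sketch (the --realsense flag name and wiring are made up for illustration, not part of the repo):

# in detect.py's argument parser:
parser.add_argument('--realsense', action='store_true', help='stream from Intel RealSense D435')

# and where the dataloader is chosen:
if opt.realsense:
    dataset = LoadRealSense2(width=640, height=480, fps=30)
elif webcam:
    dataset = LoadStreams(source, img_size=imgsz)
else:
    dataset = LoadImages(source, img_size=imgsz)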

AinaraLopez commented 3 years ago

@GilbertTjahjono yes

vid_path, vid_writer = None, None
if webcam:
    view_img = True
    cudnn.benchmark = True  # set True to speed up constant image size inference
    dataset = LoadRealSense2(width=640, height=480, fps=30)
else:
    save_img = True
    dataset = LoadImages(source, img_size=imgsz)

This is the error: [screenshot from 2020-11-17 15-04-07]

GilbertTjahjono commented 3 years ago

@AinaraLopez ah, it is because your loop expects 3 variables to unpack yet the function returns 4 variables:

[screenshot of the error]

You should adjust the returned variables and return only those needed for further processing in detect.py. I think it should be okay if you delete "str(img_path)".
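
Concretely, the unpacking in the inference loop has to match whatever the loader returns; a sketch of the two consistent options (variable names follow LoadRealSense2 above):

# Option A: widen detect.py's loop to unpack everything LoadRealSense2 returns
for img_path, depth, distance, depth_scale, img, im0s, vid_cap in dataset:
    pass  # run inference as usual; depth/distance are available per frame

# Option B: trim __next__ to return only what the stock loop expects,
# i.e. (path, img, im0s, vid_cap), and handle the depth values separately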

I just uploaded my project here: https://github.com/GilbertTjahjono/Multiple_Object_Tracking

The modified detect.py is named tracking_v3.py in my repository. You can check my full code there.

AinaraLopez commented 3 years ago

@GilbertTjahjono Thank you very much, I have already been able to solve it.

HeitorDC commented 2 years ago

> @GilbertTjahjono Thank you very much, I have already been able to solve it.

@AinaraLopez can you show me how you solved it? Can you show me your code, please?