Closed yonglizhong closed 4 years ago
@yonglizhong you can try swapping them in detect.py. See the readme for inference options.
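For reference, `LoadStreams` and `LoadWebcam` follow the same iterator contract, so the swap is essentially a one-line change to the dataset constructor in detect.py. A minimal sketch of that contract, using a hypothetical `DummyLoader` standing in for either class (shapes are illustrative):

```python
import numpy as np

class DummyLoader:
    """Hypothetical stand-in for LoadWebcam/LoadStreams. Both yield
    (path, letterboxed_img, original_img, cap)-style tuples, so the
    detection loop works unchanged whichever loader is constructed."""

    def __init__(self, n=3, img_size=416):
        self.n, self.img_size = n, img_size

    def __iter__(self):
        self.count = 0
        return self

    def __next__(self):
        if self.count >= self.n:
            raise StopIteration
        self.count += 1
        img0 = np.zeros((480, 640, 3), dtype=np.uint8)  # raw BGR frame
        # CHW, normalized -- what the model consumes
        img = np.zeros((3, self.img_size, self.img_size), dtype=np.float32)
        return 'webcam.jpg', img, img0, None

for path, img, im0s, vid_cap in DummyLoader():
    pass  # run inference on img here, draw results on im0s
```

In detect.py itself the change is along the lines of `dataset = LoadWebcam(source, img_size=imgsz)` in place of the `LoadStreams(...)` call; exact constructor arguments depend on your repo version.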
Hi, thanks for your swift reply. I have attempted it with minor success and am still trying to figure out this problem when the inference window shows up. I wonder if you have experienced this result before?
It works fine with LoadStreams(). Will of course further update you if I have managed to solve this issue in LoadWebcam().
Great! I'll close this issue for now as the original issue appears to have been resolved, and/or no activity has been seen for some time. Feel free to comment if this is not the case.
Hi, actually right now I'm having similar problems with Intel RealSense D435. I'm new to programming and my background is engineering...
I want to stream both the RGB and depth frames using the pyrealsense2 package. I tried to modify utils/datasets.py by setting up the pipeline for the RealSense, and I also replaced cv2.VideoCapture with pipe.start(cfg). Below is my latest code to stream frames from the RealSense:
""" def LoadRealSense(): # for inference
img_size = 416
half = False # half precision fp16 images
# Setup
pipe = rs.pipeline()
cfg = rs.config()
cfg.enable_stream(rs.stream.depth, 640, 480, rs.format.z16, 30)
cfg.enable_stream(rs.stream.color, 640, 480, rs.format.bgr8, 30)
# Start streaming
profile = pipe.start(cfg)
# Read frame
while True:
frames = pipe.wait_for_frames()
depth_frame = frames.get_depth_frame()
color_frame = frames.get_color_frame()
if not depth_frame or not color_frame:
continue
img0 = np.asanyarray(color_frame.get_data())
# Create alignment primitive with color as its target stream:
align = rs.align(rs.stream.color)
frameset = align.process(frames)
# Update color and depth frames:
aligned_depth_frame = frameset.get_depth_frame()
colorizer = rs.colorizer()
colorized_depth = np.asanyarray(colorizer.colorize(aligned_depth_frame).get_data())
break
# Print
img_path = 'realsense_rgb.jpg'
depth_path = 'realsense_depth.jpg'
# Padded resize
img = letterbox(img0, new_shape=img_size)[0]
# Convert
img = img[:, :, ::-1].transpose(2, 0, 1) # BGR to RGB, to 3x416x416
img = np.ascontiguousarray(img, dtype=np.float16 if half else np.float32) # uint8 to fp16/fp32
img /= 255.0 # 0 - 255 to 0.0 - 1.0
return img_path, depth_path, colorized_depth, img, img0, None
""" And this was the result after I run detect.py:
Thanks in advance!
Pardon me, my question is: is there another way to stream frames from the RealSense? Or is the only way to modify datasets.py by introducing a new class or function?
@GilbertTjahjono this is beyond the scope of the repo, which only supports RGB and grayscale training and inference, but of course you should be able to modify the code to suit your needs. In any case, your error is not in a file supplied by this repo but in realsense.py.
Ultralytics offers professional consulting as well, from simple expert advice up to delivery of fully customized, end-to-end production AI solutions for our clients, such as:
If you'd like to discuss your requirements in greater detail, please contact Glenn Jocher at glenn.jocher@ultralytics.com or visit us at https://contact.ultralytics.com. Alternatively, we will leave this issue open, and you are free to seek any available community support here.
Hi @glenn-jocher, do you think it's possible to reopen this issue? I have attempted to use the LoadWebcam() class again and still receive the same issue in my cv2 window display. Thanks
@GilbertTjahjono did you have any success working with the RealSense camera? I'm currently attempting to access the camera stream via ROS.
Hi @glenn-jocher, I've just switched to a simple USB camera to test LoadWebcam() and still received this result. Do you maybe have an idea what's going on? Could it be the way the letterbox() function works on the webcam stream itself? Thanks
As advised, I have only made a very minor change to detect.py, just calling the LoadWebcam() class.
@yonglizhong try an IP or RTSP cam to see if the issue is isolated to your webcam. See https://github.com/ultralytics/yolov3#inference
@yonglizhong, my bachelor thesis project is actually still ongoing, so I haven't created a proper GitHub repository for documentation purposes. However, I recently succeeded in streaming RGB and depth data from the Intel RealSense D435.
I developed an extra class in utils/datasets.py (thanks to @glenn-jocher's repository) named LoadRealSense2 to stream RGB and depth frames. The class is as follows; pardon me in advance if any Indonesian-language comments remain, and feel free to ask me for further explanation:
```python
class LoadRealSense2:  # Stream from Intel RealSense D435
    def __init__(self, width=640, height=480, fps=30):
        # Variables for setup
        self.width = width
        self.height = height
        self.fps = fps
        self.imgs = [None]
        self.depths = [None]
        self.img_size = 416
        self.half = False

        # Setup
        self.pipe = rs.pipeline()
        self.cfg = rs.config()
        self.cfg.enable_stream(rs.stream.depth, self.width, self.height, rs.format.z16, self.fps)
        self.cfg.enable_stream(rs.stream.color, self.width, self.height, rs.format.bgr8, self.fps)

        # Start streaming
        self.profile = self.pipe.start(self.cfg)
        self.path = rs.pipeline_profile()
        print(self.path)
        print("streaming at w = " + str(self.width) + " h = " + str(self.height) + " fps = " + str(self.fps))

    def update(self):
        while True:
            # Wait for frames and get the data
            self.frames = self.pipe.wait_for_frames()
            self.depth_frame = self.frames.get_depth_frame()
            self.color_frame = self.frames.get_color_frame()
            # Wait until RGB and depth frames are synchronised
            if not self.depth_frame or not self.color_frame:
                continue
            # Get RGB data and convert it to a numpy array
            img0 = np.asanyarray(self.color_frame.get_data())
            # Aligned + colorized depth -> for display purposes only
            # (already converted to a numpy array in colorizing())
            depth0 = self.colorizing(self.aligned(self.frames))
            # Aligned depth -> for depth calculation
            # (already converted to a numpy array in kedalaman())
            distance0 = self.kedalaman(self.frames)
            # Get depth_scale
            depth_scale = self.scale(self.profile)
            # Expand image dims to 4D so it can pass through letterbox()
            self.imgs = np.expand_dims(img0, axis=0)
            # Not needed for the depth frames, since they won't be fed to YOLO
            self.depths = depth0
            self.distance = distance0
            break

        s = np.stack([letterbox(x, new_shape=self.img_size)[0].shape for x in self.imgs], 0)  # inference shapes
        self.rect = np.unique(s, axis=0).shape[0] == 1
        if not self.rect:
            print('WARNING: Different stream shapes detected. For optimal performance supply similarly-shaped streams.')
        time.sleep(0.01)  # wait time
        return self.rect, depth_scale

    def scale(self, profile):
        depth_scale = profile.get_device().first_depth_sensor().get_depth_scale()
        return depth_scale

    def kedalaman(self, frames):  # "kedalaman" means "depth" in Indonesian
        self.align = rs.align(rs.stream.color)
        frames = self.align.process(frames)
        aligned_depth_frame = frames.get_depth_frame()
        depth_real = np.asanyarray(aligned_depth_frame.get_data())
        return depth_real

    def aligned(self, frames):
        self.align = rs.align(rs.stream.color)
        frames = self.align.process(frames)
        aligned_depth_frame = frames.get_depth_frame()
        return aligned_depth_frame

    def colorizing(self, aligned_depth_frame):
        self.colorizer = rs.colorizer()
        colorized_depth = np.asanyarray(self.colorizer.colorize(aligned_depth_frame).get_data())
        return colorized_depth

    def __iter__(self):
        self.count = -1
        return self

    def __next__(self):
        self.count += 1
        self.rect, depth_scale = self.update()
        img0 = self.imgs.copy()
        depth = self.depths.copy()
        distance = self.distance.copy()
        if cv2.waitKey(1) == ord('q'):  # q to quit
            cv2.destroyAllWindows()
            raise StopIteration
        img_path = 'realsense.jpg'

        # Letterbox
        img = [letterbox(x, new_shape=self.img_size, auto=self.rect, interp=cv2.INTER_LINEAR)[0] for x in img0]

        # Stack
        img = np.stack(img, 0)

        # Convert Image
        img = img[:, :, :, ::-1].transpose(0, 3, 1, 2)  # BGR to RGB, to bsx3x416x416
        img = np.ascontiguousarray(img, dtype=np.float16 if self.half else np.float32)  # uint8 to fp16/fp32
        img /= 255.0  # 0 - 255 to 0.0 - 1.0

        # Return path, depth, distance, depth_scale, img, img0
        return str(img_path), depth, distance, depth_scale, img, img0, None

    def __len__(self):
        return 0  # 1E12 frames = 32 streams at 30 FPS for 30 years
```
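For reference, a loop consuming LoadRealSense2 unpacks seven values per frame. A sketch using a dummy generator (so the snippet runs without a camera); the shapes mirror what the class above returns:

```python
import numpy as np

def dummy_realsense_frames(n=2):
    """Hypothetical stand-in mimicking LoadRealSense2's 7-tuple output."""
    for _ in range(n):
        img = np.zeros((1, 3, 416, 416), np.float32)  # normalized CHW batch for YOLO
        img0 = np.zeros((1, 480, 640, 3), np.uint8)   # raw BGR frame(s)
        depth = np.zeros((480, 640, 3), np.uint8)     # colorized aligned depth (display only)
        distance = np.zeros((480, 640), np.uint16)    # raw aligned depth (for metric distance)
        yield 'realsense.jpg', depth, distance, 0.001, img, img0, None

for path, depth, distance, depth_scale, img, img0, vid_cap in dummy_realsense_frames():
    pass  # run YOLO on img; multiply distance values by depth_scale for meters
```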
@GilbertTjahjono Hey neighbor! I'm from Malaysia! Anyway, thanks to your code I was able to adapt it into a ROS class. Here's the class I modified from yours; it works by subscribing to the ROS topic:
```python
class LoadRealSense:  # Stream from Intel RealSense D435 via ROS
    def __init__(self, width=640, height=480, fps=30):
        # Variables for setup
        self.width = width
        self.height = height
        self.fps = fps
        self.imgs = np.zeros((480, 640, 3))
        self.img0 = np.zeros((480, 640, 3))
        self.img_size = 416
        self.half = False
        self.updated_image = False
        self.bridge = CvBridge()
        self.color_sub = rospy.Subscriber("/camera/color/image_raw", Image, self.callback)
        print("streaming at w = " + str(self.width) + " h = " + str(self.height) + " fps = " + str(self.fps))

    def callback(self, data):
        try:
            self.img0 = self.bridge.imgmsg_to_cv2(data, "bgr8")
            self.updated_image = True
        except CvBridgeError as e:
            print(e)

    def __iter__(self):
        self.count = -1
        return self

    def __next__(self):
        self.count += 1
        self.imgs = np.expand_dims(self.img0, axis=0)
        self.updated_image = True

        s = np.stack([letterbox(x, new_shape=self.img_size)[0].shape for x in self.imgs], 0)  # inference shapes
        self.rect = np.unique(s, axis=0).shape[0] == 1
        if not self.rect:
            print('WARNING: Different stream shapes detected. For optimal performance supply similarly-shaped streams.')
        time.sleep(0.01)  # wait time

        img0 = self.imgs.copy()
        if cv2.waitKey(1) == ord('q'):  # q to quit
            cv2.destroyAllWindows()
            raise StopIteration
        img_path = 'realsense.jpg'

        # Letterbox
        img = [letterbox(x, new_shape=self.img_size, interp=cv2.INTER_LINEAR)[0] for x in img0]

        # Stack
        img = np.stack(img, 0)

        # Convert Image
        img = img[:, :, :, ::-1].transpose(0, 3, 1, 2)  # BGR to RGB, to bsx3x416x416
        img = np.ascontiguousarray(img, dtype=np.float16 if self.half else np.float32)  # uint8 to fp16/fp32
        img /= 255.0  # 0 - 255 to 0.0 - 1.0

        # Return path, img, img0
        return str(img_path), img, img0, None

    def __len__(self):
        return 0  # 1E12 frames = 32 streams at 30 FPS for 30 years
```
@yonglizhong
```python
# Return depth, depth0, img, img0
return str(img_path), depth, distance, depth_scale, img, img0, None
```

Could you provide an example of how you performed the distance calculation in detect.py using these variables? I'm working on a similar project with the same camera.
@Sammysedo Hi Sammy, apologies for the late reply. Unfortunately I no longer have the camera with me to test it out. My main priority was to obtain the color stream anyway, with a separate script for depth. Maybe ask @GilbertTjahjono? The code was, after all, written by him. Sorry I couldn't be more helpful.
@yonglizhong I have resolved my issue. What I needed were 3D coordinates with respect to the camera's optical frame, which I was able to obtain using rs.rs2_deproject_pixel_to_point()
function block:
```python
depth_profile = rs.video_stream_profile(self.profile.get_stream(rs.stream.depth))
depth_intrin = depth_profile.get_intrinsics()
dist = rs.depth_frame.get_distance(depth, int(cx), int(cy))
metric_coord = rs.rs2_deproject_pixel_to_point(depth_intrin, [cx, cy], dist)
```
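For intuition, rs2_deproject_pixel_to_point() implements standard pinhole back-projection (plus lens undistortion from the intrinsics). A simplified equivalent that ignores distortion, with made-up intrinsics for illustration:

```python
def deproject(pixel, depth_m, fx, fy, ppx, ppy):
    """Back-project a pixel plus depth (meters) into camera-frame 3D
    coordinates, ignoring lens distortion. rs2_deproject_pixel_to_point
    does the same while also applying the camera's distortion model."""
    u, v = pixel
    x = (u - ppx) / fx * depth_m
    y = (v - ppy) / fy * depth_m
    return [x, y, depth_m]

# A pixel at the principal point maps straight down the optical axis:
print(deproject((320, 240), 1.5, fx=600.0, fy=600.0, ppx=320.0, ppy=240.0))
# → [0.0, 0.0, 1.5]
```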
@Sammysedo @yonglizhong Sorry for the late reply.
Yes, my code is similar to what @Sammysedo has written above. I further process the 3D coordinates using a Kalman filter and return additional 3D velocity and acceleration states.
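For anyone curious, a minimal single-axis constant-acceleration Kalman filter along those lines might look like the sketch below. The dt, Q, and R values are illustrative, not taken from the project discussed here; a full 3D tracker would stack x, y, z into one state vector or run three such filters:

```python
import numpy as np

# Minimal 1-axis constant-acceleration Kalman filter. State = [position,
# velocity, acceleration]; only position is measured.
dt = 1 / 30.0  # camera frame period at 30 FPS
F = np.array([[1.0, dt, 0.5 * dt ** 2],
              [0.0, 1.0, dt],
              [0.0, 0.0, 1.0]])   # state transition
H = np.array([[1.0, 0.0, 0.0]])   # measurement model: position only
Q = np.eye(3) * 1e-4              # process noise (illustrative)
R = np.array([[1e-2]])            # measurement noise (illustrative)

def kf_step(x, P, z):
    """One predict + update cycle for a scalar position measurement z."""
    # Predict
    x = F @ x
    P = F @ P @ F.T + Q
    # Update
    y = np.array([[z]]) - H @ x       # innovation
    S = H @ P @ H.T + R               # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)    # Kalman gain
    x = x + K @ y
    P = (np.eye(3) - K @ H) @ P
    return x, P

x, P = np.zeros((3, 1)), np.eye(3)
for z in [2.0, 2.0, 2.0, 2.0]:  # object sitting at 2.0 m
    x, P = kf_step(x, P, z)
# x[0] converges toward 2.0 m; x[1] and x[2] estimate velocity/acceleration
```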
I forgot where I found the reference link... I think it is somewhere in the Intel RealSense example code. Here's my code for calculating the distance:
```python
# Calculating depth
def calcdepth(xmin, ymin, xmax, ymax, depth, depth_scale):
    # Crop depth data to the bounding box
    depth = depth[ymin:ymax, xmin:xmax].astype(float)
    # Apply the device's depth scale to convert to meters
    depth = depth * depth_scale
    dist = cv2.mean(depth)[0]
    return dist
```
It calculates the distance by averaging the values in a cropped depth frame... The cropped depth frame is obtained by 'copying' the bounding box from the RGB frame onto the depth frame.
However, this method includes a few background pixels in the calculation, since the object's shape does not completely fill the bounding box. Hence, the calculated distance is not accurate.
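One common mitigation (an idea sketch, not code from the repository above) is to take the median of the valid depth pixels instead of the mean, so background pixels inside the box pull the estimate far less:

```python
import numpy as np

def calcdepth_median(xmin, ymin, xmax, ymax, depth, depth_scale):
    """Like calcdepth() above, but uses the median of valid (non-zero)
    pixels, which is far less sensitive to background pixels inside
    the bounding box than the mean."""
    crop = depth[ymin:ymax, xmin:xmax].astype(float) * depth_scale
    valid = crop[crop > 0]  # RealSense reports 0 for no-data pixels
    return float(np.median(valid)) if valid.size else 0.0

# Example: a 10x10 box where 70% of pixels are the object at raw value
# 2000 and 30% are background at 5000; depth_scale 0.001 -> meters.
depth = np.full((10, 10), 2000, np.uint16)
depth[:, 7:] = 5000
print(calcdepth_median(0, 0, 10, 10, depth, 0.001))  # → 2.0 (mean would give 2.9)
```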
@GilbertTjahjono very good work here. Would you mind sharing your repository for streaming RGB and depth data from the RealSense camera? A reference for how you're feeding the stream to the YOLO model would also be helpful, as there's not much documentation on it.
Hi @joelbudu. I'm planning to make it public next week since it is still a mess right now ... I've been busy working on my final thesis defense next week and haven't had time to check my code.
I'll let you know as soon as possible! Cheers!
Thanks @GilbertTjahjono . I actually got it working yesterday for yolov5 so I might upload a fork when I get the chance. I assume you'll have a very much improved version at this point so would be waiting for your update. Cheers!
@GilbertTjahjono I have tested your code and it keeps giving me the error reported previously. Have you changed some function in detect.py?
@AinaraLopez have you changed this part of detect.py?
You should change it according to the name of your streaming class, or perhaps add a new parser argument to stream data from the Intel RealSense D435.
@GilbertTjahjono yes
```python
vid_path, vid_writer = None, None
if webcam:
    view_img = True
    cudnn.benchmark = True  # set True to speed up constant image size inference
    dataset = LoadRealSense2(width=640, height=480, fps=30)
else:
    save_img = True
    dataset = LoadImages(source, img_size=imgsz)
```
This is the error:
@AinaraLopez ah, it's because your loop expects to unpack 3 variables, yet the function returns 4:
You should adjust the returned variables and return only those needed for further processing in detect.py. I think it should be okay if you delete str(img_path).
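In other words, the number of loop targets in detect.py must match the arity of the tuple the loader yields. A trivial illustration with a hypothetical stand-in:

```python
def loader():  # hypothetical stand-in yielding 4 values per frame
    yield 'realsense.jpg', 'img', 'img0', None

# Four loop targets for four yielded values -- this unpacks cleanly:
for path, img, im0s, vid_cap in loader():
    pass

# If the loader yielded 7 values while the loop names only 4 (or vice
# versa), Python would raise a "too many values to unpack" ValueError.
```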
I just uploaded my project here: https://github.com/GilbertTjahjono/Multiple_Object_Tracking
The modified detect.py is named tracking_v3.py in my repository. You can check my full code there.
@GilbertTjahjono Thank you very much I have already been able to solve it
@AinaraLopez can you show me how you solved it? Can you share your code, please?
Hi, since I'm using a RealSense camera, I had to use the pyrealsense2 package to read the RGB stream instead of cv2.VideoCapture(). My question is how to implement the LoadWebcam() class in detect.py instead of the LoadStreams() class. Thanks