Open · iamwyh2019 opened 2 months ago

I am using hl2ss to develop a real-time CV system. I am streaming personal video at 640x360@30FPS. At some point, due to network disturbance, the streaming slows down to around 21 FPS. In this case, hl2ss seems to buffer the previous frames and send them in time order, so the delay accumulates. I'm OK with losing a few frames, but I need real-time.

I wonder where I can change this. I am thinking of two fixes, but I'm not sure which one would work (or neither):

1. In `stream_pv.cpp`, in `static void PV_Stream(SOCKET clientsocket)`, change `videoFrameReader.AcquisitionMode(MediaFrameReaderAcquisitionMode::Buffered);` to `MediaFrameReaderAcquisitionMode::Realtime`.
2. Call `g_pSinkWriter->Flush(g_dwVideoIndex);` when a new frame arrives.

Any help would be appreciated!
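For reference, option 1 is the one-line acquisition-mode change; here is a sketch of where it sits (C++/WinRT; the surrounding function is illustrative, not the actual `PV_Stream` body):

```cpp
#include <winrt/Windows.Media.Capture.Frames.h>

using namespace winrt::Windows::Media::Capture::Frames;

void configure_reader(MediaFrameReader const& videoFrameReader)
{
    // Buffered delivers every frame in order (and backs up under load);
    // Realtime drops stale frames so the callback always sees the latest one.
    videoFrameReader.AcquisitionMode(MediaFrameReaderAcquisitionMode::Realtime);
}
```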
Hello, I think option 2 could mess up the encoder, but option 1 could work. If option 1 does not work by itself, then maybe a variable-framerate approach using a semaphore could work: `ReleaseSemaphore` is called after the send in the `SendSample` callback, and `WaitForSingleObject` with timeout 0 is called near the beginning of the `FrameReceived` callback; if the semaphore is acquired, the rest of the function runs and the frame is sent to the sink writer, otherwise the function returns immediately. I don't know how many frames (if any) the sink writer buffers, so selecting the initial/max count for the semaphore may require some experimentation, but it should be a small value like 1~16.
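In code, the idea would look roughly like this (a minimal sketch; the callback names and the count of 4 are illustrative, not actual hl2ss symbols):

```cpp
#include <windows.h>

// Slots = frames allowed in flight; 4 is an arbitrary pick from the 1~16 range.
static HANDLE g_sendSemaphore = CreateSemaphoreW(NULL, 4, 4, NULL);

// Hook invoked after an encoded sample has been sent over the socket.
static void OnSampleSent()
{
    ReleaseSemaphore(g_sendSemaphore, 1, NULL); // free one slot
}

// MediaFrameReader FrameArrived-style callback.
static void OnFrameReceived(/* sender, args */)
{
    // Timeout 0: never block the capture thread.
    if (WaitForSingleObject(g_sendSemaphore, 0) != WAIT_OBJECT_0)
    {
        return; // pipeline is behind; drop this frame
    }
    // ...forward the frame to the sink writer as usual...
}
```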
Thanks for the suggestion!

Before trying the options I want to locate the specific error, i.e. whether it's the camera framerate that drops (unlikely) or the video stream bandwidth. To do that I am adding interfaces that accept callback functions from the C# side and call them when a frame is sent on the C++ side. Where exactly is the part where a frame gets sent? I'm assuming it's `bool ok = send_multiple(user->clientsocket, wsaBuf, sizeof(wsaBuf) / sizeof(WSABUF), g_frameSentCallBack)` in `stream_pv.cpp`? Or is the `send_multiple` (and specifically `WSASend`) function asynchronous, so I have to pass a callback to `WSASend`? I'm seeing you are not using overlapped mode, so I assume that's not the case. Or I could be wrong in locating the code.
I actually experimented for a while, logging the framerate and bandwidth of the `send_multiple` in `PV_SendSample`, as well as the framerate and bandwidth of the Python end receiving new frames like this:

```python
while enable:
    stamp, data = self.video_sink.get_most_recent_frame()
    if data is not None and stamp != last_stamp:
        last_stamp = stamp
        self.get_frame_callback(data)
```

What I found is: the framerate and bandwidth of the `PV_SendSample` are consistently around 30 FPS, while those of the backend drop to 20 FPS when I'm moving with the HoloLens. I also tried `Realtime` (option 1) but the problem persists. I think that suggests that `WSASend` is not where the images are physically sent; instead, it buffers the images in the socket buffer area, waiting for them to be sent. Is that the case here?
All server sockets are blocking and sending is non-overlapped. All stream data is sent through `send_multiple`.

`WSASend` should buffer according to this document: https://learn.microsoft.com/en-us/previous-versions/troubleshoot/windows/win32/data-segment-tcp-winsock

> To optimize performance at the application layer, Winsock copies data buffers from application send calls to a Winsock kernel buffer. Then, the stack uses its own heuristics (such as Nagle algorithm) to determine when to actually put the packet on the wire.
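If you want to rule out that kernel-side buffering experimentally, two standard Winsock knobs exist (a generic sketch, not something hl2ss does):

```cpp
#include <winsock2.h>
#include <ws2tcpip.h>

// Make a blocking send track the wire more closely: disable Nagle so segments
// go out immediately, and shrink the kernel send buffer so the send call
// blocks once the network falls behind instead of silently queueing.
static void reduce_send_buffering(SOCKET s)
{
    BOOL nodelay = TRUE;
    setsockopt(s, IPPROTO_TCP, TCP_NODELAY, (char const*)&nodelay, sizeof(nodelay));

    int sndbuf = 8 * 1024; // bytes; tune experimentally
    setsockopt(s, SOL_SOCKET, SO_SNDBUF, (char const*)&sndbuf, sizeof(sndbuf));
}
```

Note this only changes when `WSASend` blocks; it does not remove TCP's retransmission and in-order delivery, which turn out to matter later in this thread.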
I also tried logging the framerate but got ~30FPS on both ends even when moving (using the Unity server sample). In my case, the HoloLens is always close to the router though. Here is the script for reference:
```python
import multiprocessing as mp
import cv2
import hl2ss
import hl2ss_lnm
import hl2ss_mp
import hl2ss_utilities

# HoloLens address
host = '192.168.1.7'

# Camera parameters
pv_width = 640
pv_height = 360
pv_framerate = 30

# Buffer length in seconds
buffer_length = 10

if __name__ == '__main__':
    hl2ss_lnm.start_subsystem_pv(host, hl2ss.StreamPort.PERSONAL_VIDEO)

    producer = hl2ss_mp.producer()
    producer.configure(hl2ss.StreamPort.PERSONAL_VIDEO, hl2ss_lnm.rx_pv(host, hl2ss.StreamPort.PERSONAL_VIDEO, width=pv_width, height=pv_height, framerate=pv_framerate, decoded_format='bgr24'))
    producer.initialize(hl2ss.StreamPort.PERSONAL_VIDEO, pv_framerate * buffer_length)
    producer.start(hl2ss.StreamPort.PERSONAL_VIDEO)

    consumer = hl2ss_mp.consumer()
    manager = mp.Manager()
    sink_pv = consumer.create_sink(producer, hl2ss.StreamPort.PERSONAL_VIDEO, manager, None)
    frame_stamp = sink_pv.get_attach_response()

    cv2.namedWindow('Control')

    fps = None
    prev_ts = None
    delta_ts = 0
    sample_time = 1

    # Main Loop ---------------------------------------------------------------
    while (True):
        if ((cv2.waitKey(1) & 0xFF) == 27):
            break

        '''
        state, _, data_pv = sink_pv.get_buffered_frame(frame_stamp)
        if (state < 0):
            frame_stamp += 1
            continue
        if (state > 0):
            continue
        frame_stamp += 1
        '''

        stamp, data_pv = sink_pv.get_most_recent_frame()
        if data_pv is not None and stamp != frame_stamp:
            frame_stamp = stamp
        else:
            continue

        if (prev_ts is not None):
            delta_ts += data_pv.timestamp - prev_ts
        prev_ts = data_pv.timestamp

        if (fps is None):
            fps = hl2ss_utilities.framerate_counter()
            fps.reset()
        else:
            fps.increment()
            if (fps.delta() > sample_time):
                print(f'FPS: {fps.get()} / {pv_framerate} DELTA: {delta_ts / hl2ss.TimeBase.HUNDREDS_OF_NANOSECONDS} / {sample_time}')
                delta_ts = 0
                fps.reset()

        cv2.imshow('Control', data_pv.payload.image)

    sink_pv.detach()
    producer.stop(hl2ss.StreamPort.PERSONAL_VIDEO)
    hl2ss_lnm.stop_subsystem_pv(host, hl2ss.StreamPort.PERSONAL_VIDEO)
```
Thank you for the reference! In my case, I am experimenting in a university department building with multiple wireless access points, and my observation is that transmission slows down on the backend when I'm moving through the building, but quickly stabilizes once I stand still. The FPS of calling `WSASend` is stable though, which can be explained by the fact that this system call does not physically send the data, it just buffers it in the socket buffer. I'll experiment more and get back to you.
I experimented with a few scenarios and here are my findings:

- There are multiple factors that can influence the latency when walking in a department building, such as switching between access points (routers), distance to the router, other processes in the Unity app, etc. They cause packet delay and loss.
- Since we are streaming the video via TCP, it (1) resends lost packets, and (2) strictly ensures packets arrive in time order. These lead to the delay accumulating. I initially thought it was the buffering in Winsock, but it turns out TCP is "too reliable" in this case, regardless of the buffering.

I changed the plugin to stream over UDP, and now the delay doesn't accumulate any more. I am working on getting some concrete numbers for that, and will update here later. Anyway, great code! It doesn't require a lot of changes to switch to UDP, and if you want I can clean up the code and make a PR. Thanks again for helping!
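The sender-side change amounts to something like this (an illustrative sketch, not the actual diff; the function name is made up):

```cpp
#include <winsock2.h>
#include <ws2tcpip.h>

// Shape of the TCP -> UDP switch on the sender: the packed frame (same WSABUF
// layout send_multiple uses) goes out as one datagram via WSASendTo instead of
// over the blocking TCP socket.
static bool send_frame_udp(SOCKET udpsocket, sockaddr_in const& client,
                           WSABUF* wsaBuf, DWORD count)
{
    DWORD bytes_sent = 0;
    int status = WSASendTo(udpsocket, wsaBuf, count, &bytes_sent, 0,
                           (sockaddr const*)&client, sizeof(client), NULL, NULL);
    // No retransmits, no reordering: a lost datagram is just a lost frame,
    // which is exactly the trade that keeps the delay from accumulating.
    return status != SOCKET_ERROR;
}
```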
Could you share your code? I'm dealing with the same problem and it would be really helpful. Thanks!
It's here: https://github.com/iamwyh2019/hl2ss. You can check `stream_pv.cpp` for the changes to the frontend logic, and `hl2ss.py` for the backend logic. Now, when calling `hl2ss_lnm.rx_pv`, instead of passing a single parameter `port`, you pass two parameters: `control_port` (which transmits commands over TCP) and `stream_port` (which streams video over UDP). For example, this is how I initialize the producer:

```python
VIDEO_PORT = hl2ss.StreamPort.PERSONAL_VIDEO
VIDEO_UDP_PORT = hl2ss.StreamUDPPort.PERSONAL_VIDEO

producer = hl2ss_mp.producer()
producer.configure(VIDEO_PORT, hl2ss_lnm.rx_pv(host, control_port=VIDEO_PORT, stream_port=VIDEO_UDP_PORT, width=pv_width, height=pv_height, framerate=pv_framerate))
```

Besides this, I added a few callbacks for when a frame is generated and sent via UDP. These callbacks can be exposed in the DLL and registered in C#. Check `stream_pv.cpp` for more details.
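On the C++ side, the registration surface looks roughly like this (the names and callback signature are made up for illustration; check the fork for the actual exports):

```cpp
#include <cstdint>

// Hypothetical callback type: fired after each frame is sent via UDP.
typedef void (*PV_FRAME_SENT_CALLBACK)(uint64_t timestamp, uint32_t bytes_sent);

static PV_FRAME_SENT_CALLBACK g_frameSentCallback = nullptr;

// Exported from the DLL; C# registers a marshaled delegate through it.
extern "C" __declspec(dllexport)
void PV_RegisterFrameSentCallback(PV_FRAME_SENT_CALLBACK callback)
{
    g_frameSentCallback = callback;
}
```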
That's awesome. Thanks for sharing your solution.
Hi jdibenes, have you ever tried measuring the video streaming delay? I tried measuring the delay of 640x360@30FPS this way: replace the frame timestamp with `GetTickCount64` on the HoloLens, then use `GetTickCount64` again to compute the round-trip delay and divide by 2. What I got is a delay of around 30~50 ms when both ends are connected to my home WiFi and close to the router. It kind of seems legit but also sounds too good to be true, so I wonder if you have measured it before.
I had the PV camera look at a stopwatch on the PC monitor, then put the PV video window next to the stopwatch, took a screenshot, and compared the time difference. I got a delay of about 270 ms for 1920x1080@30.
Yeah! That's where I got confused. I tried the same thing (but at 640x360@30) and the delay seems to be 110 ms. So I used `time.time()` to measure the time for unpacking, decoding, and `cv2.imshow` (basically everything after `unpacker.unpack`), but they only add up to around 12 ms, so that's still off by a lot.
Just to confirm, are you replacing the timestamp in `PV_OnVideoFrameArrived` or in `PV_SendSample`? Because `PV_SendSample` is after the encoder stuff.
In `PV_OnVideoFrameArrived`. It's around line 121 in your current version. I simply changed it to `pj.timestamp = GetTickCount64()`. I didn't change `pSample->SetSampleTime(timestamp)`, but that doesn't seem to matter.
Maybe the difference is the photons-to-`PV_OnVideoFrameArrived` delay. You might be able to estimate it by comparing the frame timestamp vs. the QPC time when `PV_OnVideoFrameArrived` starts.
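A sketch of that estimate, assuming the frame timestamp is QPC-based and in the 100 ns units hl2ss uses elsewhere (e.g. `hl2ss.TimeBase.HUNDREDS_OF_NANOSECONDS`):

```cpp
#include <windows.h>
#include <cstdint>

// Current QPC time in hundreds of nanoseconds (same unit as the frame timestamp).
static int64_t qpc_now_hns()
{
    LARGE_INTEGER pc, qf;
    QueryPerformanceCounter(&pc);
    QueryPerformanceFrequency(&qf);
    // Split to avoid overflowing 64 bits when multiplying by 10,000,000.
    int64_t seconds   = pc.QuadPart / qf.QuadPart;
    int64_t remainder = pc.QuadPart % qf.QuadPart;
    return seconds * 10'000'000 + (remainder * 10'000'000) / qf.QuadPart;
}

// At the top of PV_OnVideoFrameArrived:
// int64_t capture_to_callback_hns = qpc_now_hns() - frame_timestamp_hns;
```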
Genius idea! It turns out most of the delay comes from the HoloLens side:

- Frame timestamp vs. when `PV_OnVideoFrameArrived` starts: ~31 ms
- `WSASendTo`: ~40 ms (using `GetTickCount64`)

So that does add up to around 110 ms. Now the optimization work veers toward optimizing the C++ part... any suggestions?
Hi jdibenes, would you mind sharing some details about the `CustomMediaSink`? Including:

- How it relates to the `CustomStreamSink`. When is the stream sink created?
- When is the `pHook` called? I found it took 100 ms from the start of `PV_OnFrameArrived` to `PV_SendSample`. Not sure how this callback is scheduled.

Hi,
`CustomMediaSink` and `CustomStreamSink` are just barebones implementations of the `IMFMediaSink` and `IMFStreamSink` interfaces, and their purpose is to intercept encoded frames (`IMFSample`) and pass them to a callback function (`pHook`), all based on the model presented in https://learn.microsoft.com/en-us/windows/win32/medfound/sink-writer.
The creation and configuration of the Sink Writer and its Media Sink (`CustomMediaSink`) are handled in `custom_sink_writers.cpp` L78. After that, the Media Sink is managed internally by the Sink Writer and the Media Foundation library, including creating Stream Sinks (`CustomStreamSink`) and calling `IMFStreamSink::ProcessSample` (which calls `pHook`). The Sink Writer is configured to have one stream in L97, and I think this is where a single instance of `CustomStreamSink` is created, but I'm not sure. I also have no idea how large the internal buffer is, as the library handles all these details. Finally, the video encoder generates an initial empty frame before the first video frame, but I don't know if this translates to a delay of 33 ms (for 30 FPS). Here is more information about Media Sinks: https://learn.microsoft.com/en-us/windows/win32/medfound/media-sinks.
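Stripped to its essentials, the interception point has this shape (an illustrative sketch, not the actual `CustomStreamSink` code):

```cpp
#include <mfidl.h>

// Hook signature: receives each encoded sample plus a user parameter.
typedef void (*HOOK_SINK_PROC)(IMFSample* sample, void* param);

// Stand-in for the IMFStreamSink implementation; everything else the
// interface requires is boilerplate around this one forwarding call.
class StreamSinkSketch
{
    HOOK_SINK_PROC m_hook;
    void*          m_param;

public:
    StreamSinkSketch(HOOK_SINK_PROC hook, void* param) : m_hook(hook), m_param(param) {}

    // The Sink Writer delivers encoded frames here.
    HRESULT ProcessSample(IMFSample* sample)
    {
        m_hook(sample, m_param); // pHook gets the encoded IMFSample
        return S_OK;
    }
};
```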
Hi @jdibenes, thanks a lot for creating such a great toolkit! I have a question that I believe is more relevant to this thread. What is the best possible latency that can be achieved when streaming from the HoloLens? We are streaming AHAT via Wi-Fi for optical tracking and are experiencing about 100-120 ms latency, even with the best possible Wi-Fi configuration. @iamwyh2019 mentioned measuring around 100 ms latency, most of which comes from the HoloLens. Is this purely a hardware limitation, or is there a way to reduce it?
I'm not sure about AHAT. For the RGB camera, there's an inherent system delay (from a picture being taken to the picture being sent) of around 80 ms. To the best of my knowledge there's no way to fix that, since Windows only allows registering a callback and itself controls when the callback is invoked. It could be different for AHAT though.

Adding to the system delay is the network streaming delay, which you can estimate from your frame size and bandwidth. In my case each frame is around 2 KB and my WiFi is (pretty fast), so ideally each frame has a ~2 ms transmission latency (2 KB ≈ 16 kbit, which takes about 2 ms at an effective throughput of ~8 Mbps). It varies, but most of the time it's really fast (<=10 ms).