[Accepted with Revisions] SDL 0048 - Add H.264 over RTP format support for video streaming

theresalech commented 7 years ago

Hello SDL community,

The review of the revised proposal "Add H.264 over RTP format support for video streaming" begins now and runs through May 16, 2017. The original review of "Add H.264 over RTP format support for video streaming" occurred April 4 through April 11, 2017. The proposal is available here:

https://github.com/smartdevicelink/sdl_evolution/blob/master/proposals/0048-H264-over-RTP-support-for-video-streaming.md

Reviews are an important part of the SDL evolution process. All reviews should be sent to the associated Github issue at:

https://github.com/smartdevicelink/sdl_evolution/issues/143

What goes into a review?

The goal of the review process is to improve the proposal under review through constructive criticism and, eventually, determine the direction of SDL. When writing your review, here are some questions you might want to answer in your review:

Is the problem being addressed significant enough to warrant a change to SDL?
Does this proposal fit well with the feel and direction of SDL?
If you have used competitors with a similar feature, how do you feel that this proposal compares to those?
How much effort did you put into your review? A glance, a quick reading, or an in-depth study? Please state explicitly whether you believe that the proposal should be accepted into SDL.

More information about the SDL evolution process is available at

https://github.com/smartdevicelink/sdl_evolution/blob/master/process.md

Thank you, Theresa Lech

Program Manager - Livio theresa@livio.io

tpulatha commented 7 years ago

I have two questions for this:

Did we already see issues during video streaming that would warrant the additional technical debt of adding RTP streaming?
Do the mobile OSes (Android/iOS) have native functions that can create RTP frames? If not, this will decrease performance if the frame structure has to be created manually, which would be a quite negative impact.

shoamano83 commented 7 years ago

Hi Timur,

For question 1,

Yes, while developing our own HMI we found that it was difficult to recover from HMI's temporal overload. HMI could not catch up without timestamp information. Actually we tried to workaround by disposing some received data, but presence of timestamp information is more helpful.

Also, we should be able to reduce touch latency using RTP format. In the case of H.264 byte-stream, a HMI typically needs a H.264 byte-stream parser in front of a decoder. Such parser waits for a start code to detect the end of previous NAL unit. The thing is, the start code is included in the next frame's data, so the parser needs to wait for next frame's data to extract current frame and send it to the decoder. This introduces a delay of one frame - which is about 33 msecs in the case of 30 fps. On the other hand, a RTP packet has Marker bit which indicates the end of an access unit (i.e. a frame). So RTP de-payloader detects the end of an access unit and can send the access unit to the decoder before receiving the next frame's data. It will not generate any delay.

Therefore I believe adding RTP format has more benefits than technical debt.

By the way, as you can see from my proposal the use of RTP format is completely optional. A HMI can simply reject ChangeVideoFormat request and let SDL proxy use ordinal H.264 byte-stream format. So if current HMIs prefer H.264 byte-stream format, I think they can continue to use it.

shoamano83 commented 7 years ago

For question 2,

As far as I know, iOS doesn't have such feature. Android framework has RTP implementation in its libstagefright, but I don't think it is exposed to apps.

Creating RTP packet structure is simple and I believe there is little impact on performance. Basically, it is to split the data and add 2-byte length field plus 12-byte (or 14-byte if the data is split) header in front of them. FYI, you can find Android framework's implementation below. It does not support RFC 4571 framing so 2-byte length field is not included: https://android.googlesource.com/platform/frameworks/av/+/master/media/libstagefright/rtsp/ARTPWriter.cpp#570

Again, the RTP format is optional. So if the impact is really a concern then you can make HMIs choose H.264 byte-stream format.

Thanks!

joeygrover commented 7 years ago

In SDL, the h264 frame (or NAL unit) is not guaranteed to be contained in a single SDL packet. Therefore, just adding the stop tag at the end of the h264frame or NAL unit wouldn't really increase performance as the next h264 packet or NAL unit's start tag would directly follow in 99% of cases.

Have you attempted to build this out on Android? I don't believe any RTP APIs are exposed that would make this simple. They have classes that are built for audio streaming, but the structure doesn't look to support a simple integration for video or using custom input/outputs. We rely on the OS's native APIs to build out the current streams and without that support, I don't see adding RTP as simple.

shoamano83 commented 7 years ago

Hi Joey, thanks for your comment.

In SDL, the h264 frame (or NAL unit) is not guaranteed to be contained in a single SDL packet.

Yes, this is true.

Therefore, just adding the stop tag at the end of the h264frame or NAL unit wouldn't really increase performance as the next h264 packet or NAL unit's start tag would directly follow in 99% of cases.

The next H.264 NAL unit may follow right after current frame. However, it will not arrive immediately. We need to consider that video encoder emit data at some interval. Let's say the encoder is running at constant frame rate of 30fps. It emits some data, and the data is packetized and sent over transport. After 33 msecs, the encoder emits some more data. The data is packetized and sent over transport. I don't really matter if NAL units are split or concatenated on the transport. What I want to point out is that the start code of NAL unit is usually delayed 33 msecs compared to the previous frame's end. So H.264 parser in HMI is often stuck for this period of time.

Have you attempted to build this out on Android? I don't believe any RTP APIs are exposed that would make this simple.

The link I showed in my previous comment was only for reference. You can see how much work you will need to add RTP header. Since the class isn't available from an Android app, I need to implement the same logic in SDL proxy. I agree that RTP can be more complex compared to "raw" H.264 byte-stream, but it is much simpler compared to other container format like MP4 or MPEG-TS. I believe this is the best choice (or best trade-off) to add timestamp information and end-of-access-unit flag to SDL video stream.

Thanks!

joeygrover commented 7 years ago

When the encoder releases the data it is just put into a buffer for the transport to churn away at. The probability that the data is read from the buffer in a fashion that has the end of one NAL unit not directly before the next NAl units start tag is very low. At least that was my experience when investigating the video streaming feature. The delay would only occur if for some reason the encoder took longer than normal to process the next frame, and the transport was able to catch up and deplete the buffer before the next frame was ready to be sent.

The code you shared was in C++. So if we were to implement a custom solution for RTP in the Android library we would either have to do one in Java (slow) or in C++ through JNI bindings. The latter would require us to add NDK support to the library. Or we would have to use a 3rd party library, something like FFMPG which would still require us add NDK support .

While I understand and agree with the overall goal of this proposal, I don't see it as a plausible solution until the mobile platforms can natively support RTP video streaming.

shoamano83 commented 7 years ago

Hi Joey,

Thanks for the valuable comments.

I don't know if I understand your comment correctly, but it sounds like the transport is quite slow and a frame is often queued for more than 33 msecs (or so) so that most of the time it is concatenated with the next frame. Is it right? If so, then I think that's another issue that I need to investigate. Could you share the code that you were looking at? Did you see same issue with sdl_ios? Also, I think it depends on the performance of head unit's transports.

Anyway I don't think it's a good reason to defer or suspend this proposal just because current SDL video implementation already has an issue and we might not see improvements because of it. As you know, low latency is a key for good user-experience of video streaming apps. I think we need to consider every means to reduce the latency.

The code you shared was in C++. So if we were to implement a custom solution for RTP in the Android library we would either have to do one in Java (slow) or in C++ through JNI bindings.

Yeah but we already have protocol and transport implementations in Java, not C++. I don't think adding a RTP packetizer implemented in Java significantly decreases performance. And again, RTP format is optional. You can continue to use H.264 byte-stream format if packetizer's performance is your concern.

By the way, in sdl_ios implementation we use Objective-C. I hope you don't have any concern with it.

Thanks!

joeljfischer commented 7 years ago

@shoamano83 It sounds like you've already written Android & iOS code for this. Is it possible to have that code viewable as part of this proposal so that we can see what is actually proposed, code-wise?

jhludwig commented 7 years ago

@joeljfischer we are working on getting code ready; we do not think it will be ready by next week but we will provide as quickly as we can. We already provide H264 video projection over RTP for one of our customers in a non-SDL implementation, and we are comfortable that it works well on iOS and Android, but we understand we need to provide the code for SDL. We are working to abstract the code out for SDL submission.

PeiLinCN commented 7 years ago

@joeygrover

In SDL, the h264 frame (or NAL unit) is not guaranteed to be contained in a single SDL packet.

That is true. The bad thing is that SDL proxy also breaks the boundary of the H264 frames(I/B/P start/end) while constructing the SDL data packet for transportation. So if SDL needs to adapt for some hardware platform which only provide low level codec API as need input the frame data and type for decoding, that must require to implement the parser to find the H264 frame boundary in the cached data buffer from the SDL data package transportation thread. That would be lots of the CPU loading cost while takes side-effect for whole stream decoding performance and latency accumulation.

Another bad thing, if NO TIMESTAMP info in SDL stream data package, we can't recover anything while some exception found, we can't drop anything and skip some ancient frame (Especially for B frame) while there are lots of data package in the queue. we could not do anything optimization for HMI user experience while we have known the latency happened for some limitation in hardware or some low level transportation, or high CPU usage etc. We can't just assume that SDL is the only active app in HU to take over all the resources for whole system perspective.

@shoamano83 Agree with you to implement this proposal for the RTP and media type negotiation as SDP RFC 4566.

theresalech commented 7 years ago

There are two reasons this proposal has been deferred. The first is that the Project Maintainers would like to review the implementation of this code to see how it will function in native iOS and Android platforms. Xevo will set up a separate technical session with the Steering Committee for this review. The second reason for deferring is that the Steering Committee needs to decide the best strategy for the mobile proxies to query the head unit regarding available features – register app interface or negotiations via RPC? The Steering Committee will make this a topic for discussion at a brainstorm taking place later this week.

theresalech commented 7 years ago

This proposal has been deferred until the technical session to review the implementation of this code has taken place (scheduled for April 27, 2017). The Steering Committee has agreed that a separate proposal will be entered to recommend the best strategy for the mobile proxies to query the head unit regarding available features.

shoamano83 commented 7 years ago

Hi all,

I’ve pushed my RTP packetizer implementations to my GitHub repos. Please find them in below commits: Android: https://github.com/shoamano83/sdl_android/commit/2c3b6d850d9f08aa7fbaba90997a42eed99670df iOS: https://github.com/shoamano83/sdl_ios/commit/1c38d8629b7f4cbea788cd10013db6280e2e205c

I hope we can move the discussion forward with these.

Just FYI, the RTP stream can be received with this GStreamer pipeline:

$ gst-launch-1.0 souphttpsrc location=http://127.0.0.1:5050 ! "application/x-rtp-stream" ! rtpstreamdepay ! "application/x-rtp,media=(string)video,clock-rate=90000,encoding-name=(string)H264" ! rtph264depay ! "video/x-h264, stream-format=(string)avc, alignment=(string)au" ! avdec_h264 ! autovideosink sync=false

This requires GStreamer 1.4 or later for “rtpstreamdepay” element. Here I use “souphttpsrc” instead of “tcpclientsrc” because SDL core implementation adds a (fake) HTTP response header before sending actual video stream.

shoamano83 commented 7 years ago

I tested my RTP implementation with our proprietary sample apps (iOS and Android) to check performance. The sample apps generate some simple pictures using OpenGL and feed them to SDL proxies at 30fps. SDL proxies then generate a video stream of 4Mbps in 800px x 480px resolution.

Comparison of rendering FPS (average of 3 times of 3-minute streaming):

Android

Current implementation (non-RTP)	RTP implementation
N/A (*)	30.10 fps

iOS

Current implementation (non-RTP)	RTP implementation
30.19 fps	30.23 fps

(*) Once I started video streaming, the SDL core docker image immediately began consuming a large amount of memory and eventually RAM was exhausted within 20 seconds or so. So I could not conduct testing. Strange though, I did not see this issue with my RTP stream.

Comparison of CPU usage (refer to attached screenshots):

Android

Current implementation (non-RTP)	RTP implementation
around 39%	around 20%

iOS

Current implementation (non-RTP)	RTP implementation
around 59%	around 64%

From these results, I can say that RTP packetizer does not have performance impacts on FPS.

On Android, the CPU usage decreased. This is because the original implementation uses PipedInput/OutputStream, which consumes a lot of CPU resources (possibly related to https://github.com/smartdevicelink/sdl_android/pull/446). My implementation doesn’t use PipedInput/OutputStream because they cannot convey timestamp information. This made a difference on CPU usage. On iOS, the CPU usage increased. This was an expected result because the packetizer does additional works. I believe this is in acceptable range.

Environment I used: HU: A Ubuntu 16.04 64-bit VirtualBox machine (running on a PC with CPU: Core i5-4590 at 3.30GHz, RAM: 8GB) SDL core: official docker image (smartdevicelink/core:latest) HMI: open-source “sdl_hmi” repository. Note that video playback within HMI is disabled. A GStreamer pipeline is manually executed to receive stream. Android phone: Nexus 6 (Android 5.0.1) iPhone: iPhone 5s (iOS 8.4.1) Transport used: Wi-Fi

Also note that I noticed TCPTransport.java in sdl_android produced a lot of debug logs and gave bad performance, so I removed them before running these tests.

Thanks.

Android non-RTP: cpu_usage_android_rawh264 Android RTP: cpu_usage_android_rtp iOS non-RTP:

iOS RTP:

shoamano83 commented 7 years ago

Hi @joeljfischer,

I remember that you had a concern about memory usage during last meeting. Today I used my iPhone5s to capture it. Please find attached screenshots.

Summary:

Without RTP, the memory usage was 19.8MB and went up as high as 22.7MB
With RTP, the memory usage was 20.0MB and went up as high as 23.1MB

So yes, the memory consumption gets bigger with RTP. But it is less than 1MB and I'm sure it won't make a big difference.

non-RTP: memory usage on iOS non-RTP

RTP: memory usage on iOS RTP

theresalech commented 7 years ago

As decided during the Steering Committee meeting on May 2, Xevo will revise this proposal to be aligned with SDL #157 , so that this proposal can be brought into review during the Steering Committee meeting on May 9 (2017-05-09).

shoamano83 commented 7 years ago

Hi all,

The proposal document is updated to align with proposal 0055 and 0052. Please find it from this PR: https://github.com/smartdevicelink/sdl_evolution/pull/175

Meanwhile, I created an example diagram to clarify the sequence of SetVideoConfig request/response. (i.e. why I need to add a new RPC instead of reusing StartStream request/response.) Please find it below.

sequence_rtp_proposal_ios

The diagram uses an iPhone as an example but behavior of Core and HMI is same for Android phone. I hope this helps for better understanding.

theresalech commented 7 years ago

The review of the revised proposal "Add H.264 over RTP format support for video streaming" begins now and runs through May 16, 2017.

Harshacharan commented 7 years ago

@shoamano83 Hello! Could you explain the role of the following in improving the video streaming robustness: (2) Add supported video formats in HMI's System Capabilities (3) Add video format negotiation in the protocol

shoamano83 commented 7 years ago

Hello @Harshacharan!

(2) and (3) are required to enable RTP format in SDL. They are not directly related to improving robustness of a streaming.

When we use RTP format, it is important that SDL proxy confirms that HMI supports the format before starting video streaming. (Here, I mean that RTP format is optional feature. Not all HMI supports it.) System Capabilities is used to notify to SDL proxy which video format(s) HMI supports.

Also, HMI should be aware in which format SDL proxy will create a video stream. Video format negotiation is added for this. When SDL proxy starts video streaming, it declares the format it is going to use. HMI receives the information and set up its video player accordingly. Note, to keep backward compatibility, old SDL proxy is allowed to start video streaming without format negotiation. In such case, HMI considers that a raw H.264 stream will be sent from SDL proxy.

Hope these answer your questions.

joeygrover commented 7 years ago

There is some duplication of RPCs between this and #176 and I believe #176 should take precedence
I believe additional params need to be added to the start service packet to handle proper negotiation (stream type, resolution, bitrate, etc). If the headunit can't accept the params, it simply NAKs (possibly with reasons in the payload).

shoamano83 commented 7 years ago

Hi @joeygrover , thanks for your comment.

1 -> Yes, the RPC stuff was added to both proposals. 2 -> I agree with you, but in this proposal I focused on video format and did not include another parameters like bitrate or resolution. Adding more params will require more discussion so I prefer doing it in another proposal. Actually, I created a dedicated struct VideoConfig so that we can add some more parameters later on. If the head unit cannot accept it, then yes it will return a Start Service NACK control frame.

joeljfischer commented 7 years ago

@joeygrover So the flow is supposed to be:

App -> Head Unit [GetVideoCapabilities] (or whatever it's called) HU -> App [GetVideoCapabilities Response] App -> HU [StartService w/ params based on GVC] HU -> App [StartServiceACK / NACK]

@shoamano83 I think Joey is saying there is additional discussion going on in #176 regarding changes, so it would be hard to approve this saying something different. We could perhaps approve it and then revise it later with the final #176 decisions?

joeygrover commented 7 years ago

Yes, @joeljfischer, that's the basic flow but to be comprehensive:

App -> HU [RegisterAppInterface Request]
App -> HU [RegisterAppInterface Response w/ HMICapabilities including VideoStreaming]
App -> HU [GetSystemCapability w/ systemCapabilityType = VIDEO_STREAMING]
HU -> App [GetSystemCapability Response w/ VideoStreamingCapability]
App -> HU [StartService w/ params based on GSC]
HU -> App [StartServiceACK / NACK]

I believe #176 should probably include this entire flow and #143 can be accepted with revisions that should reflect what happens with #176.

theresalech commented 7 years ago

The Steering Committee has agreed to accept this proposal with revisions based on the outcome of #176 (how the flow should work, enum versus string format).

theresalech commented 7 years ago

Issues entered: Core Android iOS

smartdevicelink / sdl_evolution