wish-wg / webrtc-http-egress-protocol

WHEP - WebRTC HTTP egress protocol draft

Revisit server-sent offers #12

Open englishm-ietf opened 1 year ago

englishm-ietf commented 1 year ago

Picking up from discussion started in a video-dev Slack thread.

I think it's worth giving server-sent offers another consideration and maybe making that the one supported WHEP behavior rather than the other way around.

Discussion so far:

englishm The more I think about this the more I think I prefer server offers… :thinking_face:

englishm What if we shuffled the HTTP verbs around to match, too?

Sergio Garcia Murillo server sent offers in WHEP have a problem with h264 profiles

englishm Because you can’t simultaneously offer all possibilities a client might need to choose from?

Sergio Garcia Murillo yep

englishm :disappointed:

Sergio Garcia Murillo same with vp9/av1 profiles

englishm I haven’t tried this yet, but could you work around it by just offering more tracks with the various permutations of fmtps?

Sergio Garcia Murillo tracks as "m-lines" ?

englishm m= lines, yes

Sergio Garcia Murillo it would just make things worse imo.. :slightly_smiling_face:

englishm Wait, no, you wouldn’t necessarily need more m= lines.. could you use dynamic format numbers and just define more “codecs” with the different format parameters using rtpmap to define them? Is it a problem if you define multiple payload numbers to all mean various versions of H264/90000 (or whatever)?
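A minimal sketch of that idea, assuming a single video m= section in which several dynamic payload types all map to H264/90000 and differ only in their fmtp line. The payload numbers, the choice of profiles, and the helper name `build_video_section` are illustrative assumptions, not taken from the WHEP draft.

```python
# Hypothetical variants: (payload type, profile-level-id, packetization-mode).
# Payload numbers come from the dynamic range (96-127); the profiles are examples.
H264_VARIANTS = [
    (96, "42e01f", 1),   # Constrained Baseline, level 3.1
    (98, "4d001f", 1),   # Main, level 3.1
    (100, "64001f", 1),  # High, level 3.1
]


def build_video_section(variants):
    """Build one sendonly video m= section listing every variant as its own payload type."""
    payload_types = " ".join(str(pt) for pt, _, _ in variants)
    lines = [f"m=video 9 UDP/TLS/RTP/SAVPF {payload_types}", "a=sendonly"]
    for pt, profile, mode in variants:
        # Same encoding name and clock rate for every payload type...
        lines.append(f"a=rtpmap:{pt} H264/90000")
        # ...distinguished only by the format parameters.
        lines.append(
            f"a=fmtp:{pt} level-asymmetry-allowed=1;"
            f"packetization-mode={mode};profile-level-id={profile}"
        )
    return "\r\n".join(lines) + "\r\n"


section = build_video_section(H264_VARIANTS)
print(section)
```

This only shows that the SDP can be expressed; it does not address whether every permutation a client might need can be enumerated up front, which is the concern raised above.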

englishm I’m pushing on this partly because I think server offers bring us a lot closer to the types of things DASH is looking for as an upfront description of available media. It seems backwards to have to duplicate that with something else because we have clients making recvonly offers instead.

Sergio Garcia Murillo my main concern about server sent offers, is that in my case, I allow connecting viewers before the publication is started, so I don't know the actual codec that is going to be used

Sergio Garcia Murillo so there is a huge risk that the client receiving a server sent offer will "choose" one of the codecs on the sdp offer, and send an answer without the other ones.

englishm But you’d have to know by the time you answer and actually get ICE all connected, right?

Sergio Garcia Murillo So it may be choosing h264, when the actual codec is vp9 later on

Sergio Garcia Murillo no, you don't need

Sergio Garcia Murillo in fact, i allow changing codecs mid session. I am even doing different codecs for different ABR layers.

englishm Hm, so in your case the issue is that allowing the client to answer gives them too much control over the size of the envelope?

Sergio Garcia Murillo yes, i think it is too risky for me

englishm What about over-constrained client offers though, is that not the same risk?

Sergio Garcia Murillo I can't do much in that regard, but I think devs would be more prone to restrict the answer based on the codecs offered by the server than to send a constrained offer that omits codecs which are actually supported

englishm Re-reading RFC8829, which is how W3C defines normative behavior for createAnswer… From RFC8829 Section 5.3.1:

If codec preferences have been set for the associated transceiver, media formats MUST be generated in the corresponding order, regardless of what was offered, and MUST exclude any codecs not present in the codec preferences.

Otherwise, the media formats on the “m=” line MUST be generated in the same order as those offered in the current remote description, excluding any currently unsupported formats. Any currently available media formats that are not present in the current remote description MUST be added after all existing formats.

In either case, the media formats in the answer MUST include at least one format that is present in the offer but MAY include formats that are locally supported but not present in the offer, as mentioned in [RFC3264], Section 6.1. If no common format exists, the “m=” section is rejected as described above.

englishm To me, that sounds like browser implementations should be answering with all supported codecs unless specific codec preferences have been configured.

englishm And we can provide explicit recommendations in the WHEP text for non-browser implementations to do the same.
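One reading of the quoted RFC8829 Section 5.3.1 text can be sketched as follows. This is a simplification under stated assumptions: formats are compared as plain strings, whereas real implementations match codec names and format parameters, and `answer_formats` is a hypothetical helper, not a browser API.

```python
def answer_formats(offered, supported, preferences=None):
    """Return the ordered formats for an answer m= line, or None if the section is rejected.

    offered:     formats in the order they appear in the remote offer
    supported:   formats the answerer currently supports
    preferences: optional codec preferences set on the transceiver
    """
    if preferences is not None:
        # Preferences dictate both order and contents, regardless of the offer.
        formats = [f for f in preferences if f in supported]
    else:
        # Offer order first, minus anything currently unsupported...
        formats = [f for f in offered if f in supported]
        # ...then locally available formats that were not offered.
        formats += [f for f in supported if f not in offered]

    # In either case at least one format must also be present in the offer.
    if not any(f in offered for f in formats):
        return None
    return formats


print(answer_formats(["VP9", "H264", "AV1"], ["H264", "VP9", "VP8"]))
print(answer_formats(["VP9", "H264", "AV1"], ["H264", "VP9", "VP8"], preferences=["H264"]))
print(answer_formats(["AV1"], ["H264"]))
```

Under this reading, an answerer without codec preferences lists every offered format it supports, while setting preferences is what narrows the answer to a single codec.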

englishm Here’s the gist of what I’m imagining: https://gist.github.com/englishm-ietf/48cbab582f8a748d8ebab0b2c47c9d5c

Sergio Garcia Murillo GET requests should be idempotent and not cause state changes on the server side, but anyway it won't solve my issues with server sent offers

englishm I think we don’t actually need to make state changes on the server until we get the answer though, is the realization I had.

Sergio Garcia Murillo you have to allocate a new ice username/frag and create the candidates

englishm Depends a little maybe on how you send ICE candidates, but I think the initial offer could maybe be generic. I guess we’d want to be careful about header caching for the session resource, too.

Sergio Garcia Murillo but anyway, it is minor, using GET or empty POST is not an issue for me

englishm You still think client answers will be overconstrained?

Sergio Garcia Murillo I have already seen it in an early gstreamer implementation.. :slightly_smiling_face:

englishm But speaking of issues, maybe we should copy our discussion so far to a GitHub issue and pick it up there? I just realized that this isn’t the best medium for IETF discussion.

Sergio Garcia Murillo i think it may be less risky to allow server sent offers on whip instead

redoPop commented 1 year ago

From the quoted section of RFC8829:

media formats in the answer MUST include at least one format that is present in the offer

I think the phrasing "at least one" (i.e. not necessarily more than one) supports the overconstrained answer scenario. 😞

Would a smaller, non-SDP response satisfy this use case? e.g. a JSON payload broadly describing tracks and kinds, that the client can then use to create an appropriate offer?
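As a rough illustration of what such a payload might look like, here is a sketch in which the client turns a track description into a plan for recvonly transceivers. The JSON shape (`tracks`, `kind`, `codecs`) and the helper `plan_transceivers` are entirely hypothetical; nothing like this is defined in the WHEP draft.

```python
import json

# Hypothetical server response describing the available media.
RESPONSE_BODY = """
{
  "tracks": [
    {"kind": "audio", "codecs": ["opus"]},
    {"kind": "video", "codecs": ["H264", "VP9"]}
  ]
}
"""


def plan_transceivers(description):
    """Map each described track to a recvonly transceiver the client would add before creating its offer."""
    return [
        {
            "kind": track["kind"],
            "direction": "recvonly",
            # Codec hints are optional in this sketch.
            "codecs": list(track.get("codecs", [])),
        }
        for track in description["tracks"]
    ]


plan = plan_transceivers(json.loads(RESPONSE_BODY))
for entry in plan:
    print(entry)
```

The cost of this approach is the extra request needed to fetch the description before the client can build its offer.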

englishm-ietf commented 1 year ago

I could see some implementations choosing to interpret RFC8829 that way, but I think that would be a mistake. The section I quoted above seems to me to say that every supported media format should be present in the answer, in the same order as in the offer, followed by any additional formats not present in the offer.

the media formats on the “m=” line MUST be generated in the same order as those offered in the current remote description, excluding any currently unsupported formats. Any currently available media formats that are not present in the current remote description MUST be added after all existing formats.

The fact that currently available media formats MUST be added after the existing formats implies to me that available media formats listed in the offer should also be comprehensively represented.

Whether all implementations actually follow the spec here is something worth exploring, but this is what the text says, and I think at least browser implementations should be following it correctly.

The exception to these requirements is only if codec preferences have been set for the associated transceiver, but I don't think that should be dependent on the contents of the offer, so clients would have to fail in either case there.

Also, for what it's worth, I don't love the idea of adding a JSON payload and another round trip here. That seems like an unnecessary delay in startup time we can probably avoid.

redoPop commented 1 year ago

I agree that compliance is worth exploring here. If the problematic behavior is non-compliant then that makes a stronger case for reconsidering server-sent offers.

I get what you're saying, but I don't think it's a necessary interpretation of the spec, or the intended one. The text you quoted applies specifically when adding new formats that weren't in the original offer. The broader implication that you're drawing conflicts with the more generally applicable "In either case… at least one format" phrasing in the following bullet.

It does seem compliant for an answer to include only one of the existing available media formats, whether or not codec preferences have been set. If the client were adding formats at the same time then a stronger argument could be made for non-compliance, but that isn't the specific scenario that drew concern.

danjenkins commented 1 year ago

Just here to add that I think this 100% needs to be solved, we cannot lose the ability to do whep -> whip - it will open up so many possibilities that weren't available before.

mondain commented 1 year ago

Here's my two cents about the way I currently do what I'd refer to as WHEP Mode 1: https://gist.github.com/mondain/5bc8bee11af4b291abe154b39879e822

Obviously it's simplified, and I normally live in the SFU/MCU world where we keep WHIP and WHEP (publishers and subscribers) separate.

I WHIP'd this up real quick as well: https://gist.github.com/mondain/7a3792711c489e97e8cede9e5acbef50

danjenkins commented 6 months ago

Coming back to an issue that's been open for a year: we're still in a position where existing WebRTC media servers can't support WHEP without re-architecting their existing WebRTC solution, which wasn't the aim of WHEP. WHEP was meant to be a signalling layer on top of WebRTC, and WebRTC didn't define who offers what and where... yes, a signalling layer can... but I believe in this case it shouldn't rule out server sent offers.

Ultimately receiving media is a more complicated scenario than sending - the sender knows what is being sent and it agrees that with a server and off you go. With receiving media you have loads of permutations of what is possible; are you receiving media for multiple participants? Are you receiving multiple qualities? The media server knows the state of those things.... the client doesn't (in a very simple example)

Regardless of the above... unless WHEP supports server sent offers we are at risk of not having broad compatibility across existing WebRTC media servers. I understand a large reason for wanting client side offers is to be able to handle codec profiles... but there's no reason I can see why server side offers need to disappear... Receiving media is complicated, and for something that's built on top of existing solutions... we need flexibility.

Would removing "WHIP -> WHEP" compatibility as a "wish" (see what I did there) help? I appreciate it was a lofty goal. Maybe that's something that could be looked into separately from this.

I would like to see both modes remain - I really don't see the harm in keeping both. Edit: Make the spec allow either mode and make it so you can call an OPTIONS request or something that will tell the client what mode a server supports. Make it so a client can only support one mode if it wants to. Yes you could get into a scenario where a client can't talk to a server, but at least this allows certain clients/products to only have to support their one preferred method.
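A sketch of how that mode discovery could work on the client side, assuming the server lists its supported modes in a response header to the OPTIONS request. The header name `WHEP-Offer-Modes` and the mode names `client` and `server` are invented for illustration; the draft does not define them.

```python
# Hypothetical header a server might return from OPTIONS, e.g.
#   WHEP-Offer-Modes: client, server
HEADER_NAME = "WHEP-Offer-Modes"


def parse_modes(header_value):
    """Split a comma-separated header value into normalized mode names."""
    return [m.strip().lower() for m in header_value.split(",") if m.strip()]


def choose_mode(client_modes, server_modes):
    """Return the first mode the client prefers that the server also supports, else None."""
    for mode in client_modes:
        if mode in server_modes:
            return mode
    return None


# A client that only implements server-sent offers.
print(choose_mode(["server"], parse_modes("client, server")))
# The same client against a server that only accepts client offers.
print(choose_mode(["server"], parse_modes("client")))
```

The second call returns `None`, which is the "client can't talk to a server" case described above; the client at least learns this before attempting a session.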