w3c / mediacapture-extensions

Extensions to Media Capture and Streams by the WebRTC Working Group
https://w3c.github.io/mediacapture-extensions/

Origin isolation #19

Open martinthomson opened 6 years ago

martinthomson commented 6 years ago

I couldn't find anything in the specification regarding the origin that a track is attributed to. I suspect that all browsers have settled on a model that is sensible, but the spec should make a few things clear:

  1. MediaStreamTrack objects are only readable by the origin that requested them, unless other constraints cause them to gain a different origin (the peerIdentity constraint for WebRTC does this).

  2. MediaStreamTrack objects can be rendered if they belong to another origin, but only their size is known.

  3. We need to decide what the rules are for constraints on cross-origin tracks. I think that if the model for transference is that they are copied when transferred, then constraints can be both read and written, just as we permit a site to read and write constraints on peerIdentity-constrained tracks.

  4. We need to consider what happens to synchronization of playback for mixed-origin MediaStreamTrack objects. Do we consider clock skew from a particular source to be something that we should protect? Whatever the decision, this is part of the set of things that we need to be very clear on.

Work progresses on transferring tracks between origins, which I think is OK, but this is groundwork for that.

The best text we have is in the from-element spec, which is honestly a little on the light side.

This came up in w3c/mediacapture-screen-share#53.

alvestrand commented 6 years ago

making sure I understand what you're proposing:

  1. MediaStreamTrack objects are only readable by the origin that requested them, unless other constraints cause them to gain a different origin (the peerIdentity constraint for WebRTC does this).

when you say "readable", is this shorthand for "copy media content into another object"?

  2. MediaStreamTrack objects can be rendered if they belong to another origin, but only their size is known.

"rendered" = "used for display in a

"known" = "readable by JS from the current origin"? Size is of course not known for audio tracks. Are the frame rate (for video) or sample rate (for audio) knowable properties?

Clarifying this seems to make a lot of sense, yes.

martinthomson commented 6 years ago

Readable means that the contents (the frames of video, samples of audio) can be read somehow. That might mean rendering to a canvas, web audio, or whatever other methods we provide for accessing content.

Copying doesn't imply a requirement that content be readable - we can create new tracks by cloning for instance, or we might copy the track to another origin (where it is probably equally unreadable), and we might consider rendering through a video tag as copying as well.

Yes, "rendered" means playback. Just as we allow cross-origin images to be shown by a page, we can allow media to be played in the various ways we support (which includes capture of stills in canvas, which subsequently loses its origin-clean flag).

Size (meaning video width and height) is one property we might allow read access to along with frame rate and sample rate. We might also permit reading of other metadata like field of view (if this is taken from a camera), the number of bits per sample, and other such things. Enumerating those, or finding a set of rules that we might apply to the decision, would be good.
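To make the "set of rules" idea concrete, here is a minimal TypeScript sketch of one possible rule: an allowlist of metadata that stays visible when a track's origin differs from the page's. The setting names mirror a few members of MediaTrackSettings; `visibleSettings`, `CROSS_ORIGIN_ALLOWLIST`, and the choice of allowlisted names are hypothetical and not taken from any spec.

```typescript
// Hypothetical model of a track's settings, keyed by setting name.
type TrackSettings = { [name: string]: number | string };

// Assumption: metadata that might be safe to expose for a cross-origin track.
const CROSS_ORIGIN_ALLOWLIST: ReadonlyArray<string> = [
  "width",
  "height",
  "frameRate",
  "sampleRate",
  "sampleSize",
];

// Returns every setting for a same-origin track, and only the
// allowlisted settings when the track belongs to another origin.
function visibleSettings(
  settings: TrackSettings,
  trackOrigin: string,
  pageOrigin: string,
): TrackSettings {
  const sameOrigin = trackOrigin === pageOrigin;
  const visible: TrackSettings = {};
  for (const name of Object.keys(settings)) {
    if (sameOrigin || CROSS_ORIGIN_ALLOWLIST.indexOf(name) !== -1) {
      visible[name] = settings[name];
    }
  }
  return visible;
}
```

With this rule, a cross-origin video track would still report its width, height and frame rate, while anything outside the list (a device identifier, say) would be hidden. Whether an allowlist is the right shape for the rule is exactly the open question.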

stefhak commented 6 years ago

+1 on the importance of clarifying this. Three questions:

  1. Where does the recorder belong in this? Is it in the same class as "renderer" or "readable"?
  2. Same question for a PeerConnection.
  3. What happens on the receiving end of a PeerConnection? Does that page "own" the origin of the track (and can read data)? Can it forward the track?

martinthomson commented 6 years ago

Recorder and PC require that the track be readable. The track out of PC is also readable.

The peerIdentity constraint is an exception to that. It exists to allow inaccessible media to be sent via PC and does that by making it unreadable on the receiving side.

stefhak commented 6 years ago

https://github.com/w3c/mediacapture-main/issues/529#issuecomment-412447137 makes sense to me.

jan-ivar commented 3 years ago

(Going over old issues...)

  1. MediaStreamTrack objects are only readable by the origin that requested them

I don't see any such general principle. Tracks that are not tainted are effectively readable from any origin, since they may be transmitted over a peer connection (MediaStreamTrack isn't transferable, so we don't have to worry about postMessage, but if we were to add that, I think the same principle would apply).

The only source of tainted tracks atm appears to be element.captureStream which in some implementations (Firefox) emits tainted tracks when the element source is cross-origin content, although web compat here appears to have gone to shreds. See https://github.com/w3c/mediacapture-fromelement/issues/83 and https://github.com/w3c/mediacapture-fromelement/issues/21. For some reason canvas.captureStream chose to fail rather than emit tainted tracks.

IOW there's effectively no origin concept for camera, microphone, screen-sharing or web-audio tracks.

There might be some benefit from a central definition for the concept of a tainted track, since every sink in the MediaStreamTrack model of sources and sinks is going to have to deal with it (I've already filed https://github.com/w3c/mediacapture-image/issues/272).

martinthomson commented 3 years ago

I think that your notion of tainting isn't sitting right with me. What happens is that media has an origin and often - but not always - that origin is the same as the page. If the two are not the same, then the media is unreadable. This concept exists already for (non-CORS) images and video. It exists so that media can be obtained, managed, and rendered, while ensuring that cross-origin content cannot be read by pages that show that content.

canvas.captureStream fails because the origin-clean status is irrevocable. That isn't entirely justified, because you could still at least render that content to screen, so failing is probably too hard. I probably would have had canvas not fail if it weren't for compatibility with existing implementations and the existing functions for accessing canvas content, which fail immediately also. Of course, content from a non-origin-clean canvas still can't be consumed by anything else, so the utility of such a stream is marginal.
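The canvas behaviour described above can be sketched as a small model in TypeScript. `CanvasModel` and its methods are hypothetical stand-ins, not the real canvas API; the sketch only models the origin-clean flag and assumes the cross-origin media was loaded without CORS approval.

```typescript
// Hypothetical stand-in for a canvas; it only models the origin-clean flag.
class CanvasModel {
  private originClean = true;
  private readonly pageOrigin: string;

  constructor(pageOrigin: string) {
    this.pageOrigin = pageOrigin;
  }

  // Drawing always succeeds, but cross-origin media clears the flag,
  // and nothing ever sets it back (the status is irrevocable).
  drawMedia(mediaOrigin: string): void {
    if (mediaOrigin !== this.pageOrigin) {
      this.originClean = false;
    }
  }

  // Models the existing read paths, which fail on a non-origin-clean canvas.
  readPixels(): string {
    this.requireOriginClean();
    return "pixels";
  }

  // Models canvas.captureStream(), which fails rather than
  // handing out a stream that nothing could read.
  captureStream(): string {
    this.requireOriginClean();
    return "stream";
  }

  private requireOriginClean(): void {
    if (!this.originClean) {
      throw new Error("SecurityError: canvas is not origin-clean");
    }
  }
}
```

In this model, drawing same-origin content after cross-origin content does not restore read access, which is why a stream from such a canvas could never become readable later.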

Other sources of media could - at least in theory - change origin and later become readable to someone, so they provide a stream. Take the isolated peerIdentity media as an example: that isn't readable to this page, but it might be readable to PeerConnection. Or video elements, which can change origin as the media can be sourced from different servers over time.

youennf commented 3 years ago

There might be some benefit from a central definition for the concept of a tainted track

For simplicity, I would prefer we stick with muted/ended notions in specs. Adding another tainted notion adds extra complexity. muted/ended also have the nice property to be exposed to web pages.

I would be ok with a tainted notion if we have good enough use cases. So far, it seems that:

Other sources of media could - at least in theory - change origin and later become readable to someone, so they provide a stream.

muted could well handle that.

youennf commented 3 years ago

I propose we close this issue for now. It seems that to make progress here, we could:

@martinthomson, does that sound ok?

martinthomson commented 3 years ago

Is "media capture extension" this specification?

jan-ivar commented 3 years ago

Is "media capture extension" this specification?

No, it's https://github.com/w3c/mediacapture-extensions, which this issue has not been transferred to at the time of writing.

I propose we close this issue for now.

I do worry though that for security reasons, we need to say something about tainting in this spec, since this is where the model of sources and sinks is established, and we have sources that may taint in https://github.com/w3c/mediacapture-fromelement/issues/83. Otherwise, it falls on every sink defined to not trip over this. We want to avoid another cr761622.

At minimum I think we need to say that any spec that defines a sink for MediaStreamTracks MUST protect any cross-origin media in it from being exposed to JS, or failing that, decree that MediaStreamTrack sources MUST NOT contain cross-origin media, tainted or otherwise. Leaving individual sources and sinks to coordinate every combination of this seems subpar.

jan-ivar commented 3 years ago

This would also come up with insertable streams.

jan-ivar commented 3 years ago

Forever banning tainted MediaStreamTracks seems harsh.

How about something like: "If the User Agent supports tainted MediaStreamTracks, then all sinks of MediaStreamTrack MUST protect the data of cross-origin media in said tracks from being exposed to JS." ?

youennf commented 3 years ago

We should agree on whether there is enough justification to introduce the tainted track concept here. So far, I have not seen such justification. For instance, it is not clear to me we want the tainted track concept to spread elsewhere than mediacapture-fromelement.

One possibility would be to state in mediacapture-fromelement that 'tainted tracks' can only be consumed in HTMLMediaElement sinks (and HTMLMediaElement must preserve this tainting). For any other sink, tainted tracks are to be treated as if they were muted.
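A minimal TypeScript sketch of that rule, under stated assumptions: `TrackModel`, its `tainted` flag, the `SinkKind` names, and `effectiveMuted` are all hypothetical, since no "tainted" state is exposed on MediaStreamTrack today.

```typescript
// Hypothetical sink categories; only "media-element" may consume tainted tracks.
type SinkKind = "media-element" | "recorder" | "peer-connection" | "web-audio";

// Hypothetical track model. `tainted` is an assumed internal flag.
interface TrackModel {
  muted: boolean;
  tainted: boolean;
}

// Returns true when the given sink should see no media from the track:
// either the track is actually muted, or it is tainted and the sink
// is anything other than a media element.
function effectiveMuted(track: TrackModel, sink: SinkKind): boolean {
  if (track.muted) {
    return true;
  }
  return track.tainted && sink !== "media-element";
}
```

Under this sketch, a new sink only needs to honour the muted behaviour it must already implement, and does not have to know which source produced the tainted track.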

jan-ivar commented 3 years ago

For any other sink, tainted tracks are to be treated as if they were muted.

Something like that is what I think we need to state in mediacapture-main, the only spec any new sink should need to normatively reference.

An implementer of a new sink shouldn't need to read the spec of every source.

youennf commented 3 years ago

This was discussed at the last interim. My understanding is that there is little appetite to introduce a 'tainted' track concept until there are more use cases than the mediacapture-fromelement one.