w3c / clipboard-apis

Clipboard API and events
https://w3c.github.io/clipboard-apis/

Write UTF-8 data to the clipboard. #217

Open snianu opened 5 months ago

snianu commented 5 months ago

Popular native apps on Windows read formats like image/svg+xml in UTF-8 form [1]. In the spec, before the payload for a format is written to the clipboard, its content is UTF-8 decoded into scalar values. The spec text reads: "Let payload be the result of UTF-8 decoding item’s underlying byte sequence." (https://w3c.github.io/clipboard-apis/#write-blobs-and-option-to-the-clipboard). Should this text be changed so that UTF-8 encoded data is written directly to the clipboard?

[1] https://docs.google.com/document/d/1ULlihA0FOJOqcyD9MgzLZrAbk0uTQPJqDPuPJ2aiuS4/edit?usp=sharing
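To make the step in question concrete, here is a minimal TypeScript sketch (not spec text and not any browser's implementation) of the difference between keeping an SVG payload as UTF-8 bytes and first decoding it into a string whose code units a platform might then store as UTF-16. The sample SVG string and the helper `toUtf16LE` are hypothetical, introduced only for illustration.

```typescript
// Hypothetical sample payload; "\u00e9" is one non-ASCII character.
const sampleSvg =
  '<svg xmlns="http://www.w3.org/2000/svg"><text>\u00e9</text></svg>';

// The bytes a page would put in an image/svg+xml Blob (UTF-8).
const utf8Payload: Uint8Array = new TextEncoder().encode(sampleSvg);

// The step quoted from the spec: UTF-8 decode the underlying byte sequence.
const decodedPayload: string = new TextDecoder("utf-8").decode(utf8Payload);

// Assumption for illustration: a platform that stores the decoded string
// as little-endian UTF-16 code units.
function toUtf16LE(text: string): Uint8Array {
  const out = new Uint8Array(text.length * 2);
  for (let i = 0; i < text.length; i++) {
    const unit = text.charCodeAt(i);
    out[2 * i] = unit & 0xff; // low byte
    out[2 * i + 1] = unit >> 8; // high byte
  }
  return out;
}

const utf16Payload: Uint8Array = toUtf16LE(decodedPayload);

console.log("UTF-8 bytes:", utf8Payload.length);
console.log("UTF-16 bytes:", utf16Payload.length);
```

A reader that expects UTF-8 sees a different byte sequence in the second case, which is consistent with the paste failure described in this issue.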

snianu commented 5 months ago

@sanketj @whsieh @EdgarChen

annevk commented 5 months ago

Why do we decode at all if the payload is a blob? Like how does this make sense for image/png?
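The concern about image/png can be illustrated with a short TypeScript sketch, assuming only the standard `TextDecoder` and `TextEncoder`. It runs the 8-byte PNG signature through a UTF-8 decode and re-encodes the result; the variable names are hypothetical.

```typescript
// The fixed 8-byte PNG file signature.
const pngSignature = new Uint8Array([
  0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a,
]);

// UTF-8 decoding, as in the spec step under discussion.
// 0x89 is not a valid UTF-8 lead byte, so it becomes U+FFFD.
const decodedSignature = new TextDecoder("utf-8").decode(pngSignature);

// Re-encoding does not give the original bytes back.
const reencodedSignature = new TextEncoder().encode(decodedSignature);

const survivesRoundTrip =
  reencodedSignature.length === pngSignature.length &&
  reencodedSignature.every((byte, i) => byte === pngSignature[i]);

console.log("first code unit:", decodedSignature.charCodeAt(0).toString(16));
console.log("survives round trip:", survivesRoundTrip);
```

Because the first byte is replaced during decoding, the binary payload is corrupted before it could reach the clipboard, which is why decoding only makes sense for text formats.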

css-meeting-bot commented 5 months ago

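The Web Editing Working Group just discussed "Write UTF-8 data to the clipboard." and agreed to the following:

RESOLVED: Remove the bullet about UTF-8 encoding. Anupam to file follow up issue to investigate what happens when you try to send invalid UTF chars though.

The full IRC log of that discussion
<dandclark> topic: Write UTF-8 data to the clipboard.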
<dandclark> github: https://github.com/w3c/clipboard-apis/issues/217
<dandclark> snianu: Recently we found in Chromium that when we copy SVG (Chromium supports image/svg+xml), we switch the encoding from UTF-8 to UTF-16
<dandclark> ...: When we paste in native apps like Word, the image doesn't render
<dandclark> ...: It's because the native apps expect UTF-8
<dandclark> ...: We investigated, found in the spec that when we write blobs to the system clipboard, the spec says to use the UTF-8 decoder and write scalar values to the system clipboard
<dandclark> ...: Trying to get feedback on whether to change the spec
<dandclark> ...: Or are there corner cases we're missing like for PNG
<dandclark> smaug: I think what Anne noticed is a clear bug
<dandclark> snianu: Is there a specific encoding rule that FF or Safari follow when writing formats? Or is it whatever encoding is in the blob type?
<dandclark> smaug: I can't recall
<dandclark> ...: E.g. if your OS has image-specific backing store you do some additional transformation
<dandclark> snianu: Agree. I read in Apple documentation it's default UTF-16 but can use others
<dandclark> ...: Agree for images it doesn't make sense, but for other MIME types like SVG and HTML, does it make sense to write UTF-8?
<dandclark> ...: Windows has separate APIs for UTF and ASCII characters
<dandclark> ...: I think there's lots of different cases and encoding schemes
<dandclark> ...: Don't know if it makes sense to standardize it
<dandclark> ...: Because it's also platform specific
<dandclark> anne: The one thing you could maybe do is abstract between text and byte sequence types
<dandclark> ...: For text sequence types, always do UTF pass so you always get scalar values
<dandclark> ...: Is interesting question what platforms currently do. If you put zero-bytes in text stream, do you get zero-bytes or replacement chars?
<dandclark> snianu: For the existing spec text, do we all agree it's not valid and we should remove it?
<dandclark> ...: And maybe do an investigation to see what can be added to the spec, maybe as a note?
<dandclark> anne: Reasonable to remove UTF-8 step and then investigate
<dandclark> smaug: Might be useful to see why we have the UTF-8 thing in the spec
<dandclark> anne: Good to do blame analysis, I didn't yet
<dandclark> smaug: It's very specific, might be something interesting mentioned in spec issue somewhere
<dandclark> johanneswilm: Is there agreement?
<dandclark> johanneswilm: It's always either bytes or UTF-8? Any risk of other older encodings?
<dandclark> anne: It's another interesting question. It's why I think bytes are the answer and we need to investigate further.
<dandclark> johanneswilm: Who will file follow up issue?
<dandclark> snianu: I can
<dandclark> RESOLVED: Remove the bullet about UTF-8 encoding. Anupam to file follow up issue to investigate what happens when you try to send invalid UTF chars though.
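As a possible starting point for the follow-up investigation, the sketch below shows how the web platform's own UTF-8 decoder (`TextDecoder`) treats the two cases raised in the discussion: a zero byte and an invalid byte. It says nothing about what native clipboards do with such data; the sample bytes and the helper `decodesStrictly` are hypothetical.

```typescript
// "A", a zero byte, "B", then 0xFF, which can never appear in valid UTF-8.
const trickyBytes = new Uint8Array([0x41, 0x00, 0x42, 0xff]);

// Default (non-fatal) mode: invalid sequences become U+FFFD,
// while a zero byte is kept as U+0000.
const lenientText = new TextDecoder("utf-8").decode(trickyBytes);

// Fatal mode: decoding invalid input throws instead of substituting.
function decodesStrictly(bytes: Uint8Array): boolean {
  try {
    new TextDecoder("utf-8", { fatal: true }).decode(bytes);
    return true;
  } catch {
    return false;
  }
}

console.log("zero byte kept:", lenientText.charCodeAt(1) === 0x0000);
console.log("0xFF replaced:", lenientText.charCodeAt(3) === 0xfffd);
console.log("strict decode succeeds:", decodesStrictly(trickyBytes));
```

In this decoder, zero bytes pass through unchanged and only ill-formed sequences are replaced. Whether each platform clipboard preserves, truncates at, or replaces those values is the open question the follow-up issue is meant to answer.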