w3c / web-nfc

Web NFC
https://w3c.github.io/web-nfc/

What's the value of `toJSON()`? #366

Closed beaufortfrancois closed 5 years ago

beaufortfrancois commented 5 years ago

Is NDEFRecord.toJSON() doing more than JSON.parse(NDEFRecord.toText())? I guess it doesn't raise an error when the payload is not a valid JSON message, but I'd like to hear from you what its value is.

  if (record.mediaType === 'application/json') {
    try {
-     const data = record.toJSON();
+     const data = JSON.parse(record.toText());
      if ('someKey' in data) { /* ... */ }
    } catch(e) {
      // Show error
    }
  }

To a certain extent, toArrayBuffer() is all that's needed to get text/JSON as well.

const decoder = new TextDecoder();
const text = decoder.decode(record.toArrayBuffer());
const data = JSON.parse(text);
// etc.
kenchris commented 5 years ago

toJSON rethrows exceptions. It will call UTF-8 decode on the bytes and then call JSON parse.
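
In other words, something like this (a rough sketch of those steps, not the spec's exact algorithm):

const text = new TextDecoder("utf-8", { fatal: true }).decode(record.toArrayBuffer());
const data = JSON.parse(text); // any decode/parse exception propagates to the caller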

Btw, the MIME type of JSON is not "json" or "text/json"; it's "application/json".

beaufortfrancois commented 5 years ago

@kenchris I've edited my post with your feedback. Thank you!

kenchris commented 5 years ago

It is not really doing more, though it is not doing it via JS calls. It is the same as saying that toText() is just calling toArrayBuffer() and then using a TextDecoder... (though it does a bit more when it's a text record)

jakearchibald commented 5 years ago

There's some precedent here with push bodies.

If you're going to add this, the naming should align with push (and fetch). This is especially important as toJSON already has meaning on the platform.

The benefit of response.json() is that the platform can parse it in a streaming manner. That isn't really true here or in push, since the amount of data is really small & synchronously available.

If it's common to get the data as text, then a helper method seems reasonable (and that should be named .text()). Then I guess you may as well add the rest for consistency. Although I don't think I'd be arguing for that if push hadn't set precedent.
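
For reference, the push precedent looks like this (a sketch using the existing PushMessageData API):

self.addEventListener("push", event => {
  // PushMessageData exposes synchronous text(), json(), arrayBuffer() and blob()
  const payload = event.data.json();
  console.log(payload);
});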

kenchris commented 5 years ago

It is already implemented. I think it offers a lot of convenience for normal users. Not everyone understands how JSON and text are encoded or knows how to use a TextDecoder etc. And with our API it is very easy to write JSON, so it should definitely be just as easy to read it back.

jakearchibald commented 5 years ago

Wait, toJSON() got through TAG review?

jakearchibald commented 5 years ago

https://w3c.github.io/web-nfc/#dfn-convert-ndefrecord-payloaddata-bytes seems really inconsistent with the web platform.

Firstly with naming. Other APIs such as fetch and push use text() and json() rather than toText() and toJSON(). This was done deliberately, since toJSON() is kinda like toString() in that it's used as a way to 'cast' an object to a particular type. This means that if an NDEFRecord object is processed as part of JSON.stringify(), its toJSON() method will be called. This is weird, since NDEFRecord's toJSON() doesn't return a JSON-compatible representation of the record itself, which is the intent of the toJSON() method elsewhere on the platform.
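
For example, a minimal illustration of that convention:

const record = {
  toJSON() { return { recordType: "text" }; } // called implicitly by JSON.stringify
};
JSON.stringify({ record }); // '{"record":{"recordType":"text"}}'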

Additionally, the conversion reacts to the recordType. If I ask for a record to be parsed as JSON, but the type isn't 'json' or 'opaque', it returns null. This is weird because I've asked for the record to be read as JSON. If it can't be read as JSON, it should throw.

If the developer has indicated something should be interpreted as a given type, we don't generally behave differently just because the other end has declared itself to be a different type. Eg, if I use an <img> to load something that's not an image, it'll fire the error event, even though its content-type already indicates it isn't an image (text/javascript for instance).

In some cases we fail if the content declares itself to be another type (eg, we check the content-type for module scripts and css), but content that can't be parsed as the requested type will still error even when the declared type matches.

In fetch and push, if you call .json(), the content will be parsed as json, regardless of the content type of the response. If the content cannot be parsed as JSON, it throws/rejects.
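
For example (a sketch; "/plain-text" stands in for any non-JSON resource):

// Inside an async function:
const response = await fetch("/plain-text"); // served with content-type: text/plain
const data = await response.json(); // parsed as JSON regardless; rejects with a SyntaxError if it isn't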

kenchris commented 5 years ago

It was originally called json() etc., but @marcoscaceres got us to change it to toJSON() as it is not async.

I am fine with throwing instead of returning null

beaufortfrancois commented 5 years ago

Weird... PushEvent.data.json() isn't async either.

marcoscaceres commented 5 years ago

Wait, which color is the bikeshed today? 😂 but seriously, just pick one. I like Jake’s rationale, but I don’t mind what it’s called and definitely don’t want to debate about it.

kenchris commented 5 years ago

OK, settled then... want to do the PR, @beaufortfrancois?

beaufortfrancois commented 5 years ago

Sure!

If I understood correctly, we want to rename toJSON(), toText() and toArrayBuffer() to json(), text() and arrayBuffer() respectively.

Now, if we're adding json() just for the sake of consistency with Push, I'd argue we should add blob() as well; but since that doesn't make sense, I'd vote for removing json() and simply keeping text() and arrayBuffer(). If people complain that JSON.parse() is too hard during the upcoming Origin Trial, how about adding it back later then?

kenchris commented 5 years ago

People write JSON directly, so it does make sense to have a .json() method instead of having to manually call JSON.parse(). You could argue that if we have JSON.parse then we don't need to accept JSON and everyone should just call JSON.stringify, but then people will end up doing that for the text record instead of using a proper opaque record.

There is so little cost in adding .json(): we already have it implemented, it does offer convenience, and it makes writing and reading more consistent.

jakearchibald commented 5 years ago

You could argue that if we have JSON.parse then we don't need to accept JSON and everyone should just call JSON.stringify

This is how fetch works.

beaufortfrancois commented 5 years ago

I'm not sure why removing NDEFRecord.json() would encourage web developers to use JSON.stringify. They already have 3 ways to push JSON objects today. Unless we provide helpful documentation on why this is bad practice (encoding added, bigger size), I think people are going to use the easy way (#1), regardless of the json() method.

// #1
writer.push(JSON.stringify({ key1: "value1" }));

// #2
writer.push({
  records: [
    {
      recordType: "text",
      data: JSON.stringify({ key1: "value1" })
    }
  ]
});

// #3
writer.push({
  records: [
    {
      recordType: "json",
      mediaType: "application/json",
      data: { key1: "value1" }
    }
  ]
});
zolkis commented 5 years ago

FWIW the "json" part is artificial, created by us in Web NFC. Otherwise both "opaque" and "json" map to media type records that have a MIME type, and we could just merge them into "media" (replacing "json" and "opaque"). The simplicity of this is tempting.

I am asking myself, given that we can provide examples of good patterns, whether the "json" use case is so relevant that we want to keep it as a separate thing (for people coming from the NFC side, it would be easier if we simply handled it as a media type).

Maybe "json" and "opaque" add enough semantics to be useful, but I don't see people having any problem invoking JSON.parse() and JSON.stringify() if examples/patterns were given.

beaufortfrancois commented 5 years ago

The more I think about it, the more I'd like us to get rid of JSON, and even other special casing. I feel like Web NFC is trying to be both an API and a library at the same time. So here's my proposal.

I believe great documentation will make this API shine.

Proposed spec changes

Edited: "language" and "encoding" have been added to NDEFRecord
Edited: "language" has been changed to "lang"

  1. Merge "json" and "opaque" recordType into new "media" recordType
  2. IDL changes (see below)
  [Exposed=Window]
  interface NDEFRecord {
    constructor(NDEFRecordInit recordInit);

    readonly attribute NDEFRecordType recordType;
    readonly attribute USVString mediaType;
    readonly attribute USVString id;
+   readonly attribute DataView? data;
+   readonly attribute USVString lang;
+   readonly attribute USVString encoding;

-   USVString? text();
-   [NewObject] ArrayBuffer? arrayBuffer();
-   [NewObject] any json();
    sequence<NDEFRecord> toRecords();
  };

  dictionary NDEFRecordInit {
    NDEFRecordType recordType;
    USVString mediaType;
    USVString id;
+   USVString lang; // defaults to `document.documentElement.lang` or "en-US" if unset
+   USVString encoding = "utf-8"; // MUST be "utf-8" or "utf-16"

    any data;
  };


JS example

Edited: TextEncoder is now used for "application/json"

async function writeToNfcTag() {
  const encoder = new TextEncoder();
  const writer = new NFCWriter();
  const records = [
    {
      recordType: "text",
      encoding: "utf-16",
      data: Uint16Array.of(98, 111, 110, 106, 111, 117, 114), // "bonjour"
      lang: "fr"
    },
    {
      recordType: "url",
      data: "https://google.com"
    },
    {
      recordType: "media",
      mediaType: "application/json",
      data: encoder.encode(JSON.stringify({ key1: "value1", key2: "value2" }))
    },
    {
      recordType: "media",
      mediaType: "image/png",
      data: await (await fetch("image.png")).arrayBuffer()
    },
    {
      recordType: "android.com:pkg", // Known external type
      data: encoder.encode("com.example")
    },
    {
      recordType: "example.com:a", // Custom external type
      data: Uint8Array.of(1)
    }
  ];

  return writer.push({ records });
}

function readNfcTag() {
  const reader = new NFCReader();
  reader.scan();
  reader.onreading = ({ message }) => {
    const decoder = new TextDecoder(); // UTF-8 by default

    for (const record of message.records) {
      const data = record.data; // NEW!
      switch (record.recordType) {
        case "text":
          const textDecoder = new TextDecoder(record.encoding);
          console.log(`Text: ${textDecoder.decode(data)} - Language: ${record.lang}`);
          break;

        case "url":
          console.log(`URL: ${decoder.decode(data)}`);
          break;

        case "media":
          // Let developer handle media case
          if (record.mediaType === "application/json") {
            console.log(`JSON: ${JSON.parse(decoder.decode(data))}`);

          } else if (record.mediaType.startsWith("image/")) {
            const blob = new Blob([data], { type: record.mediaType });
            const img = document.createElement("img");
            img.src = URL.createObjectURL(blob);
            document.body.appendChild(img);

          } else {
            console.log(`Media not handled`);
          }
          break;

        case "android.com:pkg":
          console.log(`AAR Package Name: ${decoder.decode(data)}`);
          break;

        case "example.com:a":
          console.log(`My custom external type: ${data.getUint8(0)}`);
          break;
      }
    }
  };
}
kenchris commented 5 years ago

@beaufortfrancois Keep in mind that text is handled specially in NFC, as it supports both UTF-8 and UTF-16, so it is not as easy as just using a text decoder.

kenchris commented 5 years ago

const decoder = new TextDecoder(); // UTF-8 by default

It CANNOT decode UTF-16, which may exist in NDEF tags. We actually don't allow writing UTF-16, but we do allow reading it, due to existing tags.

We could of course just return an array buffer instead, but that's kind of lying then.

kenchris commented 5 years ago

Another suggestion: should we use data or go with payload, which is closer to the NDEF specs?

kenchris commented 5 years ago

Also how should we handle embedded records?

data: new NDEFEncoder().encode([{
    recordType: "local:a", // Local type
    data: Uint8Array.of(1)
}])

interface NDEFDecoder {
    sequence<NDEFRecord> decode(ArrayBuffer buffer);
};

interface NDEFEncoder {
    [NewObject] ArrayBuffer encode(sequence<NDEFRecord> records);
};
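
Reading embedded records back could then look like this (hypothetical usage of the NDEFDecoder sketched above):

const embedded = new NDEFDecoder().decode(record.arrayBuffer());
for (const r of embedded) {
  console.log(r.recordType, r.data);
}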
kenchris commented 5 years ago

Text is a problem as it actually encodes the language and the encoding (UTF-8 or -16) in its payload. Decoding this manually is actually quite hard:

const header = data.getUint8(0);
const langCodeLength = header & 0x3f;      // bits 5..0 of the status byte
const utf8Encoded = (header & 0x80) === 0; // bit 7: 0 = UTF-8, 1 = UTF-16
const lang = new TextDecoder().decode(data.buffer.slice(1, 1 + langCodeLength));
const text = new TextDecoder(utf8Encoded ? "utf-8" : "utf-16be")
  .decode(data.buffer.slice(1 + langCodeLength));

On the web platform we cannot encode UTF-16 but we can easily decode it.
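
That asymmetry is visible in the platform encoding APIs themselves (a quick sketch):

new TextEncoder().encoding;           // always "utf-8"; there is no UTF-16 encoder
new TextDecoder("utf-16be").encoding; // "utf-16be"; UTF-16 decoding is supported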


zolkis commented 5 years ago

Merge "json" and "opaque" recordType into new "media" recordType

They actually are both MIME type records. If you feel json doesn't add much value, we can remove it. I can do that in the currently open PR, i.e. now :)

We can name MIME type records opaque or mime.

I don't have a strong opinion on data vs. payload, or on whether we use getters or a DataView. I like the current getters, but if there is a better alternative, I am open to that.

beaufortfrancois commented 5 years ago

Thanks for all the feedback.

Also how should we handle embedded records?

I think we should keep it simple and use JavaScript objects only for embedded records. In other words:

Edited: Add missing "JSON.stringify"

  const encoder = new TextEncoder();
  const jsonString = JSON.stringify({ key1: "value1", key2: "value2" });
  const records = [
    {
      recordType: "media",
      mediaType: "application/json",
      data: encoder.encode(jsonString)
    }
  ];
kenchris commented 5 years ago

The above doesn't seem right... JSON isn't text.

beaufortfrancois commented 5 years ago

We can name MIME type records opaque or mime.

If media doesn't fit, I'd prefer using mime then, as it is familiar to NFC developers (MIME media-type) and applies directly to a record type.

I don't have a strong opinion on data vs. payload, or on whether we use getters or a DataView. I like the current getters, but if there is a better alternative, I am open to that.

I'm not a huge fan of payload, though I don't feel strongly about it. Note that it is not common in the web platform: https://cs.chromium.org/search/?q=payload;+f:idl$+f:blink&sq=package:chromium&type=cs unlike data: https://cs.chromium.org/search/?q=data;+f:idl$+f:blink&sq=package:chromium&type=cs

kenchris commented 5 years ago

I was suggesting payload because that is what NDEF calls it, but I guess that data is fine if it is consistent.

I actually like media, but I still think it is worth filing an issue on the TAG design principles so we know what is preferred. We can go with media for now.

kenchris commented 5 years ago

We need to find out how to handle the special text record

const records = [
  {
    recordType: "text",
    lang: "en-US",
    encoding: "utf-8",
    data: encoder.encode("a text string")
  }
];

We should throw when the encoding is different from utf-8 on other text-based records. Also throw on the wrong kind of utf-16 (as there are two :) utf-16be and utf-16le)

case "text":
  const decoder = new TextDecoder(record.encoding);
  console.log(`Text: ${decoder.decode(record.data)}, language ${record.lang}`);
  break;
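
A sketch of that write-side validation (hypothetical helper; exactly which UTF-16 labels we accept is what's being settled here):

function validateEncoding(record) {
  const allowed = record.recordType === "text"
    ? ["utf-8", "utf-16be", "utf-16le"]
    : ["utf-8"];
  if (!allowed.includes(record.encoding)) {
    throw new TypeError(`Unsupported encoding: ${record.encoding}`);
  }
}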
beaufortfrancois commented 5 years ago

We need to find out how to handle the special text record

  const records = [
    {
      recordType: "text",
      lang: "en-US",
      encoding: "utf-8",
      data: encoder.encode("a text string")
    }
  ];

How about making lang optional (defaulting to the user agent's default locale) and encoding optional (defaulting to utf-8), and keeping data a string for the "text" recordType?

Note that this improvement is orthogonal to this issue, but it is still needed for the "text" recordType, as web developers need to be able to set lang and encoding.

// Both would be equivalent for an English user.
const records = [
    {
      recordType: "text",
      lang: "en-US",
      encoding: "utf-8",
      data: "a text string",
    }
  ];
const records = [
    {
      recordType: "text",
      data: "a text string",
    }
  ];

For info, createTextRecord on Android uses UTF-8 while Core NFC (iOS) seems to use UTF-16 encoding.

We should throw when the encoding is different from utf-8 on other text-based records. Also throw on the wrong kind of utf-16 (as there are two :))

     case "text":
          console.log(`Text: ${new TextDecoder(record.encoding).decode(record.data)}, language ${record.lang}`);
          break;

Exposing those seems good to me. Thank you!

beaufortfrancois commented 5 years ago

I actually like media but I still think it is worth filing an issue on the TAG design principles so know what is preferred.

I've filed one at https://github.com/w3ctag/design-principles/issues/140

kenchris commented 5 years ago

How about making lang optional (defaulting to the user agent's default locale) and encoding optional (defaulting to utf-8), and keeping data a string for the "text" recordType?

These are supposed to be optional. I am not sure what other APIs do about the default... browser default locale? Could you check? Also maybe add these examples to your API change suggestion above.

kenchris commented 5 years ago

I've filed one at w3ctag/design-principles#140

Thanks!

zolkis commented 5 years ago

I'd prefer media too, but I was instructed we should not use it. Thanks for filing that issue.

zolkis commented 5 years ago

How about making lang optional (defaulting to the user agent's default locale) and encoding optional (defaulting to utf-8), and keeping data a string for the "text" recordType?

I have been thinking about the same thing. In the NDEF record, the encoding and language code are in the first bytes of the PAYLOAD. So we can represent the PAYLOAD split between lang, encoding and data. In that sense, I would prefer using data, since we will expose lang and encoding separately.

zolkis commented 5 years ago

A timid question: wouldn't it be clearer if we had separate definitions for different record types? Like in the example below (to be improved, but you get the idea).

[Exposed=Window]
interface NDEFRecord {
  constructor(NDEFRecordInit recordInit);

  readonly attribute NDEFRecordType recordType;
  readonly attribute USVString mediaType;
  readonly attribute USVString id;
  readonly attribute DataView? data;
};

dictionary NDEFRecordInit {
  required NDEFRecordType recordType;
  USVString mediaType;
  USVString id;
  any data;
};

[Exposed=Window]
interface TextRecord: NDEFRecord {
  constructor(NDEFTextRecordInit recordInit);
  readonly attribute DOMString lang;
  readonly attribute DOMString encoding;
  USVString? text();  // for convenience
};
dictionary TextRecordInit: NDEFRecordInit {
  required NDEFRecordType recordType = "text";
  DOMString lang;
  DOMString encoding;
};

[Exposed=Window]
interface SmartPosterRecord: NDEFRecord {
  constructor(SmartPosterRecordInit init);
  sequence<NDEFRecord> toRecords();  // this could also be an attribute
};
dictionary SmartPosterRecordInit: NDEFRecordInit {
  required NDEFRecordType recordType = "smart-poster";
  sequence<NDEFRecord> records;
};

[Exposed=Window]
interface ExternalTypeRecord: NDEFRecord {
  constructor(ExternalTypeRecordInit init);
  sequence<NDEFRecord> toRecords();  // this should be a getter
  [NewObject] ArrayBuffer? arrayBuffer();
};
dictionary ExternalTypeRecordInit: NDEFRecordInit {
  required NDEFRecordType recordType = "external";
  sequence<NDEFRecord> records;  // nesting allowed; optional
};

// and so on for other types

OK, it's a lot to spec and implement.

beaufortfrancois commented 5 years ago

It is better in terms of spec work but it makes it harder to read I think.

kenchris commented 5 years ago

I don't think it makes a big difference for the developer, and it is not common for people to get different types back and have to do type checks (unless you know what is available, but then we are back to what we have today)

beaufortfrancois commented 5 years ago

How about making lang optional (defaulting to the user agent's default locale) and encoding optional (defaulting to utf-8), and keeping data a string for the "text" recordType?

These are supposed to be optional. I am not sure what other APIs do about the default... browser default locale? Could you check? Also maybe add these examples to your API change suggestion above.

I didn't find any locale/language/lang default parameter. We may want to not add an IDL default, but simply spec that "language" will be the browser user agent locale if not specified. WDYT?

I've updated my API change suggestion above.

kenchris commented 5 years ago

I found:

If the attribute value is the empty string (lang=""), the language is set to unknown; if the language tag is not valid according to BCP47, it is set to invalid.

I believe most NDEF APIs default to en-US but I will have to check. Ah, actually this is what Android does:

[screenshot]

kenchris commented 5 years ago

encoding: "utf-16",

utf-16 is not explicit enough; it has to be either utf-16be or utf-16le. Have to check what NFC supports.

The encoding spec seems to categorize utf-16 as utf-16be:

[screenshot]

kenchris commented 5 years ago

Actually it's the other way around (thanks to missing CSS styles).

It actually depends on whether NFC stores the byte order mark (BOM) or not.

beaufortfrancois commented 5 years ago

I found:

If the attribute value is the empty string (lang=""), the language is set to unknown; if the language tag is not valid according to BCP47, it is set to invalid.

I believe most NDEF APIs default to en-US but I will have to check. Ah, actually this is what Android does:

[screenshot]

That's what I meant. There are no IDL defaults (e.g. lang = "en-US"). What do you recommend?

beaufortfrancois commented 5 years ago

Actually it's the other way around (thanks to missing CSS styles).

It actually depends on whether NFC stores the byte order mark (BOM) or not.

Why not have "utf-16" as a valid encoding for NDEFRecordInit and let the implementation figure out the appropriate one? When reading back though, record.encoding would be "utf-16be" or "utf-16le" and match TextDecoder. WDYT?

kenchris commented 5 years ago

I don't much like using the browser language, because it will surprise developers: they will most likely just test with their own locale. So en-US seems like a valid default, unless people set the language declaratively (in HTML) etc.

document language:

<html lang="en">
...
</html>
document.documentElement.lang

This differs from the UA (browser) language:

navigator.language
beaufortfrancois commented 5 years ago

Let's pick document.documentElement.lang. I've updated the API change suggestion above.
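
A sketch of the defaulting this implies (hypothetical; init stands for the NDEFRecordInit dictionary):

const lang = init.lang || document.documentElement.lang || "en-US";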

kenchris commented 5 years ago

Regarding UTF-16 in NDEF:

The Unicode BOM (byte order mark) MUST be tolerated (ie. no error conditions).

That is what TextDecoder does, but maybe we need to detect it then: if it has a BOM, it's "utf-16".

When writing a text record the BOM MAY be omitted, in that case the order shall be big endian ("utf-16be")

Basically, we allow people to write "utf-16" and "utf-16be", but they are on their own as to how they do that, as TextEncoder won't help with writing :-) (libraries do though)

When reading and detecting UTF-16 we probably need to check whether it has a BOM or not.
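
A BOM check could look like this (a sketch; the BOM is read in big-endian byte order):

function detectUtf16Encoding(view) { // view: a DataView over the payload
  const bom = view.getUint16(0, false); // false = big-endian read
  if (bom === 0xfeff) return "utf-16be"; // BOM in order: big-endian content
  if (bom === 0xfffe) return "utf-16le"; // swapped BOM: little-endian content
  return "utf-16be"; // no BOM: NDEF says to assume big endian
}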


or maybe it is fine to just say "utf-16" in that case, as you can decode with

 new TextDecoder("utf-16be", { ignoreBOM: false })

and it should just work. Good example though

zolkis commented 5 years ago

It is better in terms of spec work but it makes it harder to read I think.

I don't think it makes a big difference for the developer, and it is not common for people to get different types back and have to do type checks

Right now we need to expose a union of properties and getters across all record types. That is confusing as well: for instance, Generic Sensors didn't dump all possible properties into a single object, for a reason.

Even now scripts need to check recordType; there is no change there. But we'd get self-documenting clarity on which getters are supposed to work etc., and record creation would become better defined as well.

Anyway it would be a big change.

kenchris commented 5 years ago

@zolkis it is very few properties we are talking about; just expose them everywhere and make them null when they are irrelevant.

kenchris commented 5 years ago

OK, I think I understand UTF-16 more or less now. NFC only writes in big endian. When you write the BOM you always write 0xFEFF, and that gets swapped if the decoder and encoder have different endianness.

So this works fine on my little endian machine:

[screenshot]

Guess that means that we should always use "utf-16be" and write that as the encoding when we detect utf-16.

kenchris commented 5 years ago

Trying to explain endianness for the spec:

The byte order of NDEF is big endian, which means that everything is read back as big endian. For UTF-16, byte order matters, as it might differ between reading and writing. For this reason UTF-16 usually has a byte order mark (BOM), which is 0xFEFF; if the byte order differs between the host machine and NDEF, the value will be read back as 0xFFFE, indicating that the byte order should be swapped. In the case that no BOM is present, UTF-16BE (big endian) encoding should be assumed.
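
The swap is easy to demonstrate (a sketch; the commented result assumes a little-endian host):

const buf = new ArrayBuffer(2);
new DataView(buf).setUint16(0, 0xfeff, false); // write the BOM big-endian, as NDEF does
const readBack = new Uint16Array(buf)[0];      // Uint16Array uses host byte order
console.log(readBack.toString(16));            // "fffe" on a little-endian machine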