kewisch / ical.js

Javascript parser for ics (rfc5545) and vcard (rfc6350) data
https://kewisch.github.io/ical.js/
Mozilla Public License 2.0
989 stars 137 forks source link

VCard 4.0 text encoding differs from ical and vCard 3.0 #173

Open amrod- opened 9 years ago

amrod- commented 9 years ago

VCard 4.0 differs from vCard 3.0 and ical in how text for values are encoded. Mainly the semicolon is not encoded. Another differense is that rfc6350 section 3.4 states that all values must encode comma, backslash and newline characters. This is not the case for vCard 3.0 and ical.

Currently in ical.js working with vCard 4.0 text values are incorrectly encoded, URI and unknown values don't get encoded at all.

Fixing this for vCard 4.0 is not as trivial as overwriting text for vCard and copying this to URI and the unknown value. There are components in vCard 4.0 that do encode semicolon characters for example ADR and N.

kewisch commented 9 years ago

Yes, I saw some potential flaws while implementing it, what we can do is have a separate text type for vcard if the escaping is different than for iCalendar. Regarding vcard 4.0 vs 3.0, I think I might prefer a separate converter that converts 3.0 to 4.0 instead of special casing everywhere in the parser.

Regarding the "unknown" value type, this value is not encoded on purpose, it is meant to be used "as-is". See rfc7095 for details. Indeed the uri value is not treated differently.

For the "text" value type, if commas are not escaped with backslashes that sounds like a bug. For newline characters, see also rfc6868.

tadzik commented 4 years ago

The problem with escaped colon parsing also means that the vcards grow when roundtripped:

const ICAL = require('ical.js');

let vcard = `BEGIN:VCARD
VERSION:3.0
FN:John Doe
URL;type=pref:http\\://www.example.com/doe
END:VCARD`;

for (let i = 0; i < 5; i++) {
    vcard = ICAL.Component.fromString(vcard).toString();
}

console.log(vcard);

gives us:

BEGIN:VCARD
VERSION:3.0
FN:John Doe
URL;TYPE=pref:http\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\://www.example.com/doe
END:VCARD

If the vcard is in version 4.0 then the problem goes away – but it's still a less-than-ideal state of things when handling 3.0's, even incorrectly generated. I can't think of any viable workaround though, other than “deduplicate backslashes when serializing because the user probably doesn't mean to have that many”. Any better ideas?

Having re-read the spec (and the original issue) it seems like it's actually more of a problem with escaping backslashes specifically than anything spec-related: but I'll leave it here seems it seems to somewhat relevant – the bug does only appear on vcard3 after all.