mangstadt / ez-vcard

A vCard parser library for Java

Erroneous character encoding when line is wrapped according to `lineLength` #80

Closed alexander-myltsev closed 7 years ago

alexander-myltsev commented 7 years ago

In Russian, each character is encoded as two bytes: =D0=A2=D0=B0=D0=BC=D0=B0=D1=80=D0=B0 for Тамара. For some reason, your writer may wrap the line in the middle of an encoded character: =D0=A2=D0=B0=D0=BC=D0=B0=D1=80=D0\n=B0. The Google Contacts importer, for example, won't decode the wrapped string correctly. It shows Тамар� � instead.
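To illustrate the symptom, here is a minimal sketch of a quoted-printable decoder. The class `QpDecodeSketch` and its `decode` method are hypothetical names written for this example; they are not part of ez-vcard, and nothing here is known about how Google Contacts decodes its input. One plausible explanation is that a consumer decoding each physical line on its own sees only half of a two-byte UTF-8 character on either side of the break.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class QpDecodeSketch {
    /**
     * Decodes "=XX" triplets into bytes and interprets the result as UTF-8.
     * All other characters are assumed to be plain ASCII and copied as-is.
     * Throws NumberFormatException if "=" is followed by non-hex characters.
     */
    static String decode(String qp) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        for (int i = 0; i < qp.length(); i++) {
            char c = qp.charAt(i);
            if (c == '=' && i + 2 < qp.length()) {
                // Two hex digits follow the "=": turn them into one byte.
                bytes.write(Integer.parseInt(qp.substring(i + 1, i + 3), 16));
                i += 2;
            } else {
                bytes.write(c);
            }
        }
        return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Decoded in one piece, every two-byte sequence is complete.
        System.out.println(decode("=D0=A2=D0=B0=D0=BC=D0=B0=D1=80=D0=B0"));

        // Decoded as two separate pieces, the last character is cut in half,
        // so the UTF-8 decoder substitutes replacement characters.
        System.out.println(decode("=D0=A2=D0=B0=D0=BC=D0=B0=D1=80=D0") + decode("=B0"));
    }
}
```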

mangstadt commented 7 years ago

Thanks for the post. Have you actually tried importing such a vCard into Google Contacts or are you just speculating that this might happen?

You are referring to line folding. It is part of the vCard syntax. It allows you to split a long line into multiple lines without losing any data.

For example, the two vCards below are identical.

BEGIN:VCARD
VERSION:2.1
FN;QUOTED-PRINTABLE:=D0=A2=D0=B0=D0=BC=D0=B0=D1=80=D0
 =B0
END:VCARD

BEGIN:VCARD
VERSION:2.1
FN;QUOTED-PRINTABLE:=D0=A2=D0=B0=D0=BC=D0=B0=D1=80=D0=B0
END:VCARD
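As a rough illustration of why a parser that supports folding treats the two vCards as the same, here is a minimal unfolding sketch. It follows the rule from RFC 6350 (a line break immediately followed by a single space or tab is removed). The class `UnfoldSketch` and its `unfold` method are hypothetical names for this example and are not ez-vcard's implementation.

```java
public class UnfoldSketch {
    /**
     * Joins folded lines: a line break (CRLF or LF) that is immediately
     * followed by one space or tab is removed together with that whitespace.
     */
    static String unfold(String text) {
        return text.replaceAll("\r?\n[ \t]", "");
    }

    public static void main(String[] args) {
        String folded = "FN;QUOTED-PRINTABLE:=D0=A2=D0=B0=D0=BC=D0=B0=D1=80=D0\r\n =B0\r\n";
        String single = "FN;QUOTED-PRINTABLE:=D0=A2=D0=B0=D0=BC=D0=B0=D1=80=D0=B0\r\n";

        // After unfolding, the folded property equals the single-line one.
        System.out.println(unfold(folded).equals(single));
    }
}
```

A consumer that skips this step, or that decodes quoted-printable text before unfolding, would see the two cards differently.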

Line folding is optional. You can turn it off like so:

VCardWriter writer = ...
writer.getVObjectWriter().getFoldedLineWriter().setLineLength(null);

alexander-myltsev commented 7 years ago

I did try to import as-is. Alas, those two cards are not equal for Google Contacts. If you try to decode the first card at http://www.webatic.com/run/convert/qp.php, you get a similar error.

In the end I actually called setLineLength(null), and Google imported everything correctly. But for that I needed to download the source code and change access modifiers (some privates to publics). Did you check that the line length can be set in a project that uses ez-vcard as a dependency? I suggest adding it to the examples.

mangstadt commented 7 years ago

> I did try to import as-is. Alas, those two cards are not equal for Google Contacts.

Are you sure? Did you include the space character on the fourth line of the first vCard?

> But for that I needed to download source code and change access modifiers (some privates to publics).

You must have done something wrong. All those methods are public. :-P

alexander-myltsev commented 7 years ago

> I did try to import as-is. Alas, those two cards are not equal for Google Contacts.

> Are you sure? Did you include the space character on the fourth line of the first vCard?

I am. This is the original card:

BEGIN:VCARD
VERSION:2.1
N;CHARSET=UTF-8;ENCODING=QUOTED-PRINTABLE:20.01.1949 =D0=B3.=D1=80.;*=D0=90=D0=B2=D0=B8=D0=B0- =D0=9A=D0=B0=D0=BF=D1=83=D1=81=D1=82=D0=B8=D0=BD =D0=92=D0=BB=D0=B0=D0=B4=D0=B8=D0=BC=D0=B8=D1=80 =D0=92=D0=BB./=D0=B6.=D0=A2=D0=B0=D0=BC=D0=B0=D1=80=D0=B0/
FN;CHARSET=UTF-8;ENCODING=QUOTED-PRINTABLE:*=D0=90=D0=B2=D0=B8=D0=B0- =D0=9A=D0=B0=D0=BF=D1=83=D1=81=D1=82=D0=B8=D0=BD =D0=92=D0=BB=D0=B0=D0=B4=D0=B8=D0=BC=D0=B8=D1=80 =D0=92=D0=BB./=D0=B6.=D0=A2=D0=B0=D0=BC=D0=B0=D1=80=D0=B0/ 20.01.1949 
TEL;CELL; MOBILE; VOICE:+71234567890
REV:20170219T151325
UID;CHARSET=UTF-8;ENCODING=QUOTED-PRINTABLE:20140611T062842Z-000000000000309-D742759A-03A7-0053-3532-3530935A0D@samsungmobile.com
END:VCARD

This is the converted card:

BEGIN:VCARD
VERSION:4.0
PRODID:ez-vcard 0.10.3-SNAPSHOT
N;CHARSET=UTF-8;ENCODING=QUOTED-PRINTABLE:20.01.1949 =D0=B3.=D1=80.;*=D0=90=
 =D0=B2=D0=B8=D0=B0- =D0=9A=D0=B0=D0=BF=D1=83=D1=81=D1=82=D0=B8=D0=BD =D0=92=
 =D0=BB=D0=B0=D0=B4=D0=B8=D0=BC=D0=B8=D1=80 =D0=92=D0=BB./=D0=B6.=D0=A2=D0=
 =B0=D0=BC=D0=B0=D1=80=D0=B0/;;;
FN;CHARSET=UTF-8;ENCODING=QUOTED-PRINTABLE:*=D0=90=D0=B2=D0=B8=D0=B0- =D0=9A=
 =D0=B0=D0=BF=D1=83=D1=81=D1=82=D0=B8=D0=BD =D0=92=D0=BB=D0=B0=D0=B4=D0=B8=
 =D0=BC=D0=B8=D1=80 =D0=92=D0=BB./=D0=B6.=D0=A2=D0=B0=D0=BC=D0=B0=D1=80=D0=
 =B0/ 20.01.1949 
TEL;TYPE=CELL,MOBILE,VOICE:+71234567890
REV:20170219T121325Z
UID;CHARSET=UTF-8;ENCODING=QUOTED-PRINTABLE:20140611T062842Z-0000000000003=
 09-D742759A-03A7-0053-3532-3530935A0D@samsungmobile.com
END:VCARD

This is what Google imported:

image

The code that produces it:

        String path = "test.vcf";
        File file = new File(path), fileOut = new File("test_converted.vcf");
        Ezvcard.write(Ezvcard.parse(file).all()).version(VCardVersion.V4_0).go(fileOut);

Regarding your code snippet:

VCardWriter writer = ...
writer.getVObjectWriter().getFoldedLineWriter().setLineLength(null);

I know VCardWriter could be modified in that way. But how do I pass it to the Ezvcard chain? Which method of ChainingTextWriter returns the VCardWriter? I could pass it to go(), but that overload is private. Does that make sense?

mangstadt commented 7 years ago

> I know VCardWriter could be modified in that way. But how do I pass it to the Ezvcard chain?

The Ezvcard chain does not have a method that sets line length. Maybe I should add one... Right now, you'd have to use the VCardWriter class if you wanted to disable line folding.

Try outputting to version 2.1 and uploading that to Google. Only version 2.1 is supposed to support quoted-printable text.

Ezvcard.write(Ezvcard.parse(file).all()).version(VCardVersion.V2_1).go(fileOut);

I have one more idea to try if that doesn't work.

alexander-myltsev commented 7 years ago

Same result in Google for this card text:

BEGIN:VCARD
VERSION:2.1
X-PRODID:ez-vcard 0.10.3-SNAPSHOT
N;CHARSET=UTF-8;ENCODING=QUOTED-PRINTABLE:20.01.1949 =D0=B3.=D1=80.;*=D0=90=
 =D0=B2=D0=B8=D0=B0- =D0=9A=D0=B0=D0=BF=D1=83=D1=81=D1=82=D0=B8=D0=BD =D0=92=
 =D0=BB=D0=B0=D0=B4=D0=B8=D0=BC=D0=B8=D1=80 =D0=92=D0=BB./=D0=B6.=D0=A2=D0=
 =B0=D0=BC=D0=B0=D1=80=D0=B0/
FN;CHARSET=UTF-8;ENCODING=QUOTED-PRINTABLE:*=D0=90=D0=B2=D0=B8=D0=B0- =D0=9A=
 =D0=B0=D0=BF=D1=83=D1=81=D1=82=D0=B8=D0=BD =D0=92=D0=BB=D0=B0=D0=B4=D0=B8=
 =D0=BC=D0=B8=D1=80 =D0=92=D0=BB./=D0=B6.=D0=A2=D0=B0=D0=BC=D0=B0=D1=80=D0=
 =B0/ 20.01.1949
TEL;TYPE=CELL;TYPE=MOBILE;TYPE=VOICE:+71234567890
REV:20170219T121325Z
UID;CHARSET=UTF-8;ENCODING=QUOTED-PRINTABLE:20140611T062842Z-0000000000003=
 09-D742759A-03A7-0053-3532-3530935A0D@samsungmobile.com
END:VCARD

Moreover, Google doesn't import the phone number from a V2_1 card. That was the reason I came to your project: to convert it to V4_0.

You should also consider smarter line wrapping. In this case, don't split =D0=B0 across lines.

mangstadt commented 7 years ago

Can you test something? Can you put this vCard (below) into Google and tell me if it works? I think it might work. This vCard does not have the spaces at the beginning of the lines:

BEGIN:VCARD
VERSION:4.0
PRODID:ez-vcard 0.10.3-SNAPSHOT
N;CHARSET=UTF-8;ENCODING=QUOTED-PRINTABLE:20.01.1949 =D0=B3.=D1=80.;*=D0=90=
=D0=B2=D0=B8=D0=B0- =D0=9A=D0=B0=D0=BF=D1=83=D1=81=D1=82=D0=B8=D0=BD =D0=92=
=D0=BB=D0=B0=D0=B4=D0=B8=D0=BC=D0=B8=D1=80 =D0=92=D0=BB./=D0=B6.=D0=A2=D0=
=B0=D0=BC=D0=B0=D1=80=D0=B0/;;;
FN;CHARSET=UTF-8;ENCODING=QUOTED-PRINTABLE:*=D0=90=D0=B2=D0=B8=D0=B0- =D0=9A=
=D0=B0=D0=BF=D1=83=D1=81=D1=82=D0=B8=D0=BD =D0=92=D0=BB=D0=B0=D0=B4=D0=B8=
=D0=BC=D0=B8=D1=80 =D0=92=D0=BB./=D0=B6.=D0=A2=D0=B0=D0=BC=D0=B0=D1=80=D0=
=B0/ 20.01.1949 
TEL;TYPE=CELL,MOBILE,VOICE:+71234567890
REV:20170219T121325Z
UID;CHARSET=UTF-8;ENCODING=QUOTED-PRINTABLE:20140611T062842Z-0000000000003=
 09-D742759A-03A7-0053-3532-3530935A0D@samsungmobile.com
END:VCARD

Thanks for doing the testing.

alexander-myltsev commented 7 years ago

The vCard you gave works fine:

image
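For reference, the card that imported correctly ends each wrapped quoted-printable line with a soft line break (a trailing =) and starts the continuation line without leading whitespace. Below is a minimal sketch of folding in that style. It also steps back whenever a cut would land inside a =XX triplet, although it does not try to keep both bytes of a multi-byte character on one line. The class `QpFoldSketch`, its `fold` method, and the `maxLen` parameter are hypothetical names for this example; this is not the code ez-vcard uses.

```java
public class QpFoldSketch {
    /**
     * Folds an already quoted-printable-encoded value so that no output line
     * is longer than maxLen characters, counting the trailing "=" soft break.
     * Continuation lines are written without leading whitespace.
     */
    static String fold(String encoded, int maxLen) {
        if (maxLen < 4) {
            throw new IllegalArgumentException("maxLen must be at least 4");
        }
        StringBuilder out = new StringBuilder();
        int start = 0;
        while (encoded.length() - start > maxLen) {
            int end = start + maxLen - 1; // leave room for the soft break "="
            // Step back if the cut would land inside a "=XX" triplet.
            if (encoded.charAt(end - 1) == '=') {
                end -= 1;
            } else if (encoded.charAt(end - 2) == '=') {
                end -= 2;
            }
            out.append(encoded, start, end).append("=\r\n");
            start = end;
        }
        out.append(encoded, start, encoded.length());
        return out.toString();
    }

    public static void main(String[] args) {
        String value = "=D0=A2=D0=B0=D0=BC=D0=B0=D1=80=D0=B0";
        System.out.println(fold(value, 11));
    }
}
```

Removing every "=" followed by a line break from the folded output restores the original encoded value, which is how a quoted-printable decoder treats soft line breaks.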

mangstadt commented 7 years ago

Ok, should be fixed now (see: 8f4bab8df68439dae953ab11cb2d79fba89bbd0e).

I also added a foldLines() method to the chaining API (see: f0e68fb31ce1f12fe1661c5880b83bdc305ce2f6).

A new stable version has been released containing this fix (0.10.3). Please give it a try and let me know if it works. It should go live within a few hours. Thanks for your help.

alexander-myltsev commented 7 years ago

Everything works for me now. Thanks! This one can be closed.

mangstadt commented 7 years ago

Excellent. Thanks for your help.