ranguard / text-vcard

Perl package to edit and create vCard(s) (RFC 2426)

as_string() garbles wide characters at line breaks on Unix/OS X #37

Open grigutis opened 8 years ago

grigutis commented 8 years ago

When using multibyte UTF-8 characters in a NOTE node, characters are garbled or lost around the line breaks. I think specifying the newline as raw bytes (\x0D\x0A) is causing the problem. Shouldn't it just be \n, and let Perl figure it out based on the OS?

Example code:

#!/usr/bin/perl -w
use strict;
use utf8;
use v5.10;
use Text::vCard::Addressbook;
use diagnostics;

binmode STDOUT, ":utf8";

# create an address book
my $address_book = Text::vCard::Addressbook->new();

# single byte example
my $vcard = $address_book->add_vcard();
$vcard->version('3.0');
$vcard->FN('Test User');
$vcard->NOTE('12345678901234567890123456789012345678901234567890123456789012345678901234567890');
say $vcard->as_string();

# multi byte example
$vcard = $address_book->add_vcard();
$vcard->version('3.0');
$vcard->FN('Tèśt Ûšér');
$vcard->NOTE('①②③④⑤⑥⑦⑧⑨⑩⑪⑫⑬⑭⑮⑯⑰⑱⑲⑳①②③④⑤⑥⑦⑧⑨⑩⑪⑫⑬⑭⑮⑯⑰⑱⑲⑳①②③④⑤⑥⑦⑧⑨⑩⑪⑫⑬⑭⑮⑯⑰⑱⑲⑳①②③④⑤⑥⑦⑧⑨⑩⑪⑫⑬⑭⑮⑯⑰⑱⑲⑳');
say $vcard->as_string();

open(my $out_fh, '>:encoding(UTF-8)', 'example.vcf') or die "Could not write file ($!)";
print $out_fh $address_book->export();

On my Mac, example.vcf comes out as:

BEGIN:VCARD
VERSION:3.0
FN:Test User
NOTE:1234567890123456789012345678901234567890123456789012345678901234567890
 1234567890
END:VCARD
BEGIN:VCARD
VERSION:3.0
FN:Tèśt Ûšér
NOTE:①②③④⑤⑥⑦⑧⑨⑩⑪��
 �⑬⑭⑮⑯⑰⑱⑲⑳①②③④
 ⑤⑥⑦⑧⑨⑩⑪⑫⑬⑭⑮⑯�
 ��⑱⑲⑳①②③④⑤⑥⑦⑧��
 �⑩⑪⑫⑬⑭⑮⑯⑰⑱⑲⑳①
 ②③④⑤⑥⑦⑧⑨⑩⑪⑫⑬�
 ��⑮⑯⑰⑱⑲⑳
END:VCARD

grigutis commented 8 years ago

After a little more research, it looks like the culprit is Text::vCard::Node->_wrap(). It wraps lines at a fixed number of octets, regardless of whether that splits a multibyte character in the middle. But the spec says: "Multi-octet characters MUST remain contiguous."
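
One possible way to satisfy that requirement is to fold on character boundaries while still counting octets. The following is only a minimal sketch, not the module's code: `fold_utf8` is a hypothetical helper name, the 75-octet limit is an assumption, and the input is assumed to be a decoded character string (as it would be under `use utf8`).

```perl
use strict;
use warnings;
use utf8;
use Encode qw(encode);

# Hypothetical helper, not part of Text::vCard.
# Folds a decoded character string so that no line exceeds $max octets
# once encoded as UTF-8, breaking only between characters.
sub fold_utf8 {
    my ($text, $max) = @_;
    $max = 75 unless defined $max;    # assumed limit

    my @lines;
    my ($current, $octets) = ('', 0);

    for my $char (split //, $text) {
        # Size of this single character when encoded as UTF-8.
        my $len = length(encode('UTF-8', $char));

        if ($octets + $len > $max) {
            push @lines, $current;
            # A continuation line starts with one space,
            # which counts toward the octet limit.
            ($current, $octets) = (' ', 1);
        }

        $current .= $char;
        $octets  += $len;
    }
    push @lines, $current;

    return join "\r\n", @lines;
}

# Usage: fold a NOTE line made of 3-octet characters.
my $folded = fold_utf8('NOTE:' . ('①' x 40));
```

Because the break decision is made per character, a multi-octet character always lands whole on one line. Note that this sketch breaks between code points, so it could still separate a base character from a following combining mark.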

ranguard commented 8 years ago

Thanks @grigutis, I'd be happy to take a pull request with some tests.