ranguard / text-vcard

Perl package to edit and create vCard(s) (RFC 2426)
22 stars 15 forks source link

Hardcoded "\x0D\x0A" fails for parsing vCard on Windows #20

Open tmhall opened 10 years ago

tmhall commented 10 years ago

I was attempt to help a user on SO: http://stackoverflow.com/q/22790520/1733163

However, I was unable to install this module and it's dependencies without errors on Strawberry Perl 5.18.2. In the end I forced install in order to debug the module.

I traced one issue to Line 315 of Text::vCard::Addressbook my @lines = split "\x0D\x0A", $text;

The above will file to split a file on Windows because \x0D is stripped by perlio in :crlf mode. The only way to bypass this issue is to open import the file in binmode and pass it to he constructor as a string. The following straights:

use strict;
use warnings;
use utf8;

use vCard;

# create the object
my $vcard = vCard->new;

binmode(\*DATA);
my $string = do { local $/; <DATA> };

$vcard->load_string($string);

print $vcard->as_string();

__DATA__
BEGIN:VCARD
VERSION:3.0
N;LANGUAGE=de:name;first
FN:first name
ORG:company
TEL;TYPE=work:+49 (0000) 123456
ADR;TYPE=work:;;Strasse 1;Ortschaft arbeit;;56789;Deutschland
ADR;TYPE=home;PREF:;;Strasse 2;Ortschaft 4;;12345;Deutschland
EMAIL:info@a.de
URL:www.a.de
END:VCARD

Without binmode, the output is:

BEGIN VCARD without matching END at C:/strawberry/perl/site/lib/Text/vFile/asData.pm line 137,

With binmode, the output is:

BEGIN:VCARD
VERSION:4.0
END:VCARD

Obviously there are still more issues to resolve, but I at least wanted to submit this. If I have time to follow up with this issue, or even create a full patch, I will submit those later.

kablamo commented 10 years ago

Hi @tmhall! Thanks for reporting this. It makes me aware of a few issues.

The vcard RFC requires every line to end with a \r\n. We should have a more helpful error if the string is not properly formatted. Do you still have a problem if you use $vcard->load_file()?

kablamo commented 10 years ago

Also load_string() does not do what the documentation says it does. Here is code that should behave the way you want:

my $new_vcard = $vcard->load_string($string);
$new_vcard->as_string();

This is not intuitive behavior and someone should look into fixing that.

kablamo commented 10 years ago

Ok, I submitted 2 pull requests which should address a few of the issues here. I still need to create a better error message for strings that are missing the \r\n pattern. But I'm heading off to sleep now.

ranguard commented 10 years ago

@kablamo boom: http://www.cpantesters.org/cpan/report/6e1d7369-6c10-1014-b8e4-5db800d6f3f7

ranguard commented 10 years ago

@kablamo less boom, but still boom http://www.cpantesters.org/cpan/report/1d3fc6ca-6bfa-1014-9538-937384992293

ranguard commented 10 years ago

@kablamo different boom! http://www.cpantesters.org/cpan/report/d9bfc4e4-7489-1014-b7e9-e9c63567e6ce

ranguard commented 10 years ago

@kablamo Getting there! - but now UTF8 issue for Windoz: http://www.cpantesters.org/cpan/report/33e1ea25-6bfa-1014-afed-754eb9e98ec4

mivk commented 4 years ago

The hardcoded CRLF problem is not limited to Windows. I just encountered it with vcards downloaded from an official phone directory.

I would suggest to follow Jon Postel's "Robustness Principle" from RFC 761:

be conservative in what you do, be liberal in what you accept

and accept both CRLF and LF line endings.

My quick fix was this, in case someone finds it useful:

$ diff -u /usr/share/perl5/Text/vFile/asData.pm.orig /usr/share/perl5/Text/vFile/asData.pm
--- /usr/share/perl5/Text/vFile/asData.pm.orig  2013-01-28 13:32:13.000000000 +0100
+++ /usr/share/perl5/Text/vFile/asData.pm   2020-09-30 16:15:51.850221688 +0200
@@ -27,9 +27,9 @@
 sub _unwrap_lines {
     my $self = shift;
     my @lines;
-    for (@_) {
+    for (map {split /\R/} @_) {
         my $line = $_; # $_ may be readonly
-        $line =~ s{[\r\n]+$}{}; # lines SHOULD end CRLF
+        $line =~ s{\R+$}{}; # lines SHOULD end CRLF (but let's not be so strict)
         if ($line =~ /^[ \t](.*)/) { # Continuation line (RFC Sect. 4.1)
             die "Continuation line, but no preceding line" unless @lines;
             $lines[-1] .= $1;