worthmine / Text-vCard-Precisely

Read, Write or Edit vCard 3.0 or 4.0 with perl
https://metacpan.org/pod/Text::vCard::Precisely

"\n" processing missing / not RFC compliant for ADR and NOTE #23

Closed by michaelof 4 years ago

michaelof commented 4 years ago

Dear Yuki,

I'm getting errors when importing vcards with newlines. RFC 6350 says "Backslashes, commas, and newlines must be encoded", and it looks like the Text-vCard-Precisely vcards are breaking this, e.g.:

`#!/usr/bin/perl
use strict;
use warnings;

use Text::vCard::Precisely::V4;

my $vc = Text::vCard::Precisely::V4->new();

$vc->fn( "Forrest Gump" );

my $adrFormatted = "Waters Edge\nBaytown 30314";
$adrFormatted =~ s/\n/\n/g;
$vc->adr( [ {
    types  => ['HOME'],
    street => $adrFormatted,
    label  => $adrFormatted,
} ] );

$vc->note( "I'm\na\nmultiline\nnote");

print $vc->as_file("forrestgump.vcf"); `

result is:

BEGIN:VCARD
VERSION:4.0
FN:Forrest Gump
ADR;TYPE=HOME;LABEL="Waters Edge\nBaytown 30314":;;Waters Edge\nBaytown 30314;;;;
NOTE:I'm a multiline note
END:VCARD

As you can see, LABEL has the newline correctly escaped as \n, but STREET and NOTE do not. I also tested with a double backslash, which helped only for LABEL.
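For reference, the escaping that RFC 6350 asks for in text values can be sketched in a few lines of Perl. This is only an illustration under stated assumptions: `escape_vcard_text` is a hypothetical helper, not a function of Text::vCard::Precisely, and it treats its input as a plain TEXT value in which backslashes, commas, semicolons, and newlines are escaped.

```perl
use strict;
use warnings;

# Hypothetical helper (NOT part of Text::vCard::Precisely):
# escape a TEXT value along the lines of RFC 6350, section 3.4.
sub escape_vcard_text {
    my ($text) = @_;
    $text =~ s/\\/\\\\/g;           # backslash first, so later escapes are not doubled
    $text =~ s/([,;])/\\$1/g;       # comma and semicolon
    $text =~ s/\r\n|\r|\n/\\n/g;    # any newline becomes a literal backslash + "n"
    return $text;
}

print 'NOTE:', escape_vcard_text("I'm\na\nmultiline\nnote"), "\n";
# prints: NOTE:I'm\na\nmultiline\nnote
```

The backslash is handled first on purpose: if it came last, the backslashes introduced for commas and newlines would themselves be escaped again.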

worthmine commented 4 years ago

Thank you for reporting. Users who are familiar with the RFC, like you, are always welcome!

Escaping is available, but you are right that newlines are not supported.

I will fix it ASAP.

worthmine commented 4 years ago

Version 0.28 fixes this issue.

michaelof commented 4 years ago

One last question on this: any idea when 0.28 will arrive on CPAN?

My machine still reports "Text::vCard::Precisely is up to date (0.27)."

worthmine commented 4 years ago

Just run: $ cpanm Text::vCard::Precisely~0.28

michaelof commented 4 years ago

Checked with 0.28. \n now works for the address content and for NOTE, cool.

But NOT for the address LABEL, as this example shows:

`#!/usr/bin/perl
use strict;
use warnings;

use Text::vCard::Precisely::V4;

my $vc = Text::vCard::Precisely::V4->new();

$vc->fn( "Forrest Gump" );

my $adrFormatted = "Waters Edge\nBaytown 30314";
$vc->adr( [ {
    types  => ['HOME'],
    street => $adrFormatted,
    label  => $adrFormatted,
} ] );

print $vc->as_file("forrestgump.vcf"); `

worthmine commented 4 years ago

Thank you for reporting. My tests were wrong.

I've fixed that, and you can try it with this command:

$ cpanm git@github.com:worthmine/Text-vCard-Precisely.git

The version is still 0.28, but the code is the latest.

You may close the issue once you are satisfied with these fixes.

michaelof commented 4 years ago

Just tried to test, but got an error:

cpanm git@github.com:worthmine/Text-vCard-Precisely.git
Cloning git@github.com:worthmine/Text-vCard-Precisely.git ... FAIL
! Failed cloning git repository git@github.com:worthmine/Text-vCard-Precisely.git
! Couldn't find module or a distribution git@github.com:worthmine/Text-vCard-Precisely.git

michaelof commented 4 years ago

Cloning and installing works with cpanm git://github.com/worthmine/Text-vCard-Precisely.git

worthmine commented 4 years ago

I'm sorry that I gave you the wrong path. Did you solve it? Because it is a public repository, you can also get it over HTTPS: $ cpanm https://github.com/worthmine/Text-vCard-Precisely.git

michaelof commented 4 years ago

I'm confused :-) Your patch for #23 itself works, good news, thank you very much!

But for some reason the UTF-8 issue #22 has come back.

My script, which produced a UTF-8 encoded VCF with 0.27 and 0.28 (CPAN version), now produces a WIN-1252 VCF file, without any change to the encoding on my side. This means that Nextcloud, for example, imports the vCards, but German umlauts (äöüÄÖÜß) are corrupted.

While trying to narrow it down, I found that the "as_file" and "as_string" results differ:

`#!/usr/bin/perl
use strict;
use warnings;

use Text::vCard::Precisely::V4;
my $vc = Text::vCard::Precisely::V4->new();

my $fn = "Först Last";
printf '%vX', $fn;
print "\n";
$vc->fn( $fn );

$vc->as_file("22_af.vcf");
my $str = $vc->as_string();

open my $out, '>', '22_as.vcf' or die;
print $out $str;
close $out;`

--> the as_file file "22_af.vcf" is correctly encoded as UTF-8
--> the file "22_as.vcf" written from as_string contains WIN-1252

FYI, I have absolutely no idea why WIN-1252, as everything runs on a Linux system.

Remark: I started with Text::vCard::Precisely::V4, knowing that I'm dealing with lots of vCards. That's why I used as_string and printed each vCard into a combined address book file. My first thought was of course to use your Text::vCard::Precisely::Multiple, but I have no clue how to "pass" a vCard built with Text::vCard::Precisely::V4 to Text::vCard::Precisely::Multiple; whatever I tried with load_arrayref failed.

I got the hint from the IRC channel #perl that "load_arrayref is expecting an arrayref of hashrefs, not an arrayref of objects", and that maybe "$vcm->add_option($vcard)" might work, although "add_option" is not documented as an interface. I tried it, but that failed as well:

Attribute (options) does not pass the type constraint because: Validation failed for 'vCards' with value ARRAY(0x557235e917d0) at native delegation method Text::vCard::Precisely::Multiple::add_option (push) of attribute options (defined at /home/michael/perl5/lib/perl5/Text/vCard/Precisely/Multiple.pm line 25) line 9 Text::vCard::Precisely::Multiple::add_option('Text::vCard::Precisely::Multiple=HASH(0x557230d395a0)', 'Text::vCard::Precisely::V4=HASH(0x55722f442428)') called at ....

Any hints on how to use Text::vCard::Precisely::Multiple and Text::vCard::Precisely together, if possible, would be cool.

worthmine commented 4 years ago

I haven't finished writing tests for that yet. Give me a moment, maybe a week.

worthmine commented 4 years ago

I've just remembered 🤨: this module treats strings as 'UTF-8' with no decoding unless $self->encoding_in/out() is set to something else. That is NOT recommended for a few reasons, so I didn't document it in the POD.

the new Issue you've reported is considered that your code is written in WIN-1252, not UTF-8.

Please confirm the encoding of your code.

As for the second half of your report, I will try to fix it.

worthmine commented 4 years ago

The fastest way to pass a vCard object to Multiple in the current version is this:

use strict;
use warnings;
use Encode qw(encode_utf8);

use Text::vCard::Precisely;    # you can't use Text::vCard::Precisely::V4 yet
my $vc = Text::vCard::Precisely->new( version => '4.0' );
my $fn = "Först Last";
printf '%vX', $fn;
print "\n";
$vc->fn($fn);

$vc->as_file("22_af.vcf");
my $str = $vc->as_string();

open my $out, '>', '22_as.vcf' or die $!;
print $out $str;
close $out;

use Text::vCard::Precisely::Multiple;
my $vcm = Text::vCard::Precisely::Multiple->new( version => '4.0' );
$vcm->add_option($vc);                           # $vc must be Text::vCard::Precisely, not ::V3 or ::V4
$vcm->add_option($_) for $vcm->all_options();    # it accepts the content of $vcm
$vcm->add_option( $vcm->all_options() );         # or all of itself

print encode_utf8 $vcm->as_string();

Of course, I will improve this so that it can be used more intuitively.

michaelof commented 4 years ago

> the new Issue you've reported is considered that your code is written in WIN-1252, not UTF-8.
>
> Please confirm the encoding of your code.

I have use utf8; set in my script, and my editor (jEdit) confirms that the Perl file itself is UTF-8, which is IMHO to be expected as I'm working on a Linux system (openSUSE Leap 15.2).

michaelof commented 4 years ago

GOOD news:

use Text::vCard::Precisely::Multiple;
my $vcm = Text::vCard::Precisely::Multiple->new( version => '4.0' );
$vcm->add_option($vc);                           # $vc must be Text::vCard::Precisely, not ::V3 or ::V4
$vcm->as_file("myvcards.vcf");

--> SOLVES my issue with WIN-1252, as myvcards.vcf written by Text::vCard::Precisely::Multiple is now UTF-8.

THANKS for the clarification about "...::V4"

(And WOW, by digging into your comment I just learned about the ```perl GitHub editor feature :-))

worthmine commented 4 years ago

Ha ha, you should become a master of Markdown ASAP.

At your code in the reply, there is no use utf8; so I doubted the encoding was not 'UTF-8', but it was.

btw, I don't know why but is it fixed without my updating?

michaelof commented 4 years ago

> At your code in the reply, there is no use utf8; so I doubted the encoding was not 'UTF-8', but it was.

Yes, I've tried with several versions of the sample script, with use utf8, without, with print encode_utf8, without.

> btw, I don't know why but is it fixed without my updating?

Well, Text::vCard::Precisely's "as_file" has always produced a UTF-8 vCard. Now that, after your great explanation, I'm able to use Text::vCard::Precisely::Multiple and its "as_file", I'm getting a UTF-8 vCard file from Text::vCard::Precisely::Multiple as well.

But when I use "as_string" and print the result to a file myself, that file is WIN-1252, and I have no idea why. Even with your modified example https://github.com/worthmine/Text-vCard-Precisely/issues/23#issuecomment-717651662 the file "22_as.vcf" is shown as "DOS/WINDOWS (\r\n)" and "windows-1252" by jEdit, whereas the "as_file" vCards are shown as "DOS/WINDOWS (\r\n)" and "UTF-8".

Both files show "correct" German umlauts äöüÄÖÜß in the editor, but when I import the WIN-1252 vCard into Nextcloud, which expects a vCard to be UTF-8, I get "Mojibake" (a new word for me, which I learned from you).
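One possible explanation, offered as an assumption and not as confirmed behaviour of the module: if as_string() returns a decoded character string (the `print encode_utf8 $vcm->as_string()` line in the sample above hints at that), then printing it to a handle without an encoding layer writes "ö" as the single byte 0xF6, which editors tend to label windows-1252. The sketch below uses a hard-coded string as a stand-in for the as_string() result and writes it through an explicit UTF-8 layer; `write_utf8` is a hypothetical helper name.

```perl
use strict;
use warnings;
use utf8;                          # this source file itself is UTF-8
use File::Temp qw(tempfile);

# Stand-in for the return value of as_string(). ASSUMPTION: a decoded
# character string; this is not verified against Text::vCard::Precisely.
my $str = "BEGIN:VCARD\r\nVERSION:4.0\r\nFN:Först Last\r\nEND:VCARD\r\n";

# Hypothetical helper: write characters through an explicit UTF-8 layer.
sub write_utf8 {
    my ( $path, $text ) = @_;
    open my $out, '>:encoding(UTF-8)', $path or die "open $path: $!";
    print {$out} $text;
    close $out or die "close $path: $!";
    return;
}

my ( undef, $path ) = tempfile( SUFFIX => '.vcf', UNLINK => 1 );
write_utf8( $path, $str );

# Read the raw bytes back to see what actually landed on disk.
open my $in, '<:raw', $path or die "open $path: $!";
my $bytes = do { local $/; <$in> };
close $in;

print $bytes =~ /F\xC3\xB6rst/ ? "UTF-8 bytes on disk\n" : "not UTF-8\n";
```

If as_string() already returned UTF-8 encoded bytes instead, this layer would double-encode them, so it is worth checking the bytes on disk (as done at the end of the sketch) before relying on it.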

worthmine commented 4 years ago

'Mojibake' means what you described as 'corrupted'.

Okay, I'll take a look at as_string() before the next release.

worthmine commented 4 years ago

I still have to add tests for issues #23, #24, and #25 before the next release, but I have updated the code so that you can review it.

michaelof commented 4 years ago

I specifically tested \n handling for:

V3: ADR and NOTE
V4: ADR (with LABEL) and NOTE

All working fine, IMHO issue could be closed.

THANK YOU!!!