markov2 / perl5-Mail-Message

Processing MIME messages
http://perl.overmeer.net/CPAN
1 stars 1 forks source link

Mail::Message::Field::Addresses randomises address order #15

Closed geoffreybennett closed 1 year ago

geoffreybennett commented 1 year ago

I'm parsing an email and want to get a list of names and addresses (with correct parsing of encodings), so I'm doing something like:

@addrs = $msg->study('cc')->addresses;

But I found that the addresses method returns the addresses in a random order each time. Mail::Message::Field doesn't show this behaviour, but Mail::Message::Field::Full does, because Mail::Message::Field::AddrGroup uses a hash which randomises the address order:

$ perl -d -e 0
...
  DB<1042> use Mail::Message::Field::Addresses

  DB<1043> $to = 'To: a@a, b@a, c@a, d@a, e@a, f@a';

  DB<1044> $class = "Mail::Message::Field";

  DB<1045> for (1 .. 5) {print join(" ", map { $_->address } $class->new($to)->addresses), "\n"; }
a@a b@a c@a d@a e@a f@a
a@a b@a c@a d@a e@a f@a
a@a b@a c@a d@a e@a f@a
a@a b@a c@a d@a e@a f@a
a@a b@a c@a d@a e@a f@a

  DB<1046> $class = "Mail::Message::Field::Full";

  DB<1047> for (1 .. 5) {print join(" ", map { $_->address } $class->new($to)->addresses), "\n"; }
f@a a@a b@a d@a e@a c@a
d@a b@a a@a c@a e@a f@a
b@a a@a d@a e@a c@a f@a
e@a c@a a@a b@a d@a f@a
f@a e@a c@a b@a a@a d@a

I think this is a bug? The order of addresses shouldn't be randomised when full parsing is enabled.

geoffreybennett commented 1 year ago

I found that Mail::Message::Field::AddrGroup uses User::Identity::Collection which is where the hash is. Thoughts on changing UIC to use an array instead of (or addition to?) a hash for the collection?

geoffreybennett commented 1 year ago

Along the lines of this (not complete, obviously, but enough to make the test case above work right)...

diff --git a/lib/User/Identity/Collection.pm b/lib/User/Identity/Collection.pm
index ca800e4..6a7bffa 100644
--- a/lib/User/Identity/Collection.pm
+++ b/lib/User/Identity/Collection.pm
@@ -115,7 +115,7 @@ sub init($)
     defined($self->SUPER::init($args)) or return;

     $self->{UIC_itype} = delete $args->{item_type} or die;
-    $self->{UIC_roles} = { };
+    $self->{UIC_roles} = [ ];
     my $roles = $args->{roles};

     my @roles
@@ -136,7 +136,7 @@ Returns all defined roles within this collection.  Be warned: the rules
 are returned in random (hash) order.
 =cut

-sub roles() { values %{shift->{UIC_roles}} }
+sub roles() { @{shift->{UIC_roles}} }

 =method itemType 
 Returns the type of the items collected.
@@ -198,7 +198,7 @@ sub addRole(@)
     }

     $role->parent($self);
-    $self->{UIC_roles}{$role->name} = $role;
+    push @{ $self->{UIC_roles} }, $role;
     $role;
 }
markov2 commented 1 year ago

I have been thinking about your report, but did not respond yet. Of course, the HASH is used to provide rapid access. However, the HASH probably stays small so an ARRAY search is not too much slower. There are a few more methods which require changes.

Order is a header field is not significant... but random order is inconvenient for regression tests. Are there more reasons?

Another solution is tieing $self->{UIC_roles} to an Tie::IxHash

geoffreybennett commented 1 year ago

I'm displaying emails to users & it's weird looking at consecutive emails which were sent to the same set of multiple people when the list is in a different order in each email. Also, people often put the most-important recipients first in to/cc lists so simply sorting the list isn't a great idea.

I tried your Tie::IxHash idea. That's really easy! This seems to fix the problem for me:

diff --git a/lib/User/Identity/Collection.pm b/lib/User/Identity/Collection.pm
index ca800e4..a873365 100644
--- a/lib/User/Identity/Collection.pm
+++ b/lib/User/Identity/Collection.pm
@@ -11,6 +11,7 @@ use warnings;
 use User::Identity;
 use Carp;
 use List::Util   qw/first/;
+use Tie::IxHash;

 =chapter NAME

@@ -115,7 +116,7 @@ sub init($)
     defined($self->SUPER::init($args)) or return;

     $self->{UIC_itype} = delete $args->{item_type} or die;
-    $self->{UIC_roles} = { };
+    tie %{ $self->{UIC_roles} }, 'Tie::IxHash';
     my $roles = $args->{roles};

     my @roles
markov2 commented 1 year ago

Based on https://metacpan.org/dist/Hash-Ordered/view/lib/Hash/Ordered/Benchmarks.pod (and my acquaintance with that author) I picked Hash::Ordered.

Look for User-Identity 1.02, which will arrive on your CPAN mirror soon.

jixam commented 1 year ago

User-Identity 1.02 has broken serialization for us, is this an unsupported usage?

$ cat storable.pl
use Mail::Message::Field::Address;
use Storable;

my $a = Mail::Message::Field::Addresses->new(name => "");
$a->parse("test\@example.org");
Storable::freeze($a->addresses);
$ cpanm -q User::Identity@1.01
Successfully installed User-Identity-1.01 (downgraded from 1.02)
1 distribution installed
$ perl storable.pl
$
$ cpanm -q User::Identity@1.02
Successfully installed User-Identity-1.02 (upgraded from 1.01)
1 distribution installed
$ perl storable.pl
Can't store CODE items at /usr/local/lib/perl5/5.36.0/x86_64-linux-gnu/Storable.pm line 370, at storable.pl line 6.
$
markov2 commented 1 year ago

Well, your use of Storable does break the OO interface which my module is offering. It does expect that the internals of the objects do not change over time. Which I certainly do not guarantee without knowing that anyone depends on it.

The issue which was to solve was the unpredictable of the addresses after parsing. They are currently stored in a HASH, which I tied to a Tie::IxHASH to solve the requested order. So: this tie is where you bump into. The other solution which I had in mind, was changing the hash into an ARRAY. That would also certainly have broken your implementation.

The docs of Storable tell me that it hash support for ties. Are you running a very old version of Storable, or did I break something else? Because the warning "CODE" is not "TIE".

jixam commented 1 year ago

The Storable version is current (3.26). We only store things for a few minutes so compatibility between versions is not an issue for us. If this is unsupported, fair enough and I will look into fixing it at our end.

Thank you!

markov2 commented 1 year ago

When the man-page says "tie" is supported, but you get this error, it is worth to study the exact cause and solution. I'll look into it.

markov2 commented 1 year ago

I can solve the issue for you by replacing Hash::Ordered in Tie::IxHash in User/Identity/Collection.pm. Apparently, Hash::Ordered uses coderefs. It is the faster solution.

Still: although it might not hurt you at the moment, it's hard to defend going around the offered methods. I would take the performance penalty by either reparse, or cleanly collecting the addresses before storage.

jixam commented 1 year ago

Right, I will find a way around it; no problem. Thanks for looking into it.