lkosson / full-address-column

Thunderbird add-on to show full sender and recipient address column in message list
MIT License
39 stars 11 forks

Mail address string detection can be improved #4

Open juliomaranhao opened 3 years ago

juliomaranhao commented 3 years ago

Sender (@) and Recipient (@) mail address string detection can be improved. I see three kinds of errors:

1. Raw header is To: "AA Aaaa, Company" <aaaaaaaa@company.com>, but Recipient (@) shows "AA Aaaa, aaaaaaaa@company.com. Maybe something related to quotes and commas?

2. Raw header is From: MAILER-DAEMON@mail4.cajapsi (Mail Delivery System), and Sender (@) shows exactly the same string. I have no idea why the detected address string includes (Mail Delivery System).

3. Raw header is From: =?utf-8?q?=22Linha_de_comunica=C3=A7=C3=A3o_de_webmail=22_=3Cprevencaoded?=@mail.serrinha.ba.gov.br, =?utf-8?q?emandas=40jfrn=2Ejus=2Ebr=3E?=@mail.serrinha.ba.gov.br, and Sender (@) shows exactly the same string. In this case a decoding step is needed. After decoding, the header reads "Linha de comunicação de webmail" <prevencaoded@mail.serrinha.ba.gov.br,emandas@jfrn.jus.br>@mail.serrinha.ba.gov.br. It's malformed anyway. What is the policy in this case? If the header is malformed, show an empty string?

And thank you for making/adapting this add-on. I receive a lot of spam (tens of messages) daily, and this add-on (as is) saves me 5 minutes every day when I review all e-mails marked as spam.

lkosson commented 3 years ago

In the first case the problem is caused by a comma. Thunderbird provides the recipient list as a comma-separated string. The add-on naively splits it using String.split, ignoring the fact that some commas might be part of a recipient name. I guess using a regexp like (".*?"|[^",\s]+)(?=\s*,|\s*$) to split recipients would help, but there would still be a problem if the name wasn't enclosed in quotes at all.
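As an alternative to a regexp, the comma problem could be approached with a small character scanner. The following is only a sketch, not the add-on's actual code: splitAddressList is a hypothetical helper name, and it assumes that a comma should be ignored whenever it sits inside double quotes or inside <...>. It does not help with unquoted names that contain commas, which remain ambiguous.

```javascript
// Hypothetical helper (not part of the add-on): split a comma-separated
// address list, ignoring commas inside double quotes or inside <...>.
function splitAddressList(header) {
  const parts = [];
  let current = "";
  let inQuotes = false;
  let inAngle = false;

  for (let i = 0; i < header.length; i++) {
    const ch = header[i];

    // Keep backslash-escaped characters inside a quoted name intact,
    // so an escaped quote does not end the quoted section.
    if (ch === "\\" && inQuotes && i + 1 < header.length) {
      current += ch + header[i + 1];
      i++;
      continue;
    }

    if (ch === '"' && !inAngle) {
      inQuotes = !inQuotes;
    } else if (ch === "<" && !inQuotes) {
      inAngle = true;
    } else if (ch === ">" && !inQuotes) {
      inAngle = false;
    }

    if (ch === "," && !inQuotes && !inAngle) {
      // A top-level comma ends the current entry.
      if (current.trim()) parts.push(current.trim());
      current = "";
    } else {
      current += ch;
    }
  }

  if (current.trim()) parts.push(current.trim());
  return parts;
}
```

With the header from the first report followed by a second, made-up recipient, splitAddressList('"AA Aaaa, Company" <aaaaaaaa@company.com>, bob@example.com') yields two entries instead of three.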

The second one is kind of expected. If the address does not contain < and >, the whole header is returned as-is.
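To make the second case concrete, here is a rough sketch of that rule together with one possible extension. The function name extractAddress and both regexps are assumptions for illustration, not the add-on's code. The first branch mirrors the behaviour described above; the second branch is a hypothetical addition that would also accept the older username@hostname (Comment) form, where the display name is given as a parenthesized comment.

```javascript
// Hypothetical helper (not the add-on's code): extract the bare address
// from a single recipient or sender entry.
function extractAddress(entry) {
  const text = entry.trim();

  // Case 1: Full Name <username@hostname> -> username@hostname
  const angle = text.match(/<([^<>\s]+@[^<>\s]+)>\s*$/);
  if (angle) return angle[1];

  // Case 2 (assumed extension): username@hostname (Comment)
  // -> username@hostname, dropping the trailing parenthesized comment.
  const legacy = text.match(/^([^\s()<>]+@[^\s()<>]+)\s*\([^()]*\)$/);
  if (legacy) return legacy[1];

  // Anything else is returned unchanged.
  return text;
}
```

With this sketch, 'MAILER-DAEMON@mail4.cajapsi (Mail Delivery System)' would be reduced to 'MAILER-DAEMON@mail4.cajapsi', while entries that match neither pattern would still be shown in full.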

I'm not quite sure how to deal with the third one. The Thunderbird API does not provide a decoding function consistent with its built-in columns, and this address doesn't follow the MIME format anyway. The unwritten policy of the add-on is to extract the raw e-mail address if the entry matches the Full Name <username@hostname> format, and to return the whole unformatted string otherwise. Without proper decoding, there is no hope of properly matching any address in this header, and even with decoding it would be just guessing.
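For reference, the decoding step asked about in the third case could look roughly like the sketch below. It is a hand-rolled decoder for RFC 2047 encoded-words (=?charset?Q?...?= and =?charset?B?...?=), written under the assumption that the standard TextDecoder and atob globals are available. decodeEncodedWords is a hypothetical name, and the result is not guaranteed to match what Thunderbird's built-in columns display.

```javascript
// Hypothetical helper (not a Thunderbird API): decode RFC 2047
// encoded-words found anywhere in a header value.
function decodeEncodedWords(text) {
  const pattern = /=\?([^?]+)\?([bBqQ])\?([^?]*)\?=/g;
  return text.replace(pattern, (whole, charset, encoding, payload) => {
    try {
      let bytes;
      if (encoding.toUpperCase() === "Q") {
        // Q encoding: "_" stands for a space, "=XX" for a hex-encoded byte.
        const raw = payload.replace(/_/g, " ");
        const out = [];
        for (let i = 0; i < raw.length; i++) {
          const hex = raw.slice(i + 1, i + 3);
          if (raw[i] === "=" && /^[0-9A-Fa-f]{2}$/.test(hex)) {
            out.push(parseInt(hex, 16));
            i += 2;
          } else {
            out.push(raw.charCodeAt(i) & 0xff);
          }
        }
        bytes = Uint8Array.from(out);
      } else {
        // B encoding: plain base64.
        bytes = Uint8Array.from(atob(payload), (c) => c.charCodeAt(0));
      }
      return new TextDecoder(charset).decode(bytes);
    } catch (e) {
      // Unknown charset or broken payload: leave the word untouched.
      return whole;
    }
  });
}
```

Applied to the header from the third report, this reproduces the decoded string quoted in the issue. That string is still malformed, so decoding alone would not let the Full Name <username@hostname> rule pick out a single address.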