mysociety / alaveteli

Provide a Freedom of Information request system for your jurisdiction
https://alaveteli.org

Handle broken email subject encoding #5396

Open gbp opened 5 years ago

gbp commented 5 years ago

Messages have been received where the UTF-8 encoding of the subject is malformed: other headers are being included in the middle of the encoded subject.

A made-up example of an unprocessable message received:

Subject: =?UTF-8?Q?h=C3=A9ll=C3=B8
From: foo@bar

w=C3=B8rld?=
  =?UTF-8?Q?_fo=C3=B8?=
To: baz@quux

Hello, this is the text of the email.

This should actually be:

Subject: =?UTF-8?Q?h=C3=A9ll=C3=B8?=
  =?UTF-8?Q?_w=C3=B8rld?=
  =?UTF-8?Q?_fo=C3=B8?=
From: foo@bar
To: baz@quux

Hello, this is the text of the email.

or

Subject: =?UTF-8?Q?h=C3=A9ll=C3=B8_w=C3=B8rld?=
  =?UTF-8?Q?_fo=C3=B8?=
From: foo@bar
To: baz@quux

Hello, this is the text of the email.
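
For reference, both corrected forms carry the same subject text. A minimal sketch of RFC 2047 decoding in plain Ruby shows this; `decode_subject` is a hypothetical helper written for illustration, not Alaveteli's or the mail gem's implementation:

```ruby
# Hypothetical helper for illustration only; not part of Alaveteli.
# Matches one RFC 2047 encoded-word: =?charset?encoding?text?=
ENCODED_WORD = /=\?([^?\s]+)\?([BbQq])\?([^?\s]*)\?=/

def decode_subject(value)
  # Whitespace (including folding) between adjacent encoded-words is dropped.
  unfolded = value.gsub(/\?=\s+=\?/, '?==?')

  unfolded.gsub(ENCODED_WORD) do
    charset, encoding, text = $1, $2, $3
    bytes = if encoding.upcase == 'B'
              text.unpack1('m')              # Base64
            else
              text.tr('_', ' ').unpack1('M') # Q: "_" means space, "=XX" is a hex byte
            end
    bytes.force_encoding(charset).encode('UTF-8')
  end
end

three_words = "=?UTF-8?Q?h=C3=A9ll=C3=B8?=\n  =?UTF-8?Q?_w=C3=B8rld?=\n  =?UTF-8?Q?_fo=C3=B8?="
two_words   = "=?UTF-8?Q?h=C3=A9ll=C3=B8_w=C3=B8rld?=\n  =?UTF-8?Q?_fo=C3=B8?="

puts decode_subject(three_words)
puts decode_subject(two_words)
```

Because the charset is read from each encoded-word, the same sketch would also apply to ISO 8859-1 subjects, assuming Ruby knows the charset name.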

We could detect and correct these messages, but really the authority should be notified so that valid responses can be sent.
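
As a rough sketch of what detection might look like, the following plain Ruby checks whether the first line of the Subject header opens an encoded-word that is never closed on that line. `broken_subject_encoding?` is a hypothetical name, not an existing Alaveteli method, and the check only covers the failure pattern in the made-up example above:

```ruby
# Hypothetical helper for illustration only; not part of Alaveteli.
ENCODED_WORD_COMPLETE = /=\?[^?\s]+\?[BbQq]\?[^?\s]*\?=/
ENCODED_WORD_OPENER   = /=\?[^?\s]+\?[BbQq]\?/

def broken_subject_encoding?(raw_email)
  # Everything up to the first blank line is treated as the header block.
  header_block = raw_email.split(/\r?\n\r?\n/, 2).first.to_s
  subject_line = header_block.each_line.find { |line| line.match?(/\ASubject:/i) }
  return false if subject_line.nil?

  # Strip complete encoded-words; any opener left over was never terminated.
  remainder = subject_line.gsub(ENCODED_WORD_COMPLETE, '')
  remainder.match?(ENCODED_WORD_OPENER)
end

broken = "Subject: =?UTF-8?Q?h=C3=A9ll=C3=B8\nFrom: foo@bar\n\n" \
         "w=C3=B8rld?=\n  =?UTF-8?Q?_fo=C3=B8?=\nTo: baz@quux\n\nHello\n"
valid  = "Subject: =?UTF-8?Q?h=C3=A9ll=C3=B8?=\n  =?UTF-8?Q?_w=C3=B8rld?=\n" \
         "From: foo@bar\nTo: baz@quux\n\nHello\n"

puts broken_subject_encoding?(broken)
puts broken_subject_encoding?(valid)
```

This only inspects the first physical line of the Subject header, so a break inside a folded continuation line would need a fuller header unfolding step.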

garethrees commented 5 years ago

Linking to original discussion https://groups.google.com/a/mysociety.org/forum/#!topic/alaveteli/xaO9dI51kuc

gbp commented 4 years ago

This is also affecting requests on WDTK [1], [2], although with ISO 8859-1 rather than UTF-8 encoding.

gbp commented 4 years ago

To fix the WDTK incoming messages above, I corrected the subject of the raw emails on disk, then ran the following in the console:

m = IncomingMessage.find ID
m.parse_raw_email!(true)
m.clear_in_database_caches!

MattK1234 commented 3 years ago

There are 2 messages like this on request 708365, and the user has contacted us at WDTK support.

We have opened the email in a normal email application and it is still illegible.

We've suggested the user ask the Council to send the response again, although I am not sure if this will work.