Closed confirmordeny closed 8 years ago
It's a great idea, but unfortunately it's impossible for us to say that the message has been delivered to the authority itself. The best we could say is that 'this message has been delivered to the authority, or possibly an organisation that handles mail for them'. Lots of councils use people like Messagelabs for content scanning, and we don't know what happens to the email once it's passed to them.
John, should be quite easy to do, as the Exim logs are already loaded into the database, it is a case of somebody who understands mail parsing them and making a nice display.
Ian, we consider in a social/political (rather than technical!) context that delivery to the MX server for their domain is delivery.
Any failure to deliver after that point is the authority's internal responsibility. That they might outsource (part of) that work to somebody else is irrelevant.
In the UK, in practice, this works quite well.
We often give the delivery log file lines from Exim to their technical people, and they find out why it wasn't processed, and what went wrong. I think quite a few authorities whitelisted our domain as a result - certainly we often ask them to!
Francis
On Mon, Sep 24, 2012 at 12:28:34AM -0700, Ian Chard wrote:
It's a great idea, but unfortunately it's impossible for us to say that the message has been delivered to the authority itself. The best we could say is that 'this message has been delivered to the authority, or possibly an organisation that handles mail for them'. Lots of councils use people like Messagelabs for content scanning, and we don't know what happens to the email once it's passed to them.
Reply to this email directly or view it on GitHub: https://github.com/mysociety/alaveteli/issues/624#issuecomment-8809321
In the common case, the message is either delivered straight away by WDTK's outgoing mail server, or delivery fails and a bounce appears on the request thread. So in a sense the lack of a bounce is the green light.
Trickier cases come when we're occasionally relaying through another host for some reason - usually because of delivery problems from WDTK direct to that authority in the past.
In that case WDTK does deliver the email somewhere, but it hasn't necessarily got as far as the advertised MX for the domain, and we can't ever be absolutely certain it got to the authority's own mail server.
I guess a nice display would consist of "successfully delivered to [mail server] at [time]" associated with a particular message, and then someone who vaguely understood mail would be able to confirm whether it looks like it got to the authority or not, without needing to ask us. The key bit of "technical" knowledge here is being able to look at the recipient mail server and rule out the very occasional case where the message might have got lost in a relay we chose to use.
The other tricky case is where an authority email address is out of date to the point of not being received by the authority at all [if it's the "wrong" address but still within the authority that's fine, at least under UK law]. This case is most common for small authorities like schools and parish councils.
A simple green tick is quite attractive for its simplicity. An embellishment might be to hide it once there's a response from the authority. Another embellishment might be to have a click through to more detailed logs - perhaps the raw exim logs with the request address redacted.
I was discussing this recently and some ideas included;
ICO decision notice FS50559082 concluded, after considering message delivery evidence from WhatDoTheyKnow
The Commissioner considers that there is sufficient evidence to show that the Cabinet Office had received the request on the date it was sent by the complainant. As it did not issue its refusal notice within 20 working days, it breached section 17(5) of the Act.
https://www.whatdotheyknow.com/request/appointment_of_qualified_persons#comment-64694
How big a deal do we want to make of this? A very subtle solution might be:
(where the question mark is a link to an FAQ item about how we track delivery, and why we've said "Sent" instead of "Delivered" in this case)
And then when it’s accepted by the mail server:
(where the tick reveals a popup or something showing what we mean by "delivered" perhaps even with some geeky mail server log for people who want to use it as some form of evidence)
Just my first thought. Any feedback?
I'm in favour of having the mail server log available, but hidden unless you actively seek it out - I think it would be good to have an indication in the 'delivered' mode that there is more to see - like the question mark in 'sent' mode. This could link to an explanation of the complexities of mail delivery, and an appropriately redacted version of the delivery log. Maybe we should show the requester an uncensored version that they could forward to the authority.
@crowbot: What are the situations in which somebody would seek out this proof of an email being delivered?
The authority claiming that they never received the request.
Will that ever be before the 20 working days has elapsed? And will it ever be after the authority has sent their first reply?
(I'm wondering whether it can be shown more intelligently – eg: inside the "status" line we already put above the opening request)
I think the answer to the first question could be yes - sometimes the team gets emails from authorities saying - "I've just browsed the site, and there are a load of requests we've never received" - and these might include relatively new requests. Less likely but still possible that the authority could contest delivery in or after their first response e.g. they could reply using the 'respond to this request' option and say "we never received it by email".
Good to know.
Shall I do a quick prototype of this in a branch, so we can see how it feels?
And if so, do you have any specific wording I should use?
Yes, that sounds great - I don't have any specific wording in mind at the moment.
Maybe we should show the requester an uncensored version that they could forward to the authority.
Even requestors should have the request email address redacted. A naughty requestor could fake a response if they had the address, damaging the credibility of the service.
What are the situations in which somebody would seek out this proof of an email being delivered?
I think the ~"delivery confirmed" message should be aimed at all those browsing the site; it's something which might encourage people to choose to use the service rather than make a request in private.
It'd also be good to make the raw logs available (with the checksum part of the email address redacted) so that requesters can send them to authorities or the ICO directly.
@crowbot @garethrees I've pushed a prototype of the markup to https://github.com/mysociety/alaveteli/compare/624-delivery-status and the styling to https://github.com/mysociety/alavetelitheme/compare/624-delivery-status. It looks like:
Would love feedback on whether you think it's obvious enough, and whether the wording and the display of the logfile is clear enough.
Is this a big enough feature that it's worth doing real user testing on?
Looks really great!
I think "Mailserver log" could say something generally understandable. e.g. "If the public body disputes delivery, send them this 'mail log' as proof of delivery:". (But better worded!)
Love the UI.
smtp_message_id
in the params_yaml
of an InfoRequestEvent
when a message is sent https://github.com/mysociety/alaveteli/blob/develop/app/models/outgoing_message.rb#L182-L193To prove message delivery, do we just need the C
field of the delivery log line?
C="250 ok 1446233056 qp 26062 server-4.tower-221.messagelabs.com!1446233056!7679409!1"
2015-10-30 19:24:16 [17817] 111111-222222-SM => person@example.com
F=<request-300000-123456@example.com>
P=<request-300000-123456@example.com>
R=dnslookup
T=remote_smtp
S=2297
H=cluster2.gsi.messagelabs.com [62.208.159.157]:25
X=TLS1.2:DHE_RSA_AES_128_CBC_SHA1:128
CV=no
DN="C=US,ST=California,L=Mountain View,O=Symantec Corporation,OU=Symantec.cloud,CN=mail221.messagelabs.com"
C="250 ok 1446233056 qp 26062 server-4.tower-221.messagelabs.com!1446233056!7679409!1"
QT=1s
DT=0s
I think the 'C' line is right as it reflects the confirmation we got from the remote server, so it's what they'd want to debug with: http://www.exim.org/exim-html-current/doc/html/spec_html/ch-log_files.html
It might be useful to present the => person@example.com
and H
lines too as between them they confirm where we sent it as well as that it was accepted, but the wording to explain this would need some thought.
@zarino, the prototypes look really good! I think there are a few cases it would be good to design and think about text for:
Bah, the smtp_message_id
we store is not the MTA ID, its the Message-ID
In exim, the smtp_message_id
is only present in the message arrival (<=
) line, not the normal message delivery (=>
) line that we need to confirm a delivery.
We can at least parse the exim ID from the <=
line, and then use that to look for the =>
lines with the same exim ID.
Here's a sketch of how we can match all this up, in exim.
I don't know what edge cases there are at the moment, like resending.
OutgoingMessage.class_eval do
def smtp_message_id
info_request_events.
where(:event_type => 'sent').
map { |event| event.params[:smtp_message_id]}.
first # Not sure whether there can be more than one yet
end
def delivered?
delivery_confirmation ? true : false
end
def delivery_confirmation
m = delivery_message.match(/C="(.*)"/i)
m ? m[0] : nil
end
def delivery_message
mail_server_logs.
where("line ILIKE :mta AND line ILIKE :msgtype",
mta: "%#{ mta_id }%", msgtype: "%=>%").
first.
line
end
def mta_id
line = mail_server_logs.
where("line ILIKE :q", q: "%#{ smtp_message_id }%").
first.
line
`echo '#{line}'| cut -d' ' -f4`.strip
end
def mail_server_logs
info_request.
mail_server_logs
end
end
om = InfoRequest.find(300000).outgoing_messages.first
om.delivered?
# => true
Looks like about a third of requests don't have the standard* 3 mail server log lines:
user=> SELECT count(*) FROM (
user(> SELECT info_request_id, count(*)
user(> FROM mail_server_logs
user(> GROUP BY info_request_id) AS blah
user-> WHERE count <> 3
user-> ORDER BY count DESC;
count
--------
153614
(1 row)
user=> SELECT count(*) FROM (
user(> SELECT info_request_id, count(*)
user(> FROM mail_server_logs
user(> GROUP BY info_request_id) AS blah
user-> ORDER BY count DESC;
count
--------
309137
(1 row)
user=> SELECT count(*) FROM info_requests;
count
--------
312610
(1 row)
* The default success case seems to have a triplet of [sendmail call, <=
,=>
]
Some info requests will have more than 3 because they have more outgoing messages. We only record which info request a MailServerLog
belongs to, rather than the OutgoingMessage
Looks like we only started storing the smtp_message_id
later in the life of the app:
InfoRequest.find(7790).outgoing_messages.first.info_request_events.first.params
# => {:email=>"FOI requests at Harrow Borough Council <REDACTED>", :outgoing_message_id=>12977}
Edge case 1:
Some requests with 4 log lines have a ->
line "additional address in same delivery". This also includes a C
field with a "250 ok" confirmation.
puts InfoRequest.find(248214).outgoing_messages.first.mail_server_logs.map(&:line)
Some requests with 5 log lines get a failed delivery **
, and then a resend is successful. This is fine because there's still only one =>
line where the message got successfully delivered.
Some requests with 5 log lines get a successful delivery, and then two rejected RCPT
lines:
2015-04-28 18:10:17 [8767] H=(gogecko.com.au) [201.230.121.77]:50416 I=[46.43.39.82]:25 F=<linda@gogecko.com.au> rejected RCPT <request-230955-REDACTED@whatdotheyknow.com>: 201.230.121.77 is listed at zen.spamhaus.org
2015-05-26 17:15:14 [7996] H=(jamiel.net) [200.110.29.24]:57012 I=[46.43.39.82]:25 F=<a8839efb@jamiel.net> rejected RCPT <request-230955-REDACTED@whatdotheyknow.com>: 200.110.29.24 is listed at zen.spamhaus.org
These lines are 6 months after the normal triplet, so I think they're incoming messages to the request thread that have been rejected as spam.
Need to find all email sent type events, otherwise followups don't get found
def smtp_message_id
info_request_events.
- where(:event_type => 'sent').
+ where(:event_type => ['sent', 'followup_sent', 'resent', 'followup_resent']).
map { |event| event.params[:smtp_message_id]}.
first # Not sure whether there can be more than one yet
end
This makes nailing down the smtp_message_id
more difficult as there could be multiple events, which all have their own smtp_message_id
. We could just pick the last one, but then explaining what happened in the log might get complicated.
As of 9a4a929, we now have OutgoingMessage#mail_server_logs
, which returns each log line that we can find for a given outgoing message.
Atlatszo has one specific need… direct access to the email logs because there are lawsuits where the records prove that Atlatszo has indeed sent the request to the authority.
Here's a sketch of the script to run against WDTK once we deploy #2941.
def random_logs(request_id = nil)
request = if request_id
InfoRequest.find(request_id)
else
InfoRequest.order("RANDOM()").first
end
log_lines = request.mail_server_logs.map(&:line)
if log_lines.any?
puts "InfoRequest: #{ request.id }"
puts "============================"
puts log_lines
puts "\n"
request.outgoing_messages.each do |message|
puts "OutgoingMessage: #{ message.id }"
puts "============================"
lines = message.mail_server_logs.map(&:line)
if lines.any?
puts lines
else
puts "Probably didnt log the smtp_message_id in the InfoRequestEvent:"
puts message.info_request_events.map(&:params)
end
puts "\n"
end
nil
else
random_logs
end
end
random_logs
Should be able to save this to a file and run it with rails' runner
:
$ bundle exec rails runner tmp/sample_mail_server_logs.rb
InfoRequest: 91365
Sometimes the smtp_message_ids get stored with enclosing angle brackets, which prevents a match in the info request's mail server logs.
OutgoingMessage.find(189872).info_request_events.first.params[:smtp_message_id]
# => "<ogm-189872+4f6267b87bfa8-20d4@whatdotheyknow.com>"
OutgoingMessage.find(189872).smtp_message_ids.first
# => "<ogm-189872+4f6267b87bfa8-20d4@whatdotheyknow.com>"
OutgoingMessage: 372919
Only ever had a status of 250 Queued!
rather than 250 OK
– did it get delivered? (log output is as expected, but this might be a case where we can't verify delivery)
Sometimes the smtp_message_ids get stored with enclosing angle brackets
As far as I can tell, we haven't changed anything and it looks like everything recent has no angle brackets, so I guess this is a change in a dependency (maybe the move to the mail gem?).
Either way, should just be able to handle both cases.
InfoRequest: 112495
lots of outgoing messagesInfoRequest: 122592
lots of mail log but only 2 outgoing messages; confirmed correct; lots of incoming messages in this oneNext steps after #3136 is verified:
This sprint (2016-04-11)
MailServerLogsController#show
which takes an OutgoingMessage
and shows its mail server logsNext sprint
OutgoingMessage#delivered?
InfoRequest: 262781
example of a failed delivery (**
line) and a bounce message getting sent to the request author (<>
line). The log lines for the outgoing message include the message reception of the outgoing bounce email, which I'm not sure it shouldRe '250 Queued' - I think this is equivalent to '250 OK'. That's what (might) happen in practice any time you deliver an email to someone else's server. It's still been delivered to that server (which is the one controlled or designated by the authority).
Example of a resent (and still failed) (the resend was successfully delivered) message: OutgoingMessage.find(531356).mail_server_logs
The logs:
2016-04-06 12:01:07 [14928] 1anlCt-0003sm-LG <= request-326806-hk27iwn7@localhost U=alaveteli P=local S=1923 id=ogm-326806+5704ec7388370-456e@whatdotheyknow.com T="Freedom of Information request - Computers" from <request-326806-hk27iwn7@localhost> for foi@authority.net foi@authority.net
2016-04-06 12:01:08 [14933] 1anlCt-0003sm-LG ** foi@authority.net F=<request-326806-hk27iwn7@localhost>: all relevant MX records point to non-existent hosts
2016-04-06 12:01:08 [14933] 1anlCt-0003sm-LG ** foi@authority.net F=<request-326806-hk27iwn7@localhost>: all relevant MX records point to non-existent hosts
2016-04-06 12:01:08 [14935] 1anlCu-0003st-1p <= <> R=1anlCt-0003sm-LG U=Debian-exim P=local S=2934 T="Mail delivery failed: returning message to sender" from <> for request-326806-hk27iwn7@localhost
No delivery line (=>
) with a C=
to parse the status, so #3214 wouldn't currently parse the above logs as "failed".
Note to check https://www.whatdotheyknow.com/request/facial_recognition_capabilities – might be an interesting case.
Status Update:
The ability to view Mail Server Logs has been deployed (#3209, for admins only) with the intention of removing the admin-only filter early this week.
The next step is working on the delivery status parser (#3214) to establish whether mail has been delivered or not. TL;DR is that parsing a "normal" delivery is fine, but parsing failures is harder (because exim uses a different line style instead of just using a different status code).
The ability to view Mail Server Logs has been deployed (#3209, for admins only) with the intention of removing the admin-only filter early this week.
Just rolled this out to both admins and request owners (#3248). Not a massive amount of value in showing somewhat sensitive (and redacted) logs to non-request owners, but will show them the delivery status when that's available.
This is a major step forward and an important breakthrough. I have a signficant concern, though.
If requesters have access to the full request email address of their requests, including the checksum, there is no doubt that some of them will abuse it by sending mail appearing to come from the public authority. We inevitably have a number of vexatious users who would make use of this to cause mischief.
Also, some users will doubtless manage to get confused and use the request addresses to send allsorts e.g. immigration concerns to that request address. It's amazing how confused some people with limited technical ability or limited English language can get - for example, we regularly get requests to change our record of public authorities' email addesses to somebody's personal address as an attempt to raise contact the authority over immigration and other matters. Though this is of less concern than the vexatious one.
In my opinion, the unredacted request address must only be available to the authority responding to each request, and to admins and developers. Otherwise there will be all sorts of problems and the integrity of the communications on our website ostensibly sent by public authorities will be in doubt.
The "release mail log" feature is incredibly important and useful; we frequently confirm that authorities' email servers have received request emails, and it's a great step forward, but if we're providing the log file to anybody other than the authority, or publishing it as an annotation on the request page, we always redact the checksum - request-XXXXXX-(redacted)@whatdotheyknow.com - for this reason.
Should write some documentation about this for alaveteli.org admin guides. Will probably wait until we've got the delivery status parsing done. Could also do with better user-facing documentation about what they are, though this might become clearer with the delivery status parsing.
Sometimes public bodies claim not to have received FOI requests. When requesters write to us a skilled member of them (i.e. not me because I don't know how!) writes back and says something like:
I've checked the mail delivery logs for http://www.whatdotheyknow.com/request/[...] and I can confirm that both your messages were successfully delivered to [public authority]'s mail servers.
I just wonder whether it might be possible to show on the request page a little green light or something that shows each outward message was successful delivered. Perhaps it could also show the date and time of delivery. I have no idea how difficult it is.
It is far from high priority. I feel it would add something to the trustworthiness of the record because people could think to themselves "yes they really did received this message"