Open RichardTaylor opened 13 years ago
Hmm the same URLs work when pasted into annotations:
http://www.whatdotheyknow.com/request/quality_innovation_productivity#incoming-202338
Much more common is the case where a URL in an incoming message is split between two lines, e.g.:
http://www.whatdotheyknow.com/request/traffic_management_cameras_and_a_2#incoming-203920
Could this be detected, or should we consider it a problem with the incoming message?
Another example of a mangled URL. This one looks fine in the raw message (after BASE64 decoding it) but has somehow got mangled (with extra characters added at the end) on display:
https://www.whatdotheyknow.com/request/tlcs_license_holders#incoming-558610
Case at
https://www.whatdotheyknow.com/request/priory_court_housing_development#comment-90679
where it looks as if unencoded spaces may have prevented the URL displaying properly. It looks like there was only an HTML version of the email. The link in the email was:
<a target="_BLANK" href="https://apps-waltham-forest.s3.amazonaws.com/foi/FOI162882785/4fa7a041dbf64/Priory Court FOI162882785 Final.docx">FOI162882785</a></p>
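One possible mitigation, sketched here only as an illustration, would be to percent-encode literal spaces in an `href` before the URL reaches link detection. The helper name `encode_spaces_in_url` is hypothetical and is not part of Alaveteli or commonlib.

```ruby
# Hypothetical helper (not from Alaveteli/commonlib): percent-encode literal
# spaces in an href value so the whole URL survives later link detection.
def encode_spaces_in_url(url)
  # Leading/trailing whitespace is dropped; inner spaces become %20.
  url.strip.gsub(' ', '%20')
end

href = 'https://apps-waltham-forest.s3.amazonaws.com/foi/FOI162882785/4fa7a041dbf64/Priory Court FOI162882785 Final.docx'
puts encode_spaces_in_url(href)
```

This only touches spaces; any other characters in the URL are left exactly as the sender wrote them.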
An apparently related issue has occurred at:
https://www.whatdotheyknow.com/request/total_revenue_per_month_from_not#incoming-1655287
Case where a URL includes a trailing square bracket:
https://www.whatdotheyknow.com/request/correspondence_about_foi_rights#outgoing-1101697
Another example of the form reported at https://github.com/mysociety/alaveteli/issues/141#issuecomment-576282653 has occurred, where it looks as if unencoded spaces may have prevented the URL displaying properly.
<a target="_BLANK" href="https://apps-waltham-forest.s3.amazonaws.com/foi/FOI293299650/7b0c9d86c9094/FOI293299650 Request for documents about policies for older people in the Borough response.docx">FOI293299650 Request for documents about policies for older
people in the Borough response</a></p>
Request: https://www.whatdotheyknow.com/request/request_for_documents_about_poli#incoming-1707682
A further example of a bug relating to the presentation of a link containing unencoded spaces:
https://www.whatdotheyknow.com/request/correspondence_with_natural_engl_3#incoming-1782897
A further example of a bug relating to the presentation of a link containing unencoded spaces:
https://www.whatdotheyknow.com/request/sound_insulation_sams_tmo_2#outgoing-1153798
At
https://www.whatdotheyknow.com/request/funerals_352#incoming-1793031
a non-breaking space, encoded as %A0, has been appended to the URL prior to display. This doesn't appear to be in the raw email, in either the plain-text or the HTML version.
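As a rough sketch of a workaround, and assuming the stray character always sits at the very end of the URL, a trailing non-breaking space could be removed before the link is rendered. The helper name `strip_trailing_nbsp` and the `example.org` URL are hypothetical; where the character actually gets added has not been established in this thread.

```ruby
# Hypothetical helper: drop trailing non-breaking spaces from a URL, whether
# literal (U+00A0) or percent-encoded ("%A0", or "%C2%A0" when UTF-8 encoded).
def strip_trailing_nbsp(url)
  url.sub(/(?:\u00A0|%C2%A0|%A0)+\z/i, '')
end

# Placeholder URL, used for illustration only.
puts strip_trailing_nbsp('https://example.org/files/report.pdf%A0')
```

This treats the symptom only; finding the step that appends the character would be the better fix.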
A further new example of this class of bug appears at:
https://www.whatdotheyknow.com/request/temporary_accommodation_nightly_332#incoming-1756706
The link coded as:
<a target="_BLANK" href="https://apps-waltham-forest.s3.amazonaws.com/foi/FOI311969445/f7e5641be7e54/FOI311969445- Temporary Accommodation - Nightly Rates.pdf">FOI311969445- Temporary Accommodation - Nightly Rates</a>
does not display properly, apparently due to the spaces.
It might be worth us reaching out to the Council in question in case they don’t realise this problem actually exists - these malformed URLs will affect more than just our users, so fixing it would be a win for everyone.
There is a case of links not working, perhaps due to the presence of spaces, at:
https://www.whatdotheyknow.com/request/adult_social_services_structure_25
A link here was coded as:
<br> <a target="_BLANK" href="https://apps-waltham-forest.s3.amazonaws.com/foi/FOI343600248/b8a2f56661334/Housing Operations Structure chart.pdf">Housing Operations Structure Chart </a>, <br>
It does appear that including spaces in a URL is, at best, bad practice: https://stackoverflow.com/questions/497908/is-a-url-allowed-to-contain-a-space
The presence of a space in a link caused an issue at:
https://www.whatdotheyknow.com/request/freedom_of_information_request_p_227#incoming-1232455
We've had a user contact us who has had problems when trying to add a link to the outgoing message. They wanted to write:
(https://www.whatdotheyknow.com/request/request_title#incoming-12345678)?
The target of the link was rendered in the preview as pointing to:
https://www.whatdotheyknow.com/request/request_title#incoming-12345678)?
and did not display correctly or point to the correct place unless a space was added between ")" and "?".
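A common heuristic for this kind of case, shown here as a minimal sketch rather than as what the linkifier currently does, is to peel sentence punctuation and unmatched closing brackets off the end of a matched URL. All names below (`split_trailing_punctuation`, `trailing_char_removable?`, and the two constants) are hypothetical.

```ruby
# Hypothetical sketch: separate a matched URL candidate from trailing
# punctuation that most likely belongs to the surrounding sentence.
SENTENCE_PUNCTUATION = '.,;:!?'
CLOSERS = { ')' => '(', ']' => '[' }.freeze

def trailing_char_removable?(url)
  last = url[-1]
  return false if last.nil?
  return true if SENTENCE_PUNCTUATION.include?(last)

  # A closing bracket is removable only when it has no matching opener.
  opener = CLOSERS[last]
  !opener.nil? && url.count(last) > url.count(opener)
end

def split_trailing_punctuation(candidate)
  url = candidate.dup
  trailing = +''
  trailing.prepend(url.slice!(-1)) while trailing_char_removable?(url)
  [url, trailing]
end

url, rest = split_trailing_punctuation(
  'https://www.whatdotheyknow.com/request/request_title#incoming-12345678)?'
)
puts url   # the URL without the trailing ")?"
puts rest  # the punctuation to render as plain text after the link
```

Because a closing bracket is removed only when it is unbalanced, a URL that legitimately ends in a bracketed segment keeps its final ")". The same check would also cover a stray trailing "]".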
A user has flagged an instance of this today.
This happened at https://www.whatdotheyknow.com/request/local_transport_plan_status_targ_74#incoming-2600725, where a link seemed to be split at a full stop in its first (in-text) occurrence, but worked normally in the list at the end of the message.
Where URLs contain unencoded brackets, "(" or ")", only the part of the URL up to the first bracket is made into a clickable link, resulting in a broken link and mangled display.
The issue appears to be in the regexes on lines 160/161 of
mysociety/commonlib/rblib/format.rb
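To illustrate the symptom, the sketch below compares two simplified patterns. Both are assumptions written for this comment and are not the regexes in `format.rb`; the `example.org` URL is a placeholder.

```ruby
# Both patterns are illustrative assumptions, NOT the regexes in format.rb.
# A pattern that excludes brackets stops matching at the first "(".
URL_WITHOUT_BRACKETS = %r{https?://[^\s<>"()]+}
# A pattern that allows brackets keeps matching through ")".
URL_WITH_BRACKETS = %r{https?://[^\s<>"]+}

text = 'See http://example.org/docs/report_(final).pdf for details'

puts text[URL_WITHOUT_BRACKETS] # truncated before "(", so the link breaks
puts text[URL_WITH_BRACKETS]    # whole URL, brackets included
```

A pattern this permissive would also swallow a closing bracket that belongs to the surrounding sentence, so trailing punctuation would need separate handling.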
If a URL uses characters such as "(" or ")", it is advisable to percent-encode them as "%28" and "%29" to prevent problems such as those experienced here.
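For anyone preparing such a URL by hand, the encoding amounts to a two-character substitution. A minimal sketch, with a hypothetical helper name and a placeholder URL:

```ruby
# Hypothetical helper: percent-encode round brackets in a URL.
BRACKET_ENCODINGS = { '(' => '%28', ')' => '%29' }.freeze

def encode_brackets(url)
  # String#gsub accepts a hash: each match is replaced by its hash value.
  url.gsub(/[()]/, BRACKET_ENCODINGS)
end

puts encode_brackets('http://example.org/docs/report_(final).pdf')
```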
One could view the incoming URLs, or their presentation, as the problem rather than the WhatDoTheyKnow.com code. However, much URL presentation software does cope with "(" and ")", and as far as I can see the use of those characters in URLs is merely inadvisable rather than prohibited.
Example page the issue occurs on:
http://www.whatdotheyknow.com/request/quality_innovation_productivity#incoming-202338
I'm not aware of other occurrences of this exact problem.