mysociety / alaveteli

Provide a Freedom of Information request system for your jurisdiction
https://alaveteli.org

URLs in Incoming Message Mangled On Display #141

Open RichardTaylor opened 13 years ago

RichardTaylor commented 13 years ago

Where URLs contain unencoded brackets, "(" or ")", only the part of the URL up to the first bracket is made into a clickable link, resulting in a broken link and a mangled display.

The issue appears to be in the regexes on lines 160/161 at

mysociety/commonlib/rblib/format.rb

When using characters such as "(" or ")" in a URL it is advisable to "encode" them, in the form "%28" and "%29", to prevent problems such as those experienced here.

One could view the incoming URLs, or their presentation, as the problem rather than the WhatDoTheyKnow.com code. However, much URL presentation software does deal with the presence of "(" and ")", and as far as I can see the use of those characters in URLs is merely inadvisable rather than prohibited.
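
To illustrate the two options above, here is a minimal Ruby sketch. It is not the code in format.rb (I have not reproduced those regexes here); `first_url` and `encode_parens` are hypothetical names, and the pattern is only an assumption about what a bracket-tolerant match could look like.

```ruby
# Hypothetical helpers for illustration only, not the commonlib code.

# Finds the first URL in a piece of text. The character class excludes
# only whitespace, angle brackets and double quotes, so "(" and ")" are
# allowed inside the URL body.
def first_url(text)
  text[%r{https?://[^\s<>"]+}]
end

# Percent-encodes literal parentheses, as a sender could do to avoid
# tripping up linkifiers that stop at a bracket.
def encode_parens(url)
  url.gsub('(', '%28').gsub(')', '%29')
end

url = first_url('See http://example.com/wiki/Foo_(bar) for details')
puts url
puts encode_parens(url)
```

A pattern this permissive will also swallow a closing bracket that belongs to the surrounding sentence, so it would need some handling of trailing punctuation as well.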

Example page the issue occurs on:

http://www.whatdotheyknow.com/request/quality_innovation_productivity#incoming-202338

I'm not aware of other occurrences of this exact problem.

RichardTaylor commented 13 years ago

Hmm the same URLs work when pasted into annotations:

http://www.whatdotheyknow.com/request/quality_innovation_productivity#incoming-202338

RichardTaylor commented 13 years ago

Much more common is the case where a URL in an incoming message is split across two lines, e.g.

http://www.whatdotheyknow.com/request/traffic_management_cameras_and_a_2#incoming-203920

Could this be detected, or should we consider it a problem with the incoming message?

RichardTaylor commented 10 years ago

Another example of a mangled URL. This one looks fine in the raw message (after BASE64 decoding it) but has somehow got mangled (with extra characters added at the end) on display:

https://www.whatdotheyknow.com/request/tlcs_license_holders#incoming-558610

RichardTaylor commented 4 years ago

Case at

https://www.whatdotheyknow.com/request/priory_court_housing_development#comment-90679

where it looks as if unencoded spaces may have prevented the URL displaying properly. It looks like there was only an HTML version of the email. The link in the email was:

<a target="_BLANK" href="https://apps-waltham-forest.s3.amazonaws.com/foi/FOI162882785/4fa7a041dbf64/Priory Court FOI162882785 Final.docx">FOI162882785</a></p>
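
As a rough sketch of one possible mitigation, the href could be pulled out of the anchor and its spaces percent-encoded before the link is rendered. `href_from_anchor` and `encode_spaces` are hypothetical names, not existing Alaveteli methods, and I am assuming (without having checked) that the encoded URL resolves to the same file on the remote server.

```ruby
# Hypothetical helpers for illustration only.

# Returns the value of the first double-quoted href attribute in a
# fragment of HTML, or nil if there is none.
def href_from_anchor(html)
  html[/href="([^"]*)"/, 1]
end

# Trims surrounding whitespace and percent-encodes the remaining literal
# spaces, which are not valid inside a URL.
def encode_spaces(href)
  href.strip.gsub(' ', '%20')
end

html = '<a target="_BLANK" href="https://apps-waltham-forest.s3.amazonaws.com/foi/FOI162882785/4fa7a041dbf64/Priory Court FOI162882785 Final.docx">FOI162882785</a>'
puts encode_spaces(href_from_anchor(html))
```

A regex is used here only to keep the sketch short; a real fix would more likely work on the parsed HTML.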

RichardTaylor commented 4 years ago

An apparently related issue has occurred at:

https://www.whatdotheyknow.com/request/total_revenue_per_month_from_not#incoming-1655287

gbp commented 3 years ago

Case where a URL includes a trailing square bracket:

https://www.whatdotheyknow.com/request/correspondence_about_foi_rights#outgoing-1101697

RichardTaylor commented 3 years ago

Another example of the form reported at https://github.com/mysociety/alaveteli/issues/141#issuecomment-576282653 has occurred, where it looks as if unencoded spaces may have prevented the URL displaying properly.

<a target="_BLANK" href="https://apps-waltham-forest.s3.amazonaws.com/foi/FOI293299650/7b0c9d86c9094/FOI293299650 Request for documents about policies for older people in the Borough response.docx">FOI293299650 Request for documents about policies for older
 people in the Borough response</a></p>

Request: https://www.whatdotheyknow.com/request/request_for_documents_about_poli#incoming-1707682

RichardTaylor commented 3 years ago

A further example of a bug relating to the presentation of a link containing unencoded spaces:

https://www.whatdotheyknow.com/request/correspondence_with_natural_engl_3#incoming-1782897

RichardTaylor commented 3 years ago

A further example of a bug relating to the presentation of a link containing unencoded spaces:

https://www.whatdotheyknow.com/request/sound_insulation_sams_tmo_2#outgoing-1153798

RichardTaylor commented 3 years ago

At

https://www.whatdotheyknow.com/request/funerals_352#incoming-1793031

a non-breaking space, encoded as %A0, has been appended to the URL prior to display. This doesn't appear to be in the raw email, in either the plain text or HTML versions.
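
I don't know where the extra character is introduced, so the following is only a sketch of a defensive step: trimming trailing whitespace, including U+00A0, from a candidate URL before it is encoded and linked. `strip_trailing_space` is a hypothetical name, and the sketch assumes the non-breaking space is still a raw character at that point rather than already encoded as %A0.

```ruby
# Hypothetical helper for illustration only.

# Removes whitespace from the end of a candidate URL. Ruby's \s does not
# match U+00A0 (non-breaking space), so it is listed explicitly.
def strip_trailing_space(url)
  url.sub(/[\s\u00A0]+\z/, '')
end

candidate = "https://example.com/doc.pdf\u00A0"
puts strip_trailing_space(candidate)
```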

RichardTaylor commented 3 years ago

A further new example of this class of bug appears at:

https://www.whatdotheyknow.com/request/temporary_accommodation_nightly_332#incoming-1756706

The link coded as:

<a target="_BLANK" href="https://apps-waltham-forest.s3.amazonaws.com/foi/FOI311969445/f7e5641be7e54/FOI311969445- Temporary Accommodation - Nightly Rates.pdf">FOI311969445- Temporary Accommodation - Nightly Rates</a>

does not display properly, apparently due to the spaces.

mdeuk commented 3 years ago

> does not display properly, apparently due to the spaces.

It might be worth us reaching out to the Council in question in case they don’t realise this problem actually exists - these malformed URLs will affect more than just our users, so fixing it would be a win for everyone.

RichardTaylor commented 3 years ago

There is a case of links not working, perhaps due to the presence of spaces, at:

https://www.whatdotheyknow.com/request/adult_social_services_structure_25

A link here was coded as:

<br> <a target="_BLANK" href="https://apps-waltham-forest.s3.amazonaws.com/foi/FOI343600248/b8a2f56661334/Housing Operations Structure chart.pdf">Housing Operations Structure Chart </a>, <br>

It does appear that it is at best bad practice to include spaces in a URL: https://stackoverflow.com/questions/497908/is-a-url-allowed-to-contain-a-space

RichardTaylor commented 2 years ago

The presence of a space in a link caused an issue at:

https://www.whatdotheyknow.com/request/freedom_of_information_request_p_227#incoming-1232455

HelenWDTK commented 10 months ago

We've had a user contact us who has had problems when trying to add a link to the outgoing message. They wanted to write:

(https://www.whatdotheyknow.com/request/request_title#incoming-12345678)?

The target of the link was rendered in the preview as pointing to:

https://www.whatdotheyknow.com/request/request_title#incoming-12345678)?

and did not display correctly or point to the correct place unless a space was added between ")" and "?".
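
One common way of handling this, sketched below, is to trim trailing punctuation from the matched text, and to trim a closing bracket only when it has no matching opening bracket inside the URL. The GitHub Flavored Markdown autolink extension describes a similar rule for parentheses; applying it to square brackets as well is my own assumption. `trim_url_candidate` is a hypothetical name and this is not how Alaveteli currently behaves.

```ruby
# Hypothetical helper for illustration only.

# Trims characters from the end of a URL candidate that most likely
# belong to the surrounding sentence rather than to the URL itself.
def trim_url_candidate(candidate)
  pairs = { ')' => '(', ']' => '[' }
  url = candidate.dup
  loop do
    last = url[-1]
    break if last.nil?

    if '.,;:!?'.include?(last)
      # Sentence punctuation at the very end is dropped.
      url.chop!
    elsif pairs.key?(last) && url.count(last) > url.count(pairs[last])
      # An unmatched closing bracket is dropped; a balanced one is kept.
      url.chop!
    else
      break
    end
  end
  url
end

puts trim_url_candidate('https://www.whatdotheyknow.com/request/request_title#incoming-12345678)?')
puts trim_url_candidate('http://example.com/wiki/Foo_(bar)')
```

With this rule the first candidate loses its trailing ")?" while the second keeps its balanced brackets. A URL that genuinely ends in a full stop or question mark would be truncated, which seems an acceptable trade-off but is a judgment call.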

confirmordeny commented 10 months ago

A user has flagged an instance of this today.

WilliamWDTK commented 6 months ago

This happened at https://www.whatdotheyknow.com/request/local_transport_plan_status_targ_74#incoming-2600725, where a link seemed to be split at a full stop in its first (in-text) occurrence, but worked normally in the list at the end of the message.