thewca / worldcubeassociation.org

All of the code that runs on worldcubeassociation.org
https://www.worldcubeassociation.org/
GNU General Public License v3.0

Disallow images from non-WCA sources to be embedded on competition pages #8943

Open nsilvestri opened 5 months ago

nsilvestri commented 5 months ago

Is your feature request related to a problem? Please describe. Competition tabs and information boxes allow photos to be embedded inline. There is no restriction on the domains that these images are hosted on, so data controlled by a third-party website is being displayed directly on the WCA website. This has occasionally resulted in images hosted on ephemeral sources like Discord or Messenger going missing when the URL to the content changes or expires. It can also lead to data rot on older competition pages with defunct domains or old URLs. In the worst case, a nefarious actor could hijack an expired domain and display any imagery they like directly on the WCA website.

Here's an example of a competition with a non-WCA image embedded into the infobox: https://www.worldcubeassociation.org/competitions/PyraminxontheStars2024

Describe the solution you'd like Any images hosted outside the WCA website's image bucket should not be able to be embedded in competition pages.

Describe alternatives you've considered WCAT could manually check on competition submission that no third-party websites are used for embedded images, but that still leaves the door open for mistakes. Automating the process will reduce the chance of an error and save WCAT the time spent checking links.

dunkOnIT commented 5 months ago

Is this a mandate from a particular team, or just a suggestion at this stage?

EDIT: Common "Duncan horribly wrong" situation. Seems like manual enforcement by WCAT may be the best way forward for now?

~In either case, it seems it would rely on the WCA supporting image-upload capabilities, which we don't at the moment (avatars code is far from generalizable as I understand).~

~From a security perspective, an alternative could be to "whitelist" websites where images can be hosted - ie, anywhere that isn't open to domain/URL hijacking (imgur seems fine in this regard, for example). I'd leave this up to WCAT enforcement for now.~

~From a data rot perspective, the long-term solution probably is to have native image upload, and when we eventually get to an avatars rewrite we can try and generalize it to image uploads overall, not just avatar uploads.~

gregorbg commented 5 months ago

> In either case, it seems it would rely on the WCA supporting image-upload capabilities, which we don't at the moment (avatars code is far from generalizable as I understand).

We already do support generic image upload in the Markdown editors of the Create Competition form, for example in the box for Extra Registration Requirements. Most people just tend to paste their links, though.

Sanitizing Markdown is horror, so off the top of my head I don't have any really good idea :(

dunkOnIT commented 5 months ago

Perhaps I'm about to describe the horror you were talking about, but could we have some kind of regex that finds:

nsilvestri commented 5 months ago

Not a request from a team, just my own observations. With that said, I'm sure WCAT would prefer to not manually check this if at all possible.

Image embeds have a special formatting that might make this easier: ![alt text](url). The ! is for embedded images, so there would be no need to check for file extensions.

!\[.*\]\(https://s3\.us-west-2\.amazonaws\.com/www\.worldcubeassociation\.org/[^)]+\)

This still allows traditional links to non-WCA domains with []() but the fact that a user will be redirected to another domain on click is better than presenting non-WCA content inline on the WCA website, in my opinion.
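To make the idea concrete, here is a minimal sketch of that check in Python. It is not code from the WCA codebase: the function name `disallowed_image_urls` is hypothetical, and the allowed prefix is simply copied from the pattern above, so treat it as an assumption about where the image bucket lives. It collects every inline image embed and reports the ones hosted elsewhere, while leaving ordinary `[]()` links alone.

```python
import re

# Assumption: prefix copied from the regex proposed in this thread.
ALLOWED_PREFIX = "https://s3.us-west-2.amazonaws.com/www.worldcubeassociation.org/"

# Matches inline Markdown image embeds: ![alt](url) or ![alt](url "title").
# [^\]]* is used instead of .* so one match cannot span several embeds on a line.
IMAGE_EMBED = re.compile(r"!\[[^\]]*\]\(\s*([^)\s]+)[^)]*\)")


def disallowed_image_urls(markdown: str) -> list[str]:
    """Return URLs of inline image embeds not hosted under ALLOWED_PREFIX."""
    urls = IMAGE_EMBED.findall(markdown)
    return [url for url in urls if not url.startswith(ALLOWED_PREFIX)]


if __name__ == "__main__":
    sample = (
        "![venue](" + ALLOWED_PREFIX + "venue.png) "
        "![map](https://cdn.discordapp.com/map.png) "
        "[schedule](https://example.com/schedule)"
    )
    # Only the Discord-hosted embed is reported; the plain link is ignored.
    print(disallowed_image_urls(sample))
```

A form validation could reject the submission whenever this list is non-empty. The sketch only covers the inline `![]()` syntax; reference-style images and raw HTML `<img>` tags, if the renderer allows them, would need separate handling, which is part of why sanitizing Markdown is hard.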

slongbehn commented 5 months ago

WCAT's workflow list continues to grow and I do not believe it is a reasonable long-term solution for us to manage this. Short term, absolutely, but our QOL requests are repeatedly disregarded and there will always be cases of this slipping through the cracks. When delegates ignore this mandate, are we to reupload the images ourselves? They are unable to change them without us returning the competition to them. Asking them to change them may delay the announcement as well. I cannot speak to the technical side of things, but the number of touchpoints we have is already exhausting and prone to mistakes.

dunkOnIT commented 5 months ago

Well put, Shain. I'm on board with a software fix, or no fix at all (i.e., don't add to WCAT's manual workload).