pytorch / vision

Datasets, Transforms and Models specific to Computer Vision
https://pytorch.org/vision
BSD 3-Clause "New" or "Revised" License

Address phototour.py security vulnerability (MITM via HTTP) #8041

Open WilliamRoyNelson opened 1 year ago

WilliamRoyNelson commented 1 year ago

🚀 The feature

Remove phototour.py to eliminate dependency on datasets hosted using HTTP instead of HTTPS.

https://github.com/pytorch/vision/blob/70a8e05a98ea8e32b98e5a09d22ab81dd3062234/torchvision/datasets/phototour.py#L37-L60

This vulnerability has been publicly disclosed since 2020: https://github.com/418sec/huntr/pull/702

Motivation, pitch

phototour.py downloads its datasets over HTTP rather than HTTPS, so the downloads are vulnerable to man-in-the-middle (MITM) attacks.
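As a rough illustration of how such URLs could be flagged, here is a minimal sketch that filters a list for plain-HTTP entries. The helper name `find_insecure_urls` and the file names in the sample list are hypothetical; only the two host names come from this issue, and the real URL table lives in phototour.py.

```python
from urllib.parse import urlsplit


def find_insecure_urls(urls):
    """Return the URLs whose scheme is plain HTTP (hypothetical helper)."""
    return [u for u in urls if urlsplit(u).scheme.lower() == "http"]


# Placeholder entries: the hosts are the ones named in this issue,
# the file names are made up for illustration.
candidate_urls = [
    "http://matthewalunbrown.com/example.zip",
    "http://icvl.ee.ic.ac.uk/example.zip",
    "https://example.org/example.zip",
]

insecure = find_insecure_urls(candidate_urls)
for url in insecure:
    print("insecure download URL:", url)
```

A check like this only reports the scheme; it says nothing about whether the host actually serves the same file over HTTPS.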

It may seem like a minor issue, but as tools like PyTorch become widely adopted in industry, they become subject to strict security and regulatory policies. It's hard to justify allowing an easily exploitable vulnerability within a highly regulated environment.

As far back as 2018, the Chrome browser began marking websites that do not use HTTPS as "Not Secure" (https://blog.chromium.org/2018/02/a-secure-web-is-here-to-stay.html). This includes the website referenced in phototour.py.

Alternatives

  1. Rehost the files elsewhere on HTTPS (or using some other method like Git LFS)
  2. Convince the administrators of matthewalunbrown.com and icvl.ee.ic.ac.uk to use HTTPS
  3. Delete phototour.py
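Whichever alternative is chosen, a downloaded archive can also be compared against a digest obtained over a trusted channel. The following is a minimal sketch under that assumption, not what phototour.py does today; `sha256_of_file` and `verify_download` are hypothetical names, and the expected digest would have to come from a source an attacker cannot alter.

```python
import hashlib
import hmac


def sha256_of_file(path, chunk_size=1 << 20):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_sha256):
    """Return True if the file's digest matches the trusted one (hypothetical helper)."""
    return hmac.compare_digest(sha256_of_file(path), expected_sha256.lower())
```

A check like this detects a tampered file only after it has been downloaded, so it complements HTTPS rather than replacing it.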

Additional context

No response

NicolasHug commented 1 year ago

Hi @WilliamRoyNelson, thanks for the report.

We'll be happy to use HTTPS links once the original source (http://phototour.cs.washington.edu/patches/default.htm) supports them. In the meantime, perhaps we can at least document in the docstring that HTTP is being used. Happy to consider a PR.