hastexo / tutor-contrib-backup

This is an experimental Tutor plugin. You should not consider it ready for production use at this point.

fix: Use TarSafe for extracting backup tarball #57

Open fghaas opened 1 year ago

fghaas commented 1 year ago

The tarfile.extractall() method is vulnerable to path traversal, which can be exploited by adding a member with a ../ path to the tarball. In our case, this might allow someone who does not normally have access to the Open edX cluster, but does have write access to the S3 bucket, to inject malicious data. If such a crafted archive were extracted during an automated restore, it could write files outside the intended extraction directory.
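To make the attack concrete, here is a minimal sketch using only the standard library. It builds an in-memory tarball containing a single member named with ../ components and shows where a plain extraction into a restore directory would place it. The member name, payload, and the /data/restore directory are hypothetical; the sketch only computes target paths and never writes anything to disk.

```python
import io
import os
import tarfile


def craft_traversal_tarball() -> bytes:
    """Return an in-memory .tar.gz whose single member uses a '../' name.

    Illustrative only: this mimics what someone with write access to the
    backup bucket could upload in place of a genuine backup tarball.
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        payload = b"not part of any backup\n"
        info = tarfile.TarInfo(name="../../etc/injected.conf")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def resolved_targets(data: bytes, dest: str) -> dict[str, str]:
    """Map each member name to the path it would be written to under dest
    if no extraction filter were applied."""
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:*") as tar:
        return {
            m.name: os.path.normpath(os.path.join(dest, m.name))
            for m in tar.getmembers()
        }


if __name__ == "__main__":
    # "/data/restore" is a hypothetical extraction directory.
    targets = resolved_targets(craft_traversal_tarball(), "/data/restore")
    for name, target in targets.items():
        print(f"{name} -> {target}")
```

On a POSIX system this prints `../../etc/injected.conf -> /etc/injected.conf`, which lies outside the restore directory. On Python versions where extractall() applies no filter by default, that is where the file would actually end up.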

This shouldn't have particularly wide-ranging implications, since the only filesystem affected by such an attack would be the restore job's container, which is by definition short-lived. And an attacker with access to the S3 bucket could already do far greater damage to the Open edX installation by simply modifying the MongoDB or MySQL data contained in the tarball.

Still, it does not hurt to use the safer (if slightly slower) approach provided by the tarsafe module.
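The general idea behind safer extraction is to inspect every member before anything is written to disk. The following is a minimal standard-library sketch of that idea under stated assumptions. `safe_extractall` is a hypothetical helper and is not TarSafe's API; TarSafe performs its own, more thorough checks. The sketch rejects members whose resolved path falls outside the destination (this covers both absolute and ../ names), symbolic and hard links, and device or FIFO members, and it passes `filter="data"` (Python's extraction filter from PEP 706) where the running interpreter supports it.

```python
import os
import tarfile


def safe_extractall(archive_path: str, dest: str) -> None:
    """Extract archive_path into dest, refusing suspicious members.

    Hypothetical helper (not TarSafe's API). Raises ValueError before
    writing anything if any member looks unsafe.
    """
    dest_real = os.path.realpath(dest)
    with tarfile.open(archive_path, mode="r:*") as tar:
        members = tar.getmembers()
        for m in members:
            # Absolute names and '../' components both resolve outside dest.
            target = os.path.realpath(os.path.join(dest_real, m.name))
            if os.path.commonpath([dest_real, target]) != dest_real:
                raise ValueError(f"unsafe path in archive: {m.name!r}")
            # Links could redirect later members; reject them outright.
            if m.issym() or m.islnk():
                raise ValueError(f"link member not allowed: {m.name!r}")
            # isdev() covers character/block devices and FIFOs.
            if m.isdev():
                raise ValueError(f"device member not allowed: {m.name!r}")
        # Use the built-in 'data' filter too, if this Python provides it.
        extra = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        tar.extractall(dest_real, members=members, **extra)
```

Validating all members up front means a rejected archive leaves the destination untouched. Rejecting links outright is stricter than necessary for some archives; whether that is acceptable depends on what the backup tarball legitimately contains.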

References:
https://github.com/python/cpython/issues/73974
https://mail.python.org/pipermail/python-dev/2007-August/074290.html
https://nvd.nist.gov/vuln/detail/CVE-2007-4559

fghaas commented 1 year ago

@angonz I wonder if you could help test this. I know your backup tarballs are much larger than ours typically are, and I'd like to know if swapping in TarSafe for tarfile makes your restore operations take unacceptably longer. Would you mind giving this a try by rebuilding your Docker image from my topic branch and attempting a restore using one of your larger tarballs?

angonz commented 1 year ago

Hi Florian, sorry for the late reply. Sure, I will test it. Just give me some time, because the site is now in production and I will have to set up a test environment.

fghaas commented 1 year ago

That'd be excellent, thank you!