Open jackyzha0 opened 1 year ago
Are the symlinks being created by the tarball extraction process?
Yeah I think they are created but not dereferenced (materialized)
Even if you dereference them on the server they will end up being a copy of the linked files.
I think it would be safe to ignore all symlinks!
Is there a way to remove symlinks when untarring? I don't know if IPFS.addAll has a thing for removing them. https://github.com/ipfs/js-ipfs/blob/master/docs/core-api/FILES.md#ipfsaddallsource-options
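One way to drop symlinks at untar time, independent of what IPFS.addAll supports, is to skip every non-regular entry while extracting. This is a minimal sketch in Python (chosen for brevity; it is not the Distributed Press implementation). The function name `extract_without_links` is hypothetical, and file mode bits are not preserved.

```python
import os
import shutil
import tarfile


def extract_without_links(archive_path, dest_dir):
    """Extract only regular files and directories; return the names of skipped entries."""
    dest_root = os.path.realpath(dest_dir)
    skipped = []
    with tarfile.open(archive_path) as archive:
        for member in archive:
            # Symlinks, hard links, devices and FIFOs are never written to disk.
            if not (member.isfile() or member.isdir()):
                skipped.append(member.name)
                continue
            # Refuse anything that would resolve outside the destination.
            target = os.path.realpath(os.path.join(dest_root, member.name))
            if os.path.commonpath([dest_root, target]) != dest_root:
                skipped.append(member.name)
                continue
            if member.isdir():
                os.makedirs(target, exist_ok=True)
                continue
            os.makedirs(os.path.dirname(target), exist_ok=True)
            with archive.extractfile(member) as src, open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
    return skipped
```

Since no link is ever created in the destination, a later upload step has nothing to follow.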
From the bsdtar manpage:
SECURITY
Certain security issues are common to many archiving programs, including tar. In particular, carefully-crafted archives can
request that tar extract files to locations outside of the target directory. This can potentially be used to cause unwitting
users to overwrite files they did not intend to overwrite. If the archive is being extracted by the superuser, any file on the
system can potentially be overwritten. There are three ways this can happen. Although tar has mechanisms to protect against
each one, savvy users should be aware of the implications:
• Archive entries can have absolute pathnames. By default, tar removes the leading / character from filenames before
restoring them to guard against this problem.
• Archive entries can have pathnames that include .. components. By default, tar will not extract files containing ..
components in their pathname.
• Archive entries can exploit symbolic links to restore files to other directories. An archive can restore a symbolic
link to another directory, then use that link to restore a file into that directory. To guard against this, tar checks
each extracted path for symlinks. If the final path element is a symlink, it will be removed and replaced with the
archive entry. If -U is specified, any intermediate symlink will also be unconditionally removed. If neither -U nor -P
is specified, tar will refuse to extract the entry.
To protect yourself, you should be wary of any archives that come from untrusted sources. You should examine the contents of
an archive with
tar -tf filename
before extraction. You should use the -k option to ensure that tar will not overwrite any existing files or the -U option to
remove any pre-existing files. You should generally not extract archives while running with super-user privileges. Note that
the -P option to tar disables the security checks above and allows you to extract an archive while preserving any absolute
pathnames, .. components, or symlinks to other directories.
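The three cases in the excerpt can also be checked programmatically before extraction, much like reading the output of `tar -tf`. A rough sketch in Python, assuming the archive is a local file; `audit_archive` and the keys of its result are hypothetical names used only for illustration.

```python
import tarfile


def audit_archive(archive_path):
    """List entries that match the three risks described in the manpage excerpt."""
    findings = {"absolute": [], "dotdot": [], "links": []}
    with tarfile.open(archive_path) as archive:
        for member in archive:
            name = member.name
            # 1. Absolute pathnames.
            if name.startswith("/"):
                findings["absolute"].append(name)
            # 2. Pathnames with a ".." component.
            if ".." in name.split("/"):
                findings["dotdot"].append(name)
            # 3. Symbolic or hard links, reported with their targets.
            if member.issym() or member.islnk():
                findings["links"].append((name, member.linkname))
    return findings
```

An upload could be rejected outright if any of the three lists is non-empty.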
IIRC @jackyzha0 fixed this?
I don't think I ever got around to it :( iirc we punted it down the line
Payloads for sites can contain relative symlinks that point to sensitive content on the host machine. If a file in the payload is a symlink, the process of syncing to protocols may follow the symlink and upload the file it points to. The danger here is that the symlink's target may be outside the actual uploaded folder.
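To make the risk concrete, here is a small Python sketch that scans an already extracted site folder and reports symlinks whose resolved target falls outside it. The name `find_escaping_symlinks` is hypothetical, and this is only an illustration of the check, not code from Distributed Press.

```python
import os


def find_escaping_symlinks(site_root):
    """Return (link_path, resolved_target) pairs for links that resolve outside site_root."""
    root = os.path.realpath(site_root)
    escaping = []
    # os.walk does not descend into symlinked directories by default.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                continue
            resolved = os.path.realpath(path)
            if os.path.commonpath([root, resolved]) != root:
                escaping.append((path, resolved))
    return escaping
```

Links that stay inside the folder are not reported, so a stricter policy would have to reject every symlink instead.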
Reproducing
Steps to reproduce:
Create a pwned.txt file and symlink it to ../../../../../../api.distributed.press/README.md
Proof-of-concept (courtesy of fauno): https://yolandia-sutty-nl.ipns.ipfs.hypha.coop/pwned.txt
This is fine on Hyper gateway as I don't think it follows symlinks (https://yolandia-sutty-nl.hyper.hypha.coop/pwned.txt) but our IPFS gateway does (https://yolandia-sutty-nl.ipns.ipfs.hypha.coop/pwned.txt)
As an additional note, creating a recursive symlink may also cause Distributed Press to hang when uploading (this may have caused the 504s we observed).
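Whether this is what actually caused the hang is unconfirmed, but a walker that follows directory symlinks can guard against loops by remembering which real directories it has already visited. A sketch in Python; `walk_files_once` is a hypothetical name.

```python
import os


def walk_files_once(site_root):
    """Yield file paths under site_root, following directory symlinks without revisiting any directory."""
    seen = set()
    stack = [site_root]
    while stack:
        directory = stack.pop()
        real = os.path.realpath(directory)
        if real in seen:
            continue  # a symlink led back to a directory we already walked
        seen.add(real)
        with os.scandir(directory) as entries:
            for entry in entries:
                try:
                    is_dir = entry.is_dir()    # follows symlinks by default
                    is_file = entry.is_file()
                except OSError:
                    continue  # e.g. a link that points at itself
                if is_dir:
                    stack.append(entry.path)
                elif is_file:
                    yield entry.path
```

This only prevents endless traversal. It does not stop links from escaping the upload folder, which needs a separate check.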
Solution
rsync has an option to mangle symlinks, we should probably do something similar: https://www.man7.org/linux/man-pages/man1/rsync.1.html (search for --munge-links)
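For reference, rsync's `--munge-links` disables a symlink by prefixing its target with `/rsyncd-munged/`, which keeps the original target recoverable. Something similar could look roughly like the Python sketch below. `munge_symlinks` is a hypothetical helper, and it assumes no `/rsyncd-munged` directory exists on the host.

```python
import os

MUNGE_PREFIX = "/rsyncd-munged/"  # prefix documented for rsync's --munge-links


def munge_symlinks(site_root):
    """Prefix the target of every symlink under site_root; return how many were rewritten."""
    munged = 0
    # os.walk does not follow symlinked directories by default.
    for dirpath, dirnames, filenames in os.walk(site_root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                continue
            target = os.readlink(path)
            if target.startswith(MUNGE_PREFIX):
                continue  # already munged on a previous run
            os.unlink(path)
            os.symlink(MUNGE_PREFIX + target, path)
            munged += 1
    return munged
```

Simply deleting the links, as suggested earlier in the thread, would be the simpler alternative if nothing needs to be recovered.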