vimeo / psalm

A static analysis tool for finding errors in PHP applications
https://psalm.dev
MIT License
5.57k stars 660 forks source link

Should files be considered taint sources? #9579

Open mmcev106 opened 1 year ago

mmcev106 commented 1 year ago

For example, should the following report a taint?

https://psalm.dev/r/6f762f8e9e

If this is an opinionated issue, should this be configurable? I'm happy to work on the PR either way.

psalm-github-bot[bot] commented 1 year ago

I found these snippets:

https://psalm.dev/r/6f762f8e9e ```php
weirdan commented 1 year ago

What would be the taint source here?

mmcev106 commented 1 year ago

I'm not sure I understand the question... I was simply thinking user uploaded files pose similar risks as $_GET.

weirdan commented 1 year ago

Taint analysis is based on tracing the data flows between taint sources (function return values, superglobals, etc.) and taint sinks (function parameters, generally, but also echo and similar construct). The sink is obvious from your example (it's echo), but the source is not.

mmcev106 commented 1 year ago

In our use case there are many ways that many different types of files may end up containing unescaped user input. I suspect this means file_get_contents() itself should be considered a taint source. It sounds like Vimeo might be comfortable considering files a safe input source in general. My team is not. If this is not a configurable feature that Vimeo is interested in, I can implement it via stubs or a plugin.

weirdan commented 1 year ago

Vimeo has had almost no involvement in Psalm since after Matt (the original author) left the company. To my knowledge, none of the current maintainers is a Vimeo employee (I'm certainly not), so whatever our position may be on any given issue, it should not be interpreted as Vimeo's.

With that, hopefully, cleared up, I'm not against discussing this particular feature, its pros and contras and arriving at a viable implementation. Discussions like this tend to involve some explanations (never intended as patronising), requests for clarifications and a grain of scepticism, all intended to improve this project.

mmcev106 commented 1 year ago

Interesting history on the project... Thanks for sharing.

I help maintain a third party "module" repository for REDCap. It might be easiest to think of my role as similar to whomever is the gatekeeper for approving WordPress plugin submissions. I have seen cases where module authors do things like the following:

file_put_contents('some/dir/config.json', json_encode([
    'someOption' => $_GET['value']
]));

Then reference the user supplied value later as follows:

$config = json_decode(file_get_contents('some/dir/config.json'), true);
echo $config['someOption'];

Another common example includes file uploads that are saved for later processing. I don't think it is reasonable to assume that all third parties will correctly mark taint sources in their submitted code. Marking all file sources as tainted by default is likely the most appropriate solution, since it leads to review of each individual case. I suspect similar risks exist in any low-code development platform that allows third party plugins.

weirdan commented 1 year ago

Do you think it's reasonable to expect uploaded files to be stored in some pre-defined directories? This could help limit the false positives marking all of the filesystem functions as taint sources would cause.

Also, given your example, would it not make more sense to consider the second argument of file_put_contents() as a taint sink, nipping the issue in the bud?

mmcev106 commented 1 year ago

While we may technically be able to find a way to police user sourced file locations, I'm not sure this is logistically or politically feasible across all installations and third party modules and servers worldwide. I suspect marking file_get_contents() and similar methods as a taint source is much more feasible in this case.

I'm not sure pre-escaping any content written to files would work, as the type of escaping necessary is not known until usage time. We wouldn't have a way of knowing whether to HTML escape the input, check for path traversal attacks, SQL injection, etc.

weirdan commented 1 year ago

While we may technically be able to find a way to police user sourced file locations, I'm not sure this is logistically or politically feasible across all installations and third party modules and servers worldwide.

I was thinking about a config knob in psalm.xml so users can configure it to match their folder structure. E.g.

<psalm>
  <userGeneratedFiles>
     <directory name="app/storage/uploads" />
     <directory name="storage/avatars" />
     <!-- and so on -->
  </userGeneratedFiles>
</psalm>
mmcev106 commented 1 year ago

I like the idea. I'm struggling to see how it would work though, as the paths can be system dependent (sometimes sourced from database values, environment variables, etc.).

mmcev106 commented 1 year ago

Well, I guess I'll do my own thing in a stub or plugin. @weirdan, if you'd like me to try and build anything into psalm directly, let me know.