vimeo / psalm

A static analysis tool for finding errors in PHP applications
https://psalm.dev
MIT License

Add taint flows for remaining built-in pure functions such as utf8_decode, max($strings), etc #3636

Open TysonAndre opened 4 years ago

TysonAndre commented 4 years ago

It seems like psalm only knows about functions in src/Psalm/Internal/Stubs/CoreGenericFunctions.phpstub . It may help to expand that list of functions to other pure functions such as json_encode(), base64_decode(), trim(), etc.

<?php
$globalVariable = $_GET['evil'];
eval('echo ' . json_encode($globalVariable) . ';');

This example still allows evaluating arbitrary code, such as echo "$(ls)"
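
A minimal sketch of why the JSON-encoded value stays dangerous inside eval(): json_encode() escapes double quotes and backslashes but leaves $ and { alone, so PHP interpolates the resulting double-quoted literal. The helper evalJsonLiteral() and the $secret variable are hypothetical names for illustration, and variable interpolation is only one assumed attack route.

```php
<?php
// Hypothetical helper: evaluates a JSON-encoded string as a PHP string literal.
function evalJsonLiteral(string $untrusted): string
{
    $secret = 'leaked'; // local state the caller never meant to expose

    // json_encode() wraps the value in double quotes; "$" is not escaped.
    $literal = json_encode($untrusted);

    // eval() runs in this function's scope, so "$secret" is interpolated.
    return eval('return ' . $literal . ';');
}

echo evalJsonLiteral('hello'), "\n";   // prints: hello
echo evalJsonLiteral('$secret'), "\n"; // prints: leaked
```

The second call shows the input reading a variable it should never see, which is why the flow from $_GET through json_encode() into eval() is worth reporting.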

https://github.com/phan/phan/blob/master/src/Phan/Plugin/Internal/UseReturnValuePlugin.php may be of help, because it lists many common "pure" functions that return a value based on their inputs.

Off-topic notes:

Aside: $_REQUEST combines $_GET and $_POST, so should that also be included as a source? $_COOKIE can be set by browsers, so should that be considered a source for eval (but possibly not for html)?

Aside: json_encode() technically escapes html, because the only reasonable place to echo it is inside <script>. However, it might be a useful sanity check to assert that the file contains </script> after the echo line but before the next <script> substring, if any occur. Probably not worth the effort.
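
As a rough check on the aside above, this sketch compares json_encode() with its default flags against JSON_HEX_TAG. With the defaults, forward slashes are escaped but < and > are emitted unchanged. The payload is a made-up example, and whether Psalm should treat either form as html-escaped is left open here.

```php
<?php
$payload = '</script><script>alert(1)</script>';

// Default flags: "/" becomes "\/", but "<" and ">" are left as-is.
$default = json_encode($payload);

// JSON_HEX_TAG additionally rewrites "<" and ">" as \u003C and \u003E.
$hexTag = json_encode($payload, JSON_HEX_TAG);

echo $default, "\n";
echo $hexTag, "\n";

// Both forms decode back to the original payload.
var_dump(json_decode($default) === $payload, json_decode($hexTag) === $payload);
```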

muglug commented 4 years ago

$_REQUEST combines $_GET and $_POST

Good point, I've fixed that

TysonAndre commented 4 years ago

Miscellaneous notes I might find useful if working on this in the future:

In src/Psalm/Internal/Analyzer/Statements/Expression/Call/FunctionCallAnalyzer.php, $function_storage->return_source_params and attributes such as $function_storage->added_taints and $function_storage->removed_taints seem to be how these propagate. (this may change)

muglug commented 4 years ago

I improved things a little in e8be2c5, adding support for (l|r)?trim and explode, which led to the discovery of 12 new XSS bugs in Vimeo's code.
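
For illustration only, here is the kind of pattern that becomes visible once trim() is treated as passing taint through: the value is trimmed but never escaped before reaching HTML. Both function names are hypothetical, and this is not taken from Vimeo's code.

```php
<?php
// Hypothetical page fragment: the name is trimmed but never escaped.
function renderGreetingUnsafe(string $rawName): string
{
    $name = trim($rawName); // trim() only strips surrounding whitespace
    return "<p>Hello, $name</p>";
}

// Same fragment with the value escaped for an HTML context.
function renderGreetingEscaped(string $rawName): string
{
    $name = htmlspecialchars(trim($rawName), ENT_QUOTES, 'UTF-8');
    return "<p>Hello, $name</p>";
}

$payload = "  <script>alert(1)</script>  ";
echo renderGreetingUnsafe($payload), "\n";
echo renderGreetingEscaped($payload), "\n";
```

In the first function the script tag survives trim() intact. In the second it is neutralised by htmlspecialchars().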

muglug commented 4 years ago

json_encode is this interesting case where the victim of tainted input is normally going to be a Javascript app that Psalm doesn't know about (at least that's the case at Vimeo).

It might be useful to generate a map of all tainted json_encoded data that could be passed to a JS taint analysis tool.

TysonAndre commented 4 years ago

Some code to generate candidates is below - this excludes functions that are possibly impure depending on their args. Some obvious ones are commented out.

chop() is an alias of rtrim().

I don't know how taint detection currently works with array keys/values or how it is meant to work
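
To make the array key/value question concrete, this small sketch shows user-controlled data moving between keys and values. The input arrays are invented for the example, and it says nothing about how Psalm currently models these flows.

```php
<?php
// Hypothetical request data: the attacker controls the key as well as the value.
$query = ['<script>alert(1)</script>' => 'harmless'];

// array_keys() turns keys into list values.
$keys = array_keys($query);

// array_flip() turns values into keys.
$flipped = array_flip(['label' => '<b>x</b>']);

echo $keys[0], "\n";
echo array_key_first($flipped), "\n";
```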

<?php

use Phan\Plugin\Internal\UseReturnValuePlugin;
use Phan\Language\FQSEN\FullyQualifiedFunctionName;
use Phan\Language\UnionType;
use Phan\Language\Element\FunctionInterface;

require_once dirname(__DIR__) . '/src/Phan/Bootstrap.php';
$code_base = require(dirname(__DIR__) . '/src/codebase.php');
$unsafe_types = UnionType::fromFullyQualifiedRealString('string|array');

$isPotentialTaintPropogator = function (FunctionInterface $function) use ($code_base, $unsafe_types): bool {
    $function_return_type = $function->getUnionType();
    if (!$function_return_type->canCastToUnionType($unsafe_types)) {
        return false;
    }
    foreach ($function->getParameterList() as $param) {
        if ($param->getUnionType()->canCastToUnionType($unsafe_types)) {
            return true;
        }
    }
    return false;
};

foreach (UseReturnValuePlugin::HARDCODED_FQSENS as $fqsen_string => $value) {
    if (strpos($fqsen_string, '::') !== false) {
        continue;
    }
    if ($value !== true) {
        continue;
    }
    $fqsen = FullyQualifiedFunctionName::fromFullyQualifiedString($fqsen_string);
    if (!$code_base->hasFunctionWithFQSEN($fqsen)) {
        continue;
    }
    $function = $code_base->getFunctionByFQSEN($fqsen);
    // echo "looking up $fqsen\n";
    if (!$isPotentialTaintPropogator($function)) {
        continue;
    }
    echo "$function\n";
}

<?php

// Limitations:
// - Excludes uncommon functions like hebrev()
// - Excludes potentially impure functions such as var_export(), highlight_string()

// Returns original string if no translation is found
function _(string $message) : string;
// prefer htmlentities/escapeshellarg()
function addcslashes(string $str, string $charlist) : string;
function addslashes(string $str) : string;
// Taint checking probably won't be able to check if keys are tainted.
function array_change_key_case(array $input, int $case = unknown) : associative-array<mixed,mixed>;
function array_chunk(array $input, int $size, bool $preserve_keys = unknown) : list<array>;
function array_column(array $array, mixed $column_key, mixed $index_key = unknown) : array;
function array_combine(int[]|string[] $keys, array $values) : associative-array<mixed,mixed>|false;
function array_count_values(array $input) : associative-array<mixed,int>;
function array_diff_assoc(array $arr1, array $arr2, array ...$args) : associative-array<mixed,mixed>;
function array_diff_key(array $arr1, array $arr2, array ...$args) : associative-array<mixed,mixed>;
function array_diff(array $arr1, array $arr2, array ...$args) : associative-array<mixed,mixed>;
function array_fill_keys(array $keys, mixed $val) : array;
function array_fill(int $start_key, int $num, mixed $val) : array<int,mixed>;
function array_filter(array $input, callable(mixed):bool|callable(mixed,mixed):bool $callback = unknown, int $flag = unknown) : associative-array<mixed,mixed>;
function array_flip(array $input) : associative-array<mixed,int>|associative-array<mixed,string>;
function array_intersect_assoc(array $arr1, array $arr2, array ...$args) : associative-array<mixed,mixed>;
function array_intersect_key(array $arr1, array $arr2, array ...$args) : associative-array<mixed,mixed>;
function array_intersect(array $arr1, array $arr2, array ...$args) : associative-array<mixed,mixed>;
function array_key_first(array $array) : int|null|string;
function array_key_last(array $array) : int|null|string;
function array_keys(array $input, mixed $search_value = unknown, bool $strict = unknown) : list<int>|list<string>;
function array_map(?callable $callback, array $input1, array ...$args) : array;
function array_merge_recursive(array $arr1, array ...$args) : array;
function array_merge(array $arr1, array ...$args) : array;
function array_pad(array $input, int $pad_size, mixed $pad_value) : array;
function array_rand(array $input, int $num_req) : array<int,int>|array<int,string>|int|string;
function array_reduce(array $input, callable(mixed,mixed):mixed $callback, mixed $initial = unknown) : mixed;
function array_replace_recursive(array $arr1, array $arr2, array ...$args) : array;
function array_replace(array $arr1, array $arr2, array ...$args) : array;
function array_reverse(array $input, bool $preserve = unknown) : array;
function array_search(mixed $needle, array $haystack, bool $strict = unknown) : false|int|string;
function array_slice(array $input, int $offset, ?int $length = null, bool $preserve_keys = unknown) : array;
function array_unique(array $input, int $sort_flags = unknown) : associative-array<mixed,mixed>;
function array_values(array $input) : list<mixed>;
function base64_decode(string $str, bool $strict = unknown) : false|string;
// function base64_encode(string $str) : string;
// function base_convert(string $number, int $frombase, int $tobase) : string;
function basename(string $path, string $suffix = unknown) : string;
// function bin2hex(string $data) : string;
function bzcompress(string $source, int $blocksize100k = unknown, int $workfactor = unknown) : int|string;
function bzdecompress(string $source, int $small = unknown) : int|string;
function chop(string $str, string $character_mask = unknown) : string;
function chunk_split(string $str, int $chunklen = unknown, string $ending = unknown) : string;
// function class_implements(object|string $what, bool $autoload = unknown) : array<string,class-string>|false;
// function class_parents(object|string $instance, bool $autoload = unknown) : array<string,class-string>|false;
function compact(array|string $var_name, array|string ...$var_names) : array;
function convert_cyr_string(string $str, string $from, string $to) : string;
function convert_uudecode(string $data) : string;
function convert_uuencode(string $data) : string;
// function count_chars(string $input, int $mode = unknown) : array<int,int>|false|string;
function current(array|object $array_arg) : false|mixed;
function date(string $format, int $timestamp = unknown) : string;
function dirname(string $path, int $levels = unknown) : string;
function each(array &$arr) : array;
// eval safe?
function escapeshellarg(string $arg) : string;
function explode(string $separator, string $str, int $limit = unknown) : list<string>;
// function fgetcsv(resource $fp, int $length = unknown, string $delimiter = unknown, string $enclosure = unknown, string $escape = unknown) : false|list<?string>;
// function file(string $filename, int $flags = unknown, resource $context = unknown) : false|list<string>;

// filter_input filter types depends on $type/$filter
// function filter_input_array(int $type, array|int $definition = unknown, bool $add_empty = unknown) : false|mixed;
// function filter_input(int $type, string $variable_name, int $filter = unknown, array|int $options = unknown) : false|mixed;
// function filter_var(mixed $variable, int $filter = unknown, mixed $options = unknown) : false|mixed;
// function filter_var_array
// function get_cfg_var(string $option_name) : array[]|false|string|string[];
// function get_class_methods(mixed $class) : list<string>;
// function getenv(string $varname, bool $local_only = unknown) : false|string;
// function getimagesize(string $imagefile, array &$info = unknown) : false|int[]|string[];
// function get_parent_class(mixed $object = unknown) : class-string|false;
function gettext(string $msgid) : string;
// function gettype(mixed $var) : string;
// Can unescape $format with backslashes if user controlled
function gmdate(string $format, int $timestamp = unknown) : false|string;
function gzcompress(string $data, int $level = unknown, int $encoding = unknown) : false|string;
function gzdecode(string $data, int $length = unknown) : false|string;
function gzdeflate(string $data, int $level = unknown, int $encoding = unknown) : false|string;
function gzencode(string $data, int $level = unknown, int $encoding_mode = unknown) : false|string;
function gzinflate(string $data, int $length = unknown) : false|string;
function gzuncompress(string $data, int $length = unknown) : false|string;
// unsafe with $raw_output = true
// function hash_hmac(string $algo, string $data, string $key, bool $raw_output = unknown) : string;
// function hash_pbkdf2(string $algo, string $password, string $salt, int $iterations, int $length = unknown, bool $raw_output = unknown) : string;
// function hash(string $algo, string $data, bool $raw_output = unknown) : string;
function hex2bin(string $data) : false|string;
function htmlentities(string $string, int $quote_style = unknown, string $encoding = unknown, bool $double_encode = unknown) : string;
function html_entity_decode(string $string, int $quote_style = unknown, string $encoding = unknown) : string;
function htmlspecialchars_decode(string $string, int $quote_style = unknown) : string;
function htmlspecialchars(string $string, int $quote_style = unknown, string $encoding = unknown, bool $double_encode = unknown) : string;
function http_build_query(array|object $querydata, string $prefix = unknown, string $arg_separator = unknown, int $enc_type = unknown) : string;
function iconv(string $in_charset, string $out_charset, string $str) : false|string;
function implode(string $glue, array $pieces) : string;
//function inet_ntop(string $in_addr) : false|string;
//function inet_pton(string $ip_address) : false|string;
// function ini_get(string $varname) : false|string;
function join(string $glue, array $pieces) : string;
function json_decode(string $json, bool $assoc = unknown, int $depth = unknown, int $options = unknown) : mixed;
function json_encode(mixed $data, int $options = unknown, int $depth = unknown) : false|string;
function key(array|object $array_arg) : int|null|string;
function lcfirst(string $str) : string;
// function long2ip(int|string $proper_address) : string;
// already done
function ltrim(string $str, string $character_mask = unknown) : string;
// max() also works on strings.
function max(array $arg1) : mixed;
function mb_convert_case(string $sourcestring, int $mode, string $encoding = unknown) : false|string;
function mb_convert_encoding(string $str, string $to_encoding, string|string[] $from_encoding = unknown) : false|string;
function mb_detect_encoding(string $str, mixed $encoding_list = unknown, bool $strict = unknown) : false|string;
function mb_strtolower(string $str, string $encoding = unknown) : false|string;
function mb_substr(string $str, int $start, ?int $length = null, string $encoding = unknown) : false|string;
// Probably unrealistically wrong if $raw_output = true and sent to a sink
// function md5_file(string $filename, bool $raw_output = unknown) : false|string;
// function md5(string $str, bool $raw_output = unknown) : string;

// metaphone filters out non-letters?
// function metaphone(string $text, int $phones = unknown) : false|string;
function min(array $arg1) : mixed;
function ngettext(string $msgid1, string $msgid2, int $n) : string;
function nl2br(string $str, bool $is_xhtml = unknown) : string;
// $key is probably secret from application
// function openssl_encrypt(string $data, string $method, string $key, int $options = unknown, string $iv = unknown, string &$tag = unknown, string $aad = unknown, int $tag_length = unknown) : false|string;
function pack(string $format, mixed ...$args) : string;
// function parse_ini_file(string $filename, bool $process_sections = unknown, int $scanner_mode = unknown) : array|false;
// depends on arguments
// function parse_url(string $url, int $url_component = unknown) : array{scheme?:string,host?:string,port?:int,user?:string,pass?:string,path?:string,query?:string,fragment?:string}|false|int|null|string;
// function pathinfo(string $path, int $options = unknown) : array|string;
// function php_uname(string $mode = unknown) : string;
// function phpversion(string $extension = unknown) : false|string;
function preg_filter(mixed $regex, mixed $replace, mixed $subject, int $limit = unknown, int &$count = unknown) : string|string[];
function preg_grep(string $regex, array $input, int $flags = unknown) : array;
function preg_quote(string $str, string $delim_char = unknown) : string;
function preg_replace_callback(array|string $regex, callable(array):string $callback, array|string $subject, int $limit = unknown, int &$count = unknown) : string|string[];
function preg_replace_callback_array(array<string,callable(array):string> $pattern, array|string $subject, int $limit = unknown, int &$count = unknown) : string|string[];
function preg_replace(array|string $regex, array|string $replace, array|string $subject, int $limit = unknown, int &$count = unknown) : string|string[];
function preg_split(string $pattern, string $subject, ?int $limit = null, int $flags = unknown) : list<string>;
function quoted_printable_decode(string $str) : string;
function quoted_printable_encode(string $str) : string;
function quotemeta(string $str) : string;
// function range(mixed $low, mixed $high, float|int $step = unknown) : array;
function rawurldecode(string $str) : string;
function rawurlencode(string $str) : string;
function readlink(string $filename) : false|string;
function realpath(string $path) : false|string;
// already done
function rtrim(string $str, string $character_mask = unknown) : string;
function serialize(mixed $variable) : string;
// depends on raw_output, but impractical
// function sha1(string $str, bool $raw_output = unknown) : string;
// function soundex(string $str) : string;
function sprintf(string $format, float|int|string ...$vars) : string;
// function stat(string $filename) : array|false;
function strchr(string $haystack, int|string $needle, bool $before_needle = unknown) : false|string;
// function stream_resolve_include_path(string $filename) : false|string;
function strftime(string $format, int $timestamp = unknown) : string;
function stripcslashes(string $str) : string;
function stripslashes(string $str) : string;
function strip_tags(string $str, string|string[] $allowable_tags = unknown) : string;
function str_ireplace(array|string $search, array|string $replace, array|string $subject, int &$replace_count = unknown) : string|string[];
function stristr(string $haystack, int|string $needle, bool $before_needle = unknown) : false|string;
function str_pad(string $input, int $pad_length, string $pad_string = unknown, int $pad_type = unknown) : string;
function strpbrk(string $haystack, string $char_list) : false|string;
function strrchr(string $haystack, int|string $needle) : false|string;
function str_repeat(string $input, int $multiplier) : string;
function str_replace(array|string $search, array|string $replace, array|string $subject, int &$replace_count = unknown) : string|string[];
function strrev(string $str) : string;
function str_rot13(string $str) : string;
function str_split(string $str, int $split_length = unknown) : list<string>;
// depends on $needle (and $haystack if $before_needle)
function strstr(string $haystack, int|string $needle, bool $before_needle = unknown) : false|string;
function strtolower(string $str) : string;
function strtoupper(string $str) : string;
function strtr(string $str, string $from, string $to) : string;
function strval(mixed $var) : string;
function str_word_count(string $string, int $format = unknown, string $charlist = unknown) : array<int,string>|int;
function substr_replace(string|string[] $str, mixed $repl, mixed $start, mixed $length = unknown) : string|string[];
function substr(string $str, int $start, int $length = unknown) : false|string;
// mostly html safe but can contain " and >?
function tempnam(string $dir, string $prefix) : false|string;
function token_get_all(string $source, int $flags = unknown) : list<array{0:int,1:string,2:int}>|list<string>;
function trim(string $str, string $character_mask = unknown) : string;
function ucfirst(string $str) : string;
function ucwords(string $str, string $delims = unknown) : string;
function uniqid(string $prefix = unknown, bool $more_entropy = unknown) : string;
function unpack(string $format, string $data, int $offset = unknown) : array|false;
function urldecode(string $str) : string;
// mostly safe
// function urlencode(string $str) : string;
function utf8_decode(string $data) : string;
function utf8_encode(string $data) : string;
function vsprintf(string $format, array $args) : string;
function wordwrap(string $str, int $width = unknown, string $break = unknown, bool $cut = unknown) : string;
function zlib_decode(string $data, int $max_decoded_len = unknown) : string;
function zlib_encode(string $data, int $encoding, int|string $level = unknown) : string;

TysonAndre commented 4 years ago

And then there's other helpers like UConverter->convert().

I wonder if fuzzing would help build a larger list ahead of time: e.g. in docker, instantiate classes and call methods to find inputs that emit < or " in the return value or result, or that terminate abnormally. That would cover rarer code such as UConverter::convert
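
A very small sketch of that probing idea, limited to plain functions that take one string. The function name passesThroughDangerousChars() and the probe payload are assumptions made for this example. A real fuzzer would need many payloads, multiple arguments, and process isolation.

```php
<?php
/**
 * Hypothetical probe: returns true when $fn lets "<" or a double quote
 * from its input reach a string return value.
 */
function passesThroughDangerousChars(callable $fn): bool
{
    $payload = '<"probe">';
    try {
        $result = $fn($payload);
    } catch (\Throwable $e) {
        return false; // abnormal termination would be logged separately
    }
    if (!is_string($result)) {
        return false;
    }
    // strpbrk() returns false when none of the listed characters occur.
    return strpbrk($result, '<"') !== false;
}

foreach (['strtoupper', 'strrev', 'htmlspecialchars', 'base64_encode', 'md5'] as $name) {
    echo $name, ': ', passesThroughDangerousChars($name) ? 'propagates' : 'filtered', "\n";
}
```

A single payload can only show that a function propagates the characters. A "filtered" result for one input does not prove that the function is safe.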

TysonAndre commented 4 years ago

A second pass at adding functions to src/Psalm/Internal/Stubs/CoreGenericFunctions.phpstub based on the earlier snippet - This helps with join(), strval(), etc, and probably has some incorrect entries

Because of the missing type information, it may cause issues, and I'm not sure how psalm will handle the php 8.0 changes (e.g. dropping support for int $needle in string functions such as strpos).

It could be put into a plugin until those issues are worked out, though
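
As a sanity check on why entries such as join(), strval(), max() and min() belong in the list, this sketch shows each of them returning attacker-supplied text unchanged. The $tainted value is a made-up payload, and the snippet only demonstrates runtime behaviour, not what Psalm reports.

```php
<?php
$tainted = '<img src=x onerror=alert(1)>';

// join() is an alias of implode(): every piece ends up in the result.
$joined = join(', ', ['safe', $tainted]);

// strval() returns string input as-is.
$asString = strval($tainted);

// max()/min() compare non-numeric strings lexicographically and
// return one of the inputs untouched.
$largest = max(['apple', 'zeta' . $tainted]);
$smallest = min(['apple', $tainted]);

echo $joined, "\n", $asString, "\n", $largest, "\n", $smallest, "\n";
```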

/**
 * @psalm-pure
 * @psalm-flow ($message) -> return
 */
function _(string $message) : string {}
// prefer htmlentities/escapeshellarg()
/**
 * @psalm-pure
 * @psalm-flow ($str) -> return
 */
function addcslashes(string $str, string $charlist) : string {}
/**
 * @psalm-pure
 * @psalm-flow ($str) -> return
 */
function addslashes(string $str) : string {}
// Taint checking probably won't be able to check if keys are tainted.
// /** @return associative-array<mixed, mixed> */
// function array_change_key_case(array $input, int $case = 0) : associative-array<mixed,mixed> {}
// function array_chunk(array $input, int $size, bool $preserve_keys = false) : list<array> {}
// function array_column(array $array, $column_key, $index_key = null) : array {}
// function array_combine(int[]|string[] $keys, array $values) : associative-array<mixed,mixed> {}
// function array_count_values(array $input) : associative-array<mixed,int> {}
// function array_diff_assoc(array $arr1, array $arr2, array ...$args) : associative-array<mixed,mixed> {}
// function array_diff_key(array $arr1, array $arr2, array ...$args) : associative-array<mixed,mixed> {}
// function array_diff(array $arr1, array $arr2, array ...$args) : associative-array<mixed,mixed> {}
// function array_fill_keys(array $keys, $val) : array {}
// function array_fill(int $start_key, int $num, $val) : array<int,mixed> {}
// function array_filter(array $input, callable(mixed):bool|callable(mixed,mixed):bool $callback = null, int $flag = 0) : associative-array<mixed,mixed> {}
// function array_flip(array $input) : associative-array<mixed,int>|associative-array<mixed,string> {}
// function array_intersect_assoc(array $arr1, array $arr2, array ...$args) : associative-array<mixed,mixed> {}
// function array_intersect_key(array $arr1, array $arr2, array ...$args) : associative-array<mixed,mixed> {}
// function array_intersect(array $arr1, array $arr2, array ...$args) : associative-array<mixed,mixed> {}
// function array_key_first(array $array) : int|null|string {}
// function array_key_last(array $array) : int|null|string {}
// function array_keys(array $input, $search_value = unknown, bool $strict = false) : list<int>|list<string> {}
// function array_map(?callable $callback, array $input1, array ...$args) : array {}
// function array_merge_recursive(array $arr1, array ...$args) : array {}
// function array_merge(array $arr1, array ...$args) : array {}
// function array_pad(array $input, int $pad_size, $pad_value) : array {}
// function array_rand(array $input, int $num_req) : array<int,int>|array<int,string>|int|string {}
// function array_reduce(array $input, callable(mixed,mixed):mixed $callback, $initial = null) {}
// function array_replace_recursive(array $arr1, array $arr2, array ...$args) : array {}
// function array_replace(array $arr1, array $arr2, array ...$args) : array {}
// function array_reverse(array $input, bool $preserve = false) : array {}
// function array_search($needle, array $haystack, bool $strict = false) : false|int|string {}
// function array_slice(array $input, int $offset, ?int $length = null, bool $preserve_keys = false) : array {}
// function array_unique(array $input, int $sort_flags = 2) : associative-array<mixed,mixed> {}
// function array_values(array $input) : list<mixed> {}
/**
 * @psalm-pure
 *
 * @return string|false
 *
 * @psalm-flow ($str) -> return
 */
function base64_decode(string $str, bool $strict = false) {}
// function base64_encode(string $str) : string {}
// function base_convert(string $number, int $frombase, int $tobase) : string {}

/**
 * @psalm-pure
 * @psalm-flow ($path) -> return
 */
function basename(string $path, string $suffix = '') : string {}
// function bin2hex(string $data) : string {}
/**
 * @return int|string
 * @psalm-pure
 * @psalm-flow ($source) -> return
 */
function bzcompress(string $source, int $blocksize100k = 4, int $workfactor = 0) {}
/**
 * @return int|string
 * @psalm-pure
 * @psalm-flow ($source) -> return
 */
function bzdecompress(string $source, int $small = 0) {}
/**
 * @psalm-pure
 * @psalm-flow ($str) -> return
 */
function chop(string $str, string $character_mask = " \t\n\r\0\x0B") : string {}
/**
 * @psalm-pure
 * @psalm-flow ($str, $ending) -> return
 */
function chunk_split(string $str, int $chunklen = 76, string $ending = "\r\n") : string {}
// function class_implements(object|string $what, bool $autoload = unknown) : array<string,class-string>|false {}
// function class_parents(object|string $instance, bool $autoload = unknown) : array<string,class-string>|false {}
/**
 * @psalm-pure
 * @psalm-flow ($var_name, $var_names) -> return
 */
function compact($var_name, ...$var_names) : array {}
/**
 * @psalm-pure
 * @psalm-flow ($str) -> return
 */
function convert_cyr_string(string $str, string $from, string $to) : string {}
/**
 * @psalm-pure
 * @psalm-flow ($data) -> return
 */
function convert_uudecode(string $data) : string {}
/**
 * @psalm-pure
 * @psalm-flow ($data) -> return
 */
function convert_uuencode(string $data) : string {}
// function count_chars(string $input, int $mode = unknown) : array<int,int>|false|string {}
/**
 * @param object|array $array_arg
 * @psalm-pure
 * @psalm-flow ($array_arg) -> return
 */
function current($array_arg) {}
/**
 * @psalm-pure
 * @psalm-flow ($path) -> return
 */
function dirname(string $path, int $levels = 1) : string {}
/**
 * @psalm-taint-specialize
 * @psalm-flow ($arr) -> return
 */
function each(array &$arr) : array {}
// eval safe?
/**
 * @psalm-pure
 * @psalm-flow ($arg) -> return
 * @psalm-taint-escape shell
 */
function escapeshellarg(string $arg) : string {}
// function fgetcsv(resource $fp, int $length = unknown, string $delimiter = unknown, string $enclosure = unknown, string $escape = unknown) : false|list<?string> {}
// function file(string $filename, int $flags = unknown, resource $context = unknown) : false|list<string> {}

// filter_input filter types depends on $type/$filter
// function filter_input_array(int $type, array|int $definition = unknown, bool $add_empty = unknown) : false|mixed {}
// function filter_input(int $type, string $variable_name, int $filter = unknown, array|int $options = unknown) : false|mixed {}
// function filter_var($variable, int $filter = unknown, $options = unknown) : false|mixed {}
// function filter_var_array
// function get_cfg_var(string $option_name) : array[]|false|string|string[] {}
// function get_class_methods($class) : list<string> {}
// function getenv(string $varname, bool $local_only = unknown) : false|string {}
// function getimagesize(string $imagefile, array &$info = unknown) : false|int[]|string[] {}
// function get_parent_class($object = unknown) : class-string|false {}
function gettext(string $msgid) : string {}
// function gettype($var) : string {}
// Can unescape $format with backslashes if user controlled
function gmdate(string $format, int $timestamp = null) : string {}
/**
 * @psalm-pure
 * @return false|string
 * @psalm-flow ($data) -> return
 */
function gzcompress(string $data, int $level = -1, int $encoding = 15) {}
/**
 * @psalm-pure
 * @return false|string
 * @psalm-flow ($data) -> return
 */
function gzdecode(string $data, int $length = 0) {}
/**
 * @psalm-pure
 * @return false|string
 * @psalm-flow ($data) -> return
 */
function gzdeflate(string $data, int $level = -1, int $encoding = -15) {}
/**
 * @psalm-pure
 * @return false|string
 * @psalm-flow ($data) -> return
 */
function gzencode(string $data, int $level = -1, int $encoding_mode = 31) {}
/**
 * @psalm-pure
 * @return false|string
 * @psalm-flow ($data) -> return
 */
function gzinflate(string $data, int $length = 0) {}
/**
 * @psalm-pure
 * @return false|string
 * @psalm-flow ($data) -> return
 */
function gzuncompress(string $data, int $length = 0) {}
// unsafe with $raw_output = true
// function hash_hmac(string $algo, string $data, string $key, bool $raw_output = unknown) : string {}
// function hash_pbkdf2(string $algo, string $password, string $salt, int $iterations, int $length = unknown, bool $raw_output = unknown) : string {}
// function hash(string $algo, string $data, bool $raw_output = unknown) : string {}
/**
 * @psalm-pure
 * @return false|string
 * @psalm-flow ($data) -> return
 */
function hex2bin(string $data) {}
/**
 * @psalm-pure
 * @param array|object $querydata
 * @psalm-flow ($querydata) -> return
 */
function http_build_query($querydata, string $prefix = '', string $arg_separator = '', int $enc_type = 1) : string {}
/**
 * @psalm-pure
 * @return false|string
 * @psalm-flow ($str) -> return
 */
function iconv(string $in_charset, string $out_charset, string $str) {}
//function inet_ntop(string $in_addr) {}
//function inet_pton(string $ip_address) {}
// function ini_get(string $varname) {}
/**
 * @psalm-pure
 * @psalm-flow ($glue, $pieces) -> return
 */
function join(string $glue, array $pieces) : string {}
/**
 * @psalm-pure
 * @psalm-flow ($json) -> return
 * TODO What taints does this unescape? (\uxxxx can quote)
 */
function json_decode(string $json, bool $assoc = null, int $depth = 512, int $options = 0) {}
/**
 * @psalm-pure
 * @psalm-flow ($data) -> return
 * @psalm-taint-escape html
 * @return false|string
 */
function json_encode($data, int $options = 0, int $depth = 512) {}
/**
 * @psalm-pure
 * @psalm-flow ($str) -> return
 */
function lcfirst(string $str) : string {}
// function long2ip(int|string $proper_address) : string {}
// already done
// max() also works on strings.
/**
 * @psalm-pure
 * @psalm-flow ($arg1) -> return
 */
function max(array $arg1) {}
/**
 * @psalm-pure
 * @psalm-flow ($sourcestring) -> return
 */
function mb_convert_case(string $sourcestring, int $mode, string $encoding = null) {}
/**
 * @psalm-pure
 * @return false|string
 * @psalm-flow ($str) -> return
 */
function mb_convert_encoding(string $str, string $to_encoding, $from_encoding = false) {}
/**
 * @psalm-pure
 * @return false|string
 * @psalm-flow ($str) -> return
 */
function mb_detect_encoding(string $str, $encoding_list = null, bool $strict = false) {}
/**
 * @psalm-pure
 * @return false|string
 * @psalm-flow ($str) -> return
 */
function mb_strtolower(string $str, string $encoding = null) {}
/**
 * @psalm-pure
 * @return false|string
 * @psalm-flow ($str) -> return
 */
function mb_substr(string $str, int $start, ?int $length = null, string $encoding = '') {}
// Probably unrealistically wrong if $raw_output = true and sent to a sink
// function md5_file(string $filename, bool $raw_output = unknown) {}
// function md5(string $str, bool $raw_output = unknown) : string {}

// metaphone filters out non-letters?
// function metaphone(string $text, int $phones = unknown) {}

/**
 * @psalm-pure
 * @return mixed
 * @psalm-flow ($arg1) -> return
 */
function min(array $arg1) {}
/**
 * @psalm-pure
 * @return string
 * @psalm-flow ($msgid1, $msgid2) -> return
 */
function ngettext(string $msgid1, string $msgid2, int $n) : string {}
/**
 * @psalm-pure
 * @psalm-flow ($str) -> return
 */
function nl2br(string $str, bool $is_xhtml = false) : string {}
// $key is probably secret from application
// function openssl_encrypt(string $data, string $method, string $key, int $options = unknown, string $iv = unknown, string &$tag = unknown, string $aad = unknown, int $tag_length = unknown) {}
// function pack(string $format, mixed ...$args) : string {}
// function parse_ini_file(string $filename, bool $process_sections = unknown, int $scanner_mode = unknown) {}
// depends on arguments
// function parse_url(string $url, int $url_component = unknown) : array{scheme?:string,host?:string,port?:int,user?:string,pass?:string,path?:string,query?:string,fragment?:string}|false|int|null|string {}
// function pathinfo(string $path, int $options = unknown) {}
// function php_uname(string $mode = unknown) : string {}
// function phpversion(string $extension = unknown) {}
/**
 * @psalm-pure
 * @psalm-flow ($subject) -> return
 */
function preg_filter($regex, $replace, $subject, int $limit = -1, int &$count = null) {}
/**
 * @psalm-pure
 * @psalm-flow ($subject) -> return
 */
function preg_replace_callback_array(array $pattern, $subject, int $limit = -1, int &$count = null) {}
/**
 * @psalm-pure
 * @psalm-flow ($subject) -> return
 */
function preg_split(string $pattern, string $subject, ?int $limit = -1, int $flags = 0) {}
/**
 * @psalm-pure
 * @psalm-flow ($str) -> return
 */
function quoted_printable_decode(string $str) {}
/**
 * @psalm-pure
 * @psalm-flow ($str) -> return
 */
function quoted_printable_encode(string $str) {}
/**
 * @psalm-pure
 * @psalm-flow ($str) -> return
 */
function quotemeta(string $str) {}
// function range($low, $high, float|int $step = unknown) {}
/**
 * @psalm-pure
 * @psalm-flow ($str) -> return
 * @psalm-taint-unescape html
 */
function rawurldecode(string $str) {}
/**
 * @psalm-pure
 * @psalm-flow ($str) -> return
 * @psalm-taint-escape html
 */
function rawurlencode(string $str) {}
// not pure
// function readlink(string $filename) {}
// already done
/**
 * @psalm-pure depending on definition
 * @psalm-flow ($variable) -> return
 */
function serialize($variable) {}
// depends on raw_output, but impractical
// function sha1(string $str, bool $raw_output = unknown) {}
// function soundex(string $str) {}
// function stat(string $filename) {}
/**
 * @psalm-pure
 * @psalm-flow ($haystack) -> return
 * The result is always a slice of $haystack, with or without before_needle
 */
function strchr(string $haystack, $needle, bool $before_needle = false) {}
// function stream_resolve_include_path(string $filename) {}
/**
 * @psalm-pure
 * @psalm-flow ($format) -> return
 * Backslashes can be used for special characters
 */
function strftime(string $format, int $timestamp = null) {}
/**
 * @psalm-pure
 * @psalm-flow ($str) -> return
 */
function stripcslashes(string $str) {}
/**
 * @psalm-pure
 * @psalm-flow ($str) -> return
 */
function stripslashes(string $str) {}
/**
 * @psalm-pure
 * @psalm-flow ($replace, $subject) -> return
 */
function str_ireplace($search, $replace, $subject, int &$replace_count = 0) {}
/**
 * @psalm-pure
 * @psalm-flow ($haystack) -> return
 */
function stristr(string $haystack, $needle, bool $before_needle = false) {}
/**
 * @psalm-pure
 * @psalm-flow ($input, $pad_string) -> return
 */
function str_pad(string $input, int $pad_length, string $pad_string = ' ', int $pad_type = STR_PAD_RIGHT) {}
/**
 * @psalm-pure
 * @psalm-flow ($haystack) -> return
 */
function strpbrk(string $haystack, string $char_list) {}
/**
 * @psalm-pure
 * @psalm-flow ($haystack, $needle) -> return
 */
function strrchr(string $haystack, $needle) {}
/**
 * @psalm-pure
 * @psalm-flow ($input) -> return
 */
function str_repeat(string $input, int $multiplier) {}
/**
 * @psalm-pure
 * @psalm-flow ($str) -> return
 */
function strrev(string $str) {}
/**
 * @psalm-pure
 * @psalm-flow ($str) -> return
 */
function str_rot13(string $str) {}
/**
 * @psalm-pure
 * @psalm-flow ($str) -> return
 */
function str_split(string $str, int $split_length = 1) {}
// returns a slice of $haystack, whichever value $before_needle has
/**
 * @psalm-pure
 * @psalm-flow ($haystack) -> return
 */
function strstr(string $haystack, string $needle, bool $before_needle = false) {}
/**
 * @psalm-pure
 * @psalm-flow ($str) -> return
 */
function strtr(string $str, string $from, string $to) {}
/**
 * @psalm-pure
 * @psalm-flow ($var) -> return
 */
function strval($var) {}
/**
 * @psalm-pure
 * @psalm-flow ($string) -> return
 */
function str_word_count(string $string, int $format = 0, string $charlist = '') {}
/**
 * @psalm-pure
 * @psalm-flow ($str, $repl) -> return
 */
function substr_replace($str, $repl, $start, $length = 0) {}
// mostly html safe but can contain " and >?
/**
 * @psalm-pure
 * @psalm-flow ($dir, $prefix) -> return
 */
function tempnam(string $dir, string $prefix) {}
/**
 * @psalm-pure
 * @psalm-flow ($source) -> return
 */
function token_get_all(string $source, int $flags = 0) {}
/**
 * @psalm-pure
 * @psalm-flow ($str) -> return
 */
function ucfirst(string $str) {}
/**
 * @psalm-pure
 * @psalm-flow ($str) -> return
 */
function ucwords(string $str, string $delims = " \t\r\n\f\v") {}
/**
 * @psalm-pure
 * @psalm-flow ($prefix) -> return
 */
function uniqid(string $prefix = '', bool $more_entropy = false) {}
/**
 * @psalm-pure
 * TODO
 */
function unpack(string $format, string $data, int $offset = 0) {}
/**
 * TODO: This also may add taints other than html?
 * @psalm-pure
 * @psalm-flow ($str) -> return
 * @psalm-taint-unescape html
 */
function urldecode(string $str) {}
// mostly safe
// function urlencode(string $str) {}
/**
 * @psalm-pure
 * @psalm-flow ($data) -> return
 */
function utf8_decode(string $data) {}
/**
 * @psalm-pure
 * @psalm-flow ($data) -> return
 */
function utf8_encode(string $data) {}
/**
 * @psalm-pure
 * @psalm-flow ($format, $args) -> return
 */
function vsprintf(string $format, array $args) {}
/**
 * @psalm-pure
 * @psalm-flow ($str) -> return
 */
function wordwrap(string $str, int $width = 75, string $break = "\n", bool $cut = false) {}
/**
 * @psalm-pure
 * @psalm-flow ($data) -> return
 */
function zlib_decode(string $data, int $max_decoded_len = 0) {}
/**
 * @psalm-pure
 * @psalm-flow ($data) -> return
 */
function zlib_encode(string $data, int $encoding, $level = -1) {}

orklah commented 2 years ago

I believe all functions in the callmap are assumed pure unless they're listed in an 'impure list' array somewhere. Is there still a point to this issue that I missed?

TysonAndre commented 2 years ago

My original request was to add @psalm-flow annotations to the internal stubs, to indicate how taint flows from the inputs to the outputs of those functions. Those annotations weren't there at the time.

https://github.com/danog/psalm/commit/4de2bf8f7fcb1cade5de1dd27cdc5073a761e56f and other associated commits did that for the most commonly used ones.

However, some less common calls from the list in my comment, such as echo utf8_decode($_GET['foo']);, still aren't detected and don't have stubs like the others in stubs/CoreGenericFunctions.phpstub. (Using https://www.php.net/utf8_decode this way is obviously not an example of good code, but it is an example of tainted code.)

Converts a string with ISO-8859-1 characters encoded with UTF-8 to single-byte ISO-8859-1

https://psalm.dev/r/9b5d105c35 emits "No issues" but I'd expect a taint warning
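
A minimal sketch of why a flow through utf8_decode() matters, assuming a hard-coded string as a stand-in for $_GET['foo'] (the variable names and payload are illustrative, not from Psalm). Note that utf8_decode() is deprecated as of PHP 8.2, so newer runtimes will also print a deprecation notice.

```php
<?php
// Hypothetical attacker-controlled value standing in for $_GET['foo'].
$tainted = '<script>alert(1)</script>';

// utf8_decode() only rewrites multi-byte UTF-8 sequences; ASCII bytes,
// including HTML metacharacters, are copied through unchanged.
$decoded = utf8_decode($tainted);

// The payload survives the call, so echoing $decoded is as unsafe as
// echoing $tainted directly.
var_dump($decoded === $tainted);
```

This is the reason the stub should carry taint from $data to the return value rather than treat the result as clean.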

psalm-github-bot[bot] commented 2 years ago

I found these snippets:

https://psalm.dev/r/9b5d105c35 ```php