staabm / phpstan-dba

PHPStan based SQL static analysis and type inference for the database access layer
https://staabm.github.io/archive.html#phpstan-dba
MIT License
250 stars 17 forks source link

Fix sql ast type inference on raw-sql expression without aliases #582

Closed jakubvojacek closed 1 year ago

jakubvojacek commented 1 year ago

I tried to revive the old branch in order to get back to the solution that worked (as per https://github.com/staabm/phpstan-dba/pull/580#issuecomment-1475130801), lets see whether the tests go through

hemberger commented 1 year ago

Hey @jakubvojacek, thanks for working on this! I'm looking forward to this fix, as it's a big factor holding me back from enabling SQL AST.

One thing I noticed is that a lot of the existing tests assert the wrong types due to this bug, so we don't actually want them to pass without being modified (to the best of my knowledge). I've documented some of the examples in https://github.com/staabm/phpstan-dba/issues/578.

My apologies if I've commented before you were done with the PR!

jakubvojacek commented 1 year ago

Hello @hemberger, I believe this PR is ready.

Which of the tests that are now passing shouldn't be?

I checked with the sample

$stmt = $pdo->query('SELECT avg(freigabe1u1) from ada');

and the expected output in this PR would be PDOStatement<array{avg(freigabe1u1): int<-128, 127>|null, 0: int<-128, 127>|null}> which is correct, right?

Thanks

hemberger commented 1 year ago

and the expected output in this PR would be PDOStatement<array{avg(freigabe1u1): int<-128, 127>|null, 0: int<-128, 127>|null}> which is correct, right?

Since avg is not restricted to integer output, the expected type here, to the best of my knowledge, should be:

PDOStatement<array{avg(freigabe1u1): numeric-string|null, 0: numeric-string|null}>

Looking at the max test in its current form:

stmt = $pdo->query('SELECT max(freigabe1u1) as max from ada');
assertType('PDOStatement<array{max: int<-128, 127>|null, 0: int<-128, 127>|null}>', $stmt);

The expected type should be:

PDOStatement<array{max: int<-32768, 32767>|null, 0: int<-32768, 32767>|null}>

because the freigabe1u1 column is a SMALLINT, not a TINYINT.

Which of the tests that are now passing shouldn't be?

I think many of the aggregate functions are still wrong (e.g. avg, sum, min, max). Maybe that means there is a bug somewhere outside this PR, but something somewhere seems not quite right.

I wonder if it would be helpful to test the queries with and without the aliases, as well?

Thanks again for your work here. Can't wait to use it! :)

jakubvojacek commented 1 year ago

Maybe that means there is a bug somewhere outside this PR, but something somewhere seems not quite right.

that is most likely it.

This PR only fixed that a column with an alias has same output as without it, eg

-assertType('PDOStatement<array{myemail: int<0, max>|null, 0: int<0, max>|null, count(email): int, 1: int<0, max>|null}>', $stmt);
+assertType('PDOStatement<array{myemail: int<0, max>|null, 0: int<0, max>|null, count(email): int<0, max>|null, 1: int<0, max>|null}>', $stmt);

After this PR, count(email) is same that as key 1

But you're right it seems, freigabe1u1 is a small signed int and the range is incorrect there, right @staabm ?

But that should be addressed by another PR I guess?

jakubvojacek commented 1 year ago

@staabm @hemberger I narrowed the problem down to MysqlTypeMapper (https://github.com/staabm/phpstan-dba/blob/main/src/TypeMapping/MysqlTypeMapper.php#L96)

It resolves the range based on display width (length) which isnt correct apparently. It should be done based on native_type? Below is a sample of the ada table. Columns adaid and freigabe1u1 have same antive_type but different display width. We should disregard the display width completely.

^ array:4 [
  0 => array:7 [
    "native_type" => "SHORT"
    "pdo_type" => 1
    "flags" => array:3 [
      0 => "not_null"
      1 => "primary_key"
      2 => "NUM"
    ]
    "table" => "ada"
    "name" => "adaid"
    "len" => 6
    "precision" => 0
  ]
  1 => array:7 [
    "native_type" => "TINY"
    "pdo_type" => 1
    "flags" => array:2 [
      0 => "not_null"
      1 => "NUM"
    ]
    "table" => "ada"
    "name" => "gesperrt"
    "len" => 1
    "precision" => 0
  ]
  2 => array:7 [
    "native_type" => "VAR_STRING"
    "pdo_type" => 2
    "flags" => array:1 [
      0 => "not_null"
    ]
    "table" => "ada"
    "name" => "email"
    "len" => 400
    "precision" => 0
  ]
  3 => array:7 [
    "native_type" => "SHORT"
    "pdo_type" => 1
    "flags" => array:2 [
      0 => "not_null"
      1 => "NUM"
    ]
    "table" => "ada"
    "name" => "freigabe1u1"
    "len" => 1
    "precision" => 0
  ]
]

https://dev.mysql.com/doc/refman/8.0/en/numeric-type-attributes.html

The display width does not constrain the range of values that can be stored in the column.

staabm commented 1 year ago

It resolves the range based on display width (length) which isnt correct apparently.

Please open a new issue with the posted findings

We should only fix 1 problem with this PR

jakubvojacek commented 1 year ago

We should only fix 1 problem with this PR

yes of course, I can try & send a new PR for the mapper

As for this PR, is it ready or you still have some doubts?

staabm commented 1 year ago

thank you