phpstan / phpstan-symfony

Symfony extension for PHPStan
MIT License
709 stars 89 forks source link

Result cache is not invalidated after DI container dump #255

Open Wirone opened 2 years ago

Wirone commented 2 years ago

Imagine scenario:

Expected result: refreshing DI container invalidates result cache for files that uses DI (have errors reported by Symfony extension).

Wirone commented 1 month ago

Yesterday we had production incident because of this issue. Here's what happened:

This is not how I would expect PHPStan to work and take care of quality of our app. Please @ondrejmirtes consider looking at this, as this problem affects one of your sponsors (GetResponse) 🙂🙏.

ondrejmirtes commented 1 month ago

Would this problem be solved by adding the hash of the containerXmlPath file to the result cache so that it gets invalidated when it changes?

Wirone commented 1 month ago

I am not sure, but this is something I was thinking about too when I considered workaround on our side - to store some hash from DI-related files (did not consider containerXmlPath though, but it looks much better as it's the final result of the cache compilation). However, I am not sure whether XML file is always generated in the same way for the same set of services (moving entries representing the same content), maybe it needs some normalisations (sort by service name?). FYI in our case XML is almost 4MB.

ondrejmirtes commented 1 month ago

I don't know enough about this problem space to solve it myself.

ruudk commented 1 month ago

However, I am not sure whether XML file is always generated in the same way for the same set of services (moving entries representing the same content), maybe it needs some normalisations (sort by service name?). FYI in our case XML is almost 4MB.

The XML only regenerates when the container is compiled. So it should not change when the container does not change.

ruudk commented 1 month ago

Would this problem be solved by adding the hash of the containerXmlPath file to the result cache so that it gets invalidated when it changes?

Is there an easy way to attach the hash of that file to the PHPStan cache?

Wirone commented 1 month ago

The XML only regenerates when the container is compiled. So it should not change when the container does not change.

I know, I am just wondering if it's always in the same order or compiling may lead to

<?xml version="1.0" encoding="utf-8"?>
<container xmlns="http://symfony.com/schema/dic/services" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://symfony.com/schema/dic/services https://symfony.com/schema/dic/services/services-1.0.xsd">
  <parameters><!-- ... --></parameters>
  <services>
    <service id="Foo" class="App\Foo" public="true"/>
    <service id="Bar" class="App\Bar" public="true"/>
  </services>
</container>

vs

<?xml version="1.0" encoding="utf-8"?>
<container xmlns="http://symfony.com/schema/dic/services" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://symfony.com/schema/dic/services https://symfony.com/schema/dic/services/services-1.0.xsd">
  <parameters><!-- ... --></parameters>
  <services>
    <service id="Bar" class="App\Bar" public="true"/>
    <service id="Foo" class="App\Foo" public="true"/>
  </services>
</container>

which may result in different hash, but technically represents exactly the same DI container. @nicolas-grekas sorry for pinging, but you may probably know something and shed some light 🙂.

Anyway, ideal scenario would be to invalidate PHPStan's result cache if DI-related files were changed and even if cache:clear wasn't executed, but I believe it's pretty hard as container's definition is merged from many different files (YAML/XML/PHP configs, PHP attributes in actual code etc). So the MVP would be an invalidation based on dumped XML container, and then it's easy to enforce rebuilding cache before running PHPStan (we have phpstan:check Composer script, it can be transparently converted into a group of commands).

ondrejmirtes commented 1 month ago

@ruudk We'd need a new interface and container tag, something like ResultCacheMetaExtension. It'd have two methods getKey() and getHash().

We'd call these extensions from https://github.com/phpstan/phpstan-src/blob/af6a9c5f1e34bb11e2b5c9f85ba46ca318bca426/src/Analyser/ResultCache/ResultCacheManager.php#L877-L918.

It's doable on 1.12.x

ruudk commented 1 month ago

I know, I am just wondering if it's always in the same order or compiling may lead to

Does it matter? If the order in the XML was changed, the container was compiled again. In that case, PHPStan should bust the cache.

But I would expect Symfony to normalize the services when building the container, so that it's predictable.

ComiR commented 1 month ago

Would it be possible to only invalidate the cache if relevant parts of the file have changed? I.e. invalidate the cache only if the services section changed. Changes in other sections (e.g. parameters) wouldn't affect the result (at least as of now, rules added in the future might change this).

If it would complicate this feature too much, keep it simple and only use the file hash. Better to over-invalidate here imho (shouldn't happen that often anyway).

ruudk commented 1 month ago

Would it be possible to only invalidate the cache if relevant parts of the file have changed?

No. And if, this should be a task for Symfony DI component. It currently is very quickly to say "let's recompile" when any of the resources changed (file m time).

Wirone commented 1 month ago

Does it matter? If the order in the XML was changed, the container was compiled again. In that case, PHPStan should bust the cache.

@ruudk It does matter if your analysis takes half an hour for each MR when result cache from main branch isn't reused because of false negative hash check 😅. We have large runners for main branch which saves result cache in Gitlab's cache, for merge requests there is a PHPStan job on regular runner (less CPUs) that only reads cache, so it reuses PHPStan's result cache when possible, but does not write new one. If DI container's content would invalidate the result cache too often (when there are not any actual changes), it would waste much of our CI resources.

But I would expect Symfony to normalize the services when building the container, so that it's predictable.

Yes, that would be the best.

Would it be possible to only invalidate the cache if relevant parts of the file have changed? I.e. invalidate the cache only if the services section changed. Changes in other sections (e.g. parameters) wouldn't affect the result (at least as of now, rules added in the future might change this).

@ComiR I believe it's not a good idea, because fetching undefined param in your code also can lead to an error. IMHO invalidation should happen if there's any actual difference in container's definition, which means:

ruudk commented 1 month ago

It does matter, ... , it would waste much of our CI resources

Yes, true. I wouldn't be happy with that either on CI.

I looked at the container XML and it seems that this thing is already immutable. The XmlDumper does not do any sorting. It dumps the definitions as how they were added to the container builder. And it must be that Symfony always uses the same order of loading files / config / autowire, otherwise we would have seen many other bugs already. So that means that the output of the XML is always as it should be. Dynamic things like build times / hashes are not included in the XML, which is good.

I compared the XML before and after doing a cache:clear, and it's identical.

ComiR commented 1 month ago

@ComiR I believe it's not a good idea, because fetching undefined param in your code also can lead to an error. IMHO invalidation should happen if there's any actual difference in container's definition, which means:

  • added/removed params/services (rename is technically a deletion+addition)
  • changed services' arguments

@Wirone The rules only check for private or missing services. But I forgot about the extensions that e.g. provide the return types for ContainerInterface::getParameter(), those are using the parameters section of course.

So I don't know if there would be any benefit in checking the file for relevant changes or if it would suffice to simply use the file hash. You'd first have to find out which changes in the file would be irrelevant (e.g. the order of services, maybe the actual values of parameters, ...?) and then ignoring those.

I guess it would be best to split this. For now invalidate caches when the file hash changes to fix the caches never being invalidated (seems like a bug to me) and later add a feature that will improve caching by checking for relevant changes only.

Wirone commented 1 month ago

I compared the XML before and after doing a cache:clear, and it's identical.

I did quick experiment too and generated 5 XML files in our app:

1st, 2nd and 3rd XMLs were exactly the same. 4th had one additional service in random place, 5th had exactly the same <service> entry, but in different place. Which means calculating hash from raw XML file may lead to false positives when it comes to result cache invalidation, because the same container can be represented in different ways in XML. It may happen if you refactor your codebase and extract service definitions to dedicated file or move things around (e.g. in modular monolith with configurations spread across modules). For those cases PHPStan need to be smarter unless Symfony ensures immutability on its side.

I guess it would be best to split this. For now invalidate caches when the file hash changes to fix the caches never being invalidated (seems like a bug to me) and later add a feature that will improve caching by checking for relevant changes only.

If I had to choose, I would of course prefer excessive invalidation instead of not invalidating at all. It's better to wait longer for the pipeline result but catch the problems early. But IMHO it can't be final implementation, as in our case it really matters when invalidation happens.

ruudk commented 1 month ago
  • added new service into file A
  • added exactly the same into file B (and removed from file A)

So when this happens, the container gets compiled again and the XML is different. I expect that. What is the problem with this?

Wirone commented 1 month ago

So when this happens, the container gets compiled again and the XML is different. I expect that. What is the problem with this?

Technically DI definition is exactly the same - set of services is equal (see here). I believe result cache should NOT be invalidated in that case, as all the ContainerInterface::get() calls would work exactly the same. It should be invalidated only when some services were added or removed, so the existing calls may be invalid.

ruudk commented 1 month ago

Why make this so complicated for no reason for such an edge case. You even say on X that one should not directly call services on the DI. A pragmatic solution is to bust the cache when the XML file changes. Now we are talking about normalizing the XML file for edge-edge cases.

Wirone commented 1 month ago

I tweeted that you shouldn't fetch services from DI container because that's the truth, at least in theory. In practice we have 25-years old, 2M SLOC app without dependency injection for controllers (legacy stuff) and gazillions of ContainerInterface::get() usages directly in the code. For our case, it's not an edge case, it's reality, and that's why I am advocating for a smarter way of invalidating the result cache, because otherwise most probably we would invalidate it too often.

But as I said, MVP would be invalidating on any change, then we can observe how it goes.