php-pm / php-pm-httpkernel

HttpKernel adapter for use of Symfony and Laravel frameworks with PHP-PM
MIT License
246 stars 72 forks source link

Deleting unused variables after Response return #154

Closed mmoreram closed 5 years ago

mmoreram commented 5 years ago

Rationale

HttpKernel handle method is injected as an anonymous function, and because of that, internal memory positions used inside this function is only removed when the GC is applied. In order to avoid memory incremental, we can set to null these variables, and this used memory space will automaticaly be released.

How to check the impact

andig commented 5 years ago

Is there any proof that this works?

Imho releasing a variable does not return any memory unless gc actually runs. When the anonymous function returns, it will automatically release all local variables. That is- unless there are any other references in the system to those.

So imho this is not a cure but treating the symptoms.

If this PR really changes the behaviour we should analyse the released variable one by one and check which one triggers the growing memory consumption and then figure out where that is being referenced outside of the closure.

mmoreram commented 5 years ago

Setting a variable to NULL it releases the memory at the moment. Difference between =null and unset() as nicely @acasademont pointed to me https://stackoverflow.com/a/50931077

mmoreram commented 5 years ago

@andig it has nothing to be related with referencies, but a behavior that PHP has with anonymous functions. You can proof that it works by just following the instructions I pointed to you in the PR description.

andig commented 5 years ago

Is there any proof that this works?

Could you provide before/after test case and measurements? Is this caused by any specific variable?

it has nothing to be related with referencies, but a behavior that PHP has with anonymous functions.

I would appreciate a link to the docs if possible.

mmoreram commented 5 years ago

@andig https://stackoverflow.com/a/50931077

I will post the measurements later

andig commented 5 years ago

ping @mmoreram did you get any proof that this changes the behaviour? Pleas note that the SO link is ages old and php has matured a lot since that time.

mmoreram commented 5 years ago

@andig You can check a comparision between having this improvements inside the kernel, and without it.

https://gist.github.com/mmoreram/d10b40e7745e80617dfc6753ef117016

The test is simple. An active active and a tests battery working against it. In each request / response iteration, a simple syslog print of the current used memory.

As you can see, without improvements, the memory grows. With improvements, the memory stays much more stable over the time :)

marcj commented 5 years ago

We should add a comment above those lines explaining that circumstances, so our future self will know what the f was/is wrong with PHP.

andig commented 5 years ago

I realize I'm a sceptic here as this is beyond what I've experienced with php. Are these tests comparable, i.e. why do they already start at different base memory consumption after the first request?

it has nothing to be related with referencies, but a behavior that PHP has with anonymous functions

What behavior is that? If nothing holds on to a reference it should be garbage-collected.

update are you potentially talking about https://bugs.php.net/bug.php?id=69639? Then gc_collect_cycles() should also help, though I don't see where we have a closure here that falls into that category. It still feels a little like voodoo for me.

mmoreram commented 5 years ago

@andig OK.

IMHO, forcing gc_collect_cycles() in each request iteration is not what I would say good practices, but is OK for me. You can ask me what behavior is a specific PHP one, and I would say to you that I don't really know. My expertise in PHP internals is not that good, but what I proposed to you guys is a demostration that by adding some small lines, the memory stays stable over the request iterations.

Now is up to you to accept it or decline, but I'm not going to enter a discussion like this.

Good luck.

update BTW. Both tests are exactly the same. Same tests battery, same code, just changing the lines I exposed.

andig commented 5 years ago

IMHO, forcing gc_collect_cycles() in each request iteration is not what I would say good practices, but is OK for me.

It isn't, full ack!

...but I'm not going to enter a discussion like this.

Not sure what you mean? You've specifically mentioned

it has nothing to be related with referencies, but a behavior that PHP has with anonymous functions.

and I'd appreciate any reference you have for this claim.

Update for reference: there is also a discussion in https://github.com/reactphp/http/issues/289

dnna commented 5 years ago

I know this PR isn't a cure and probably just treats the symptoms of a deeper issue (maybe in reactphp? maybe in PHP itself?), but if it does work and reduces long-term memory consumption shouldn't we merge it anyway? The changes don't seem very risky and we can always clean it up later if the root cause is discovered and it's no longer needed.

mmoreram commented 5 years ago

@andig when I say that I don't want to enter a discussion like this, I mean opinions discussions. I exposed some examples and demonstrations that these changes improve the behavior of the package. I really LOVE this project, and improving it is one of my project Apisearch goals.

Said that, in Apisearch we have implemented our own Bridge and Bootstrap. This is why I want the maximum performance and avoiding if/else blocks is a must for me. In my Bridge I added these "=null" lines, and my containers are now pretty stable in AWS, instead of being killed each 16 minutes.

@dnna I think that there's not a root cause. The cause itself is how PHP and GC works. Deleting variables inside anonymous functions in PHP have always been a MUST if these variables can have a big size (like a Request, or a Response objects). Deleting them is a good practice (has been since there isn't an Apache behing the project killing the PHP thread after each response return).

BTW, @andig, I can set my own code in the Bridge and this would improve the memory allocated in each of the slaves, but if I want to change the behavior in the server itself, I should fork the project in order to add my gc_collect_cycles, and this is something I want to completely avoid.

Thanks for the responses :)

andig commented 5 years ago

This is not about opinions. My question is about a claim you've made:

it has nothing to be related with referencies, but a behavior that PHP has with anonymous functions.

I'd appreciate any reference you have for this claim.

I'd simply appreciate an answer in order to better understand the root cause.

If there isn't one then fine, too, and let merge as-is.

mmoreram commented 5 years ago

@andig ups, I was not talking about this.

For example - https://bugs.php.net/bug.php?id=75914

Everyone tell that calling gc_collect_cycles fixes the problem, but for me, this is not a solution.

marcj commented 5 years ago

So, @mmoreram does alternatively calling gc_collect_cycle() fixes your issue as well? I'd think it should. Can you please test it in your use-case? I'd prefer that way over setting variables to null as it's very clear what that function call does. Also, I'd like to mention that we really can't talk about 'best practice' in any way here, since there's not much experiences in running PHP in such a way, so I don't see any argument that supports that claim that it isn't best practice. I mean, it's not best practice in a shared-nothing architecture, right, but that doesn't mean it's automatically the same in a completely different architecture/way of running an PHP application. Back in the days when I was coding a lot of PHP with long running processes like migration scripts, gc_collect_cycle() was actually a must to avoid tragic memory leaks.

mmoreram commented 5 years ago

@marcj never mind. My project is running as I was expecting now with my own Bridge, but seems that in PHP gc_collect_cycle() is the final solution for everything. Is OK for me :)

andig commented 5 years ago

@marcj must say that I prefer nulling over gc_collect_cycle as the latter has unpredictable runtime behaviour time-wise (assumption). I'd still merge this one but would prefer if we found where we are subject to https://bugs.php.net/bug.php?id=75914.

marcj commented 5 years ago

@andig, you're probably right :+1:

dnna commented 5 years ago

@andig @marcj I did some investigation after reading a bit more about this bug, and the leak seems to be in the block: $mapFiles = function (&$files) use (&$mapFiles) { ... }

Commenting out this part causes the memory to be stable in the same way like in @mmoreram 's example.

I'll investigate further to see what exactly is causing this leak and make a PR for it.

marcj commented 5 years ago

@dnna, very good find! 🤩

andig commented 5 years ago

GREAT! This looks exactly like the referenced bug. Closure is created like

$mapFiles = function (&$files) use (&$mapFiles) {...}

which will have access to $this and make it linger around. Would it help to create as static function instead since its not using $this?

dnna commented 5 years ago

I tried the static part but it still leaked. At the end I just declared it as a normal private function and that seems to solve the issue.

andig commented 5 years ago

Actually, its not because it isn't a classvar. Mhhhm...

andig commented 5 years ago

I tried the static part but it still leaked. At the end I just declared it as a normal private function and that seems to solve the issue.

Funny :O

/cc @kelunik maybe you could shed some light on this?

dnna commented 5 years ago

Could it be because it uses $mapFiles internally, so it created a new circular reference on each request as the function is redeclared every time?

kelunik commented 5 years ago

$mapFiles has a circular reference to itself here. Why isn't that a private method?

andig commented 5 years ago

Big thank you- finally an explanation. My world has just shifted back into order where laws of physics apply. 🌎

It is a private method thanks to @dnna now. No idea why the ever was different...