
Memory Leak on big Seed operations #22797

Closed mycarrysun closed 6 years ago

mycarrysun commented 6 years ago

Description:

When importing a large data set via a seeder, memory usage always increases and never drops, even when using chunking, disabling query logging, and manually unsetting variables. I also have everything running through functions/closures so the memory/vars get cleared.

Steps To Reproduce:

Create a seeder that runs several child seeders and creates a progress bar with format "debug". When one seeder finishes and the next one starts, you would expect the memory used by the previous seeder to be released, but the progress bar shows that memory usage does not drop after each seeder. I've raised my memory limit to 512MB, but I don't want that to be my solution here.
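Roughly the shape of what I'm running, for reference (a minimal sketch; the connection, table, and column names here are made up):

```php
<?php

use Illuminate\Database\Seeder;
use Illuminate\Support\Facades\DB;

class LegacyUsersSeeder extends Seeder
{
    public function run()
    {
        // Query logging is disabled so Laravel doesn't buffer every statement.
        DB::connection()->disableQueryLog();

        // Read the old data in chunks instead of loading it all at once.
        DB::connection('legacy')->table('users')->orderBy('id')
            ->chunk(1000, function ($rows) {
                $inserts = [];
                foreach ($rows as $row) {
                    $inserts[] = ['name' => $row->name, 'email' => $row->email];
                }
                DB::table('users')->insert($inserts);
                unset($inserts); // manually unset, as described above
            });
    }
}
```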

sisve commented 6 years ago

Can you build a working case to reproduce this that you can share? You can probably use PHP generators and yield large amounts of data repeatedly instead of bundling large data sets.

mycarrysun commented 6 years ago

Yes, I will work on reproducing this. It's not a use case for Faker generators, because I am using seeders to migrate data from an old version of our system, which has very large data sets. The most efficient way I've found to do that is to store the old data in another database as a copy and write seeders that copy everything across correctly, since it is not an exact copy/paste: there are DB schema changes that need to be handled.

I realize this is probably not what seeders were meant for but there still should not be a memory leak if everything is encapsulated properly in functions where PHP should release the memory.

sisve commented 6 years ago

I'm not talking about Faker generators, but the language feature. A simple example: https://3v4l.org/udf4S. The idea is that if a large amount of data alone is enough, then a generator that produces a million large entries should be able to reproduce the problem.
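Something in that spirit (a sketch of the idea, not the exact linked snippet):

```php
<?php

// A generator yields rows lazily, one at a time, so even a million
// large entries never sit in memory simultaneously.
function rows(int $count): Generator
{
    for ($i = 0; $i < $count; $i++) {
        yield ['id' => $i, 'payload' => str_repeat('x', 1024)];
    }
}

foreach (rows(1000000) as $row) {
    // process $row; memory usage should stay flat here
}
```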

mycarrysun commented 6 years ago

Gotcha - thanks for the tip! I'll post a link to a working example once I get it up and running... any suggestions on tools to use for a simple case like this, rather than spinning up a whole new app instance?

sisve commented 6 years ago

I don't think we should attempt anything odd. We should use a real seeder, as they are used in the framework; the point of the generator is just to produce a large amount of data without having to bundle a huge JSON file with the issue.

It could be that the issue is with reading the source data; I guess we'll find out whether that's the case if you cannot reproduce it once the loading is replaced with the generator.
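So the reproduction could look something like this (a sketch with made-up names; the generator stands in for your legacy data source):

```php
<?php

use Illuminate\Database\Seeder;
use Illuminate\Support\Facades\DB;

class LeakReproSeeder extends Seeder
{
    // Stands in for the legacy data source: rows are produced on the fly
    // instead of being read from another database or a bundled file.
    private function fakeRows(int $count): \Generator
    {
        for ($i = 0; $i < $count; $i++) {
            yield ['payload' => str_repeat('x', 1024)];
        }
    }

    public function run()
    {
        $batch = [];
        foreach ($this->fakeRows(1000000) as $row) {
            $batch[] = $row;
            if (count($batch) === 1000) {
                DB::table('leak_test')->insert($batch);
                $batch = [];
            }
        }
        if ($batch !== []) {
            DB::table('leak_test')->insert($batch);
        }
    }
}
```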

mycarrysun commented 6 years ago

I am using the DB facade with the chunk method to loop through results, which is supposed to take care of any memory issues, if I'm not mistaken.

Will get a small app spun up for example sometime soon!

Kyslik commented 6 years ago

@mycarrysun can you verify the memory usage of PHP using something other than the progress bar with format "debug"?

https://stackoverflow.com/a/20277787/1564365

mycarrysun commented 6 years ago

Is memory_get_usage(false) a suitable way to check? It is returning the same amounts as the progress bar, i.e. it is not decreasing after a seeder has stopped and the next one starts.
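This is roughly how I'm reading the numbers between seeders (a sketch; it assumes it runs inside a seeder, where $this->command is available):

```php
// Logged after each child seeder finishes.
// memory_get_usage(false) reports memory in use by PHP itself;
// memory_get_usage(true) reports what PHP has allocated from the OS.
$this->command->info(sprintf(
    'used: %.1f MB / allocated: %.1f MB',
    memory_get_usage(false) / 1048576,
    memory_get_usage(true) / 1048576
));
```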

mfn commented 6 years ago

> using the chunk method to loop through results

It should be enough to tame the memory issue.

However, note that it can become very slow on the DB side, because it's not a real streaming/chunking implementation; it uses seek/offset-based pagination, which is known to scale very badly.
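If that gets slow, chunkById() paginates on the primary key instead of an offset, e.g. (a sketch, table name made up):

```php
use Illuminate\Support\Facades\DB;

// chunk() runs queries like  ... ORDER BY id LIMIT 1000 OFFSET 50000,
// so the database scans and discards 50000 rows on every page.
// chunkById() runs  ... WHERE id > :lastId ORDER BY id LIMIT 1000,
// which stays fast no matter how deep into the table you are.
DB::table('legacy_records')->chunkById(1000, function ($records) {
    foreach ($records as $record) {
        // migrate each record...
    }
});
```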

mycarrysun commented 6 years ago

Alright everyone... after finally making the move to production, for some reason there was no memory leaking in my scripts in the production environment. Memory usage was consistent at around 18M.

If I find out what it was I'll post back here, but for now it looks like a non-issue and most likely a bug in my own code.

mycarrysun commented 6 years ago

Found out what was causing this!!

In case anyone else is banging their head against the wall: disable any debuggers that are being used in Laravel. I had been using the debugger package Lanin\Laravel\ApiDebugger, and when I disabled it, memory usage was consistent at around 16M.