Try with `--disable-oom-detection` and see if that helps. Probably the OOM SVG should suggest that.
For the `1+1` case, it's probably just so little memory it doesn't show up (0 bytes!). `1` and `2` are pre-allocated in Python by default. So that's a bug I should fix; it should at least indicate that.
`memory_profiler` can give you the same-ish info, it's just... it takes a lot more time because it only gives you line-by-line info.
Oh, except the OOM detection disabling thing is not currently available in Jupyter. I'll try to fix that (or just disable it by default, we'll see).
Thanks for getting back. I feel the `1+1` situation is not urgent, as there might not be much need to profile something like that (but yeah, worth adding some warning).
I feel that being able to use Fil with Jupyter is good to have, especially for data scientists, and the graph that Fil prints out is easy for data scientists to interpret; hence I think having OOM detection disabled in Jupyter would help us out.
`memory_profiler` can give the peak and increment memory for an entire cell in Jupyter when used with the `%%memit` magic. Our data scientists use it a lot, but we thought of moving to Fil because of the extra information and the intuitive features.
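For reference, our typical `%%memit` usage looks roughly like this (the file name is just an example): `%load_ext memory_profiler` in one cell, then:

```python
%%memit
# %%memit reports the peak and increment memory for this whole cell
import pandas as pd

df = pd.read_csv("our_dataset.csv")  # example file name
```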
I am now kind of confused about OOM. My laptop has 16 GB of RAM. If my dataset uses around 23 GB according to `memory_usage(deep=True)`, why am I not getting an OOM when loading the dataset, and why does memory_profiler report a peak of only around 6 GB?
Is it the case that 23 GB of virtual memory (allocated memory) is used, around 17 GB of it is swapped to disk, and around 6 GB is in RAM? If so, what is the maximum amount of data that can be handled on my machine using pandas? Is it limited by the free hard disk space?
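For context, the 23 GB figure comes from a check roughly like this (the file name is just an example):

```python
import pandas as pd

df = pd.read_csv("our_dataset.csv")  # example file name

# Sum of per-column sizes, including the actual Python string objects
# in object-dtype columns ("deep"); reported in bytes.
print(df.memory_usage(deep=True).sum() / 1e9, "GB")
```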
So, yes, one thing to keep in mind is that memory_profiler and Fil measure two different things: peak resident memory (i.e. what is actually in RAM) for the former, and how much memory you requested for the latter. https://pythonspeed.com/articles/measuring-memory-python/ has a write-up with more details.
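If you want a quick sanity check of peak resident memory outside of any profiler, the standard library can report it; a minimal sketch (Unix only, and note the units differ by platform):

```python
import resource
import sys

# Peak resident set size of this process so far.
# On Linux ru_maxrss is in kilobytes; on macOS it is in bytes.
peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
scale = 1 if sys.platform == "darwin" else 1024
print(f"Peak resident memory: {peak * scale / 1e9:.2f} GB")
```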
I'm told that macOS will keep writing to disk until swap usage is about twice the size of RAM, and only then start failing memory allocations / killing your process. So in your case, about 32 GB on disk. But that would be pretty slow.
The out-of-memory detection Fil does, and which I guess I should consider just disabling on macOS, uses heuristics to guess when OOM is approaching, and sometimes it triggers too soon.
You can try Fil with most other applications closed, or just a couple of browser tabs, and see if that gets you further.
Hopefully I'll have a release with OOM disabled in an hour or two, or tomorrow if tests fail.
As another option, I also work on a commercial Python profiler for data science: https://sciagraph.com
Pros compared to Fil:
Cons compared to Fil:
Oh, and re `1+1`: it's quite possible it literally allocated no memory. Numbers below a certain size are pre-allocated objects in Python. For example, here you can see that if you create two large numbers they live at different addresses in memory, but `1` is always the same object (as an implementation detail; don't rely on this in code...):
>>> x = 1_000_000
>>> y = 1_000_000
>>> x == y
True
>>> x is y
False
>>> x = 1
>>> y = 1
>>> x is y
True
@itamarst Thanks for fixing both the OOM and the `1+1` issue. I am able to get Fil working in Jupyter.
It works fine and the graph shows up in Jupyter when profiling small tasks. But when working with big datasets, e.g. in my case loading a 10 GB CSV file, the graph does not show up even though it is generated, like below:
I checked those folders and the file exists, and it reports a peak of 53745 MB. Is there any reason why the graph is not showing in my Jupyter notebook?
Also, the above number (53745 MB) is allocated memory, right? Where can I find peak resident memory in Fil? (At least it is not showing in the above graph; it only shows peak tracked memory usage.)
https://sciagraph.com/ looks interesting. Can I track any tickets to see when Mac support becomes available?
I was also reading https://pythonspeed.com/articles/python-out-of-memory/. Does Fil also report how a process failed, as mentioned in the article?
> Of course, segfaults happen for other reasons as well, so to figure out the cause you’ll need to inspect the core file with a debugger like gdb, or run the program under the Fil memory profiler.
In the above case, loading the 10 GB file succeeds with no issues. If we load that same 10 GB file again into a different variable, that also works with no memory issues. I was curious why it is not failing, so I checked whether the second load uses the same address space as the first one (similar to a simple copy), but it uses a different address space.
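For what it's worth, the check I did was roughly along these lines (the file name is just an example):

```python
import pandas as pd

df1 = pd.read_csv("big_10gb.csv")  # example file name
df2 = pd.read_csv("big_10gb.csv")  # loaded again into a different variable

# Two separate loads create two separate objects with their own buffers,
# so they sit at different addresses; a plain assignment would not.
print(df1 is df2)    # False: distinct objects, distinct memory
alias = df1
print(alias is df1)  # True: just another name, no new allocation
```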
But when we load a 20 GB file (a CSV basically made by row-binding the 10 GB file twice), the Jupyter kernel crashes. It would be great if we could get the reason for the failure, as mentioned in the article.
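Outside Jupyter, I suppose one way to see the failure mode myself would be something like the following (the script name is hypothetical), but having Fil report it directly would be much nicer:

```python
import signal
import subprocess

# Run the 20 GB load in a subprocess; if the OS (or the OOM killer) kills it,
# the return code is the negative signal number, e.g. -9 for SIGKILL.
result = subprocess.run(["python", "load_20gb.py"])  # hypothetical script
if result.returncode < 0:
    print("Killed by signal:", signal.Signals(-result.returncode).name)
```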
I am doing some benchmarking and thought about using Fil.
I am loading a 3 GB CSV file on my 16 GB RAM Mac. I can load the file completely fine in Python 3 with the Fil kernel, but when I use `%%filprofile` the kernel dies; upon checking the logs I found an out-of-memory.svg file in a temp folder showing around 6000 MB. This shouldn't be the case, as I can load the data completely fine without Fil, and other profilers work with no issues (like below).
I am wondering why this is happening when I try to use `%%filprofile`. All other profilers work fine; for example I tried memory_profiler and it shows the following.
I don’t get any issue when I test Fil by profiling `1+1`.
Is Fil only designed for profiling small cases? What can I try from my end to get Fil working?
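For reference, the cell I am profiling is roughly of this shape (the file name is just an example):

```python
%%filprofile
import pandas as pd

# ~3 GB CSV; loads fine on this 16 GB machine when run without the profiler
df = pd.read_csv("benchmark_data.csv")  # example file name
```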
Just as a side note, how different is Fil compared to memory_profiler, given that it also gives the peak memory and increment? Are there any strong reasons that Fil is better, other than the nice graphical view? I do understand why it is better than `sys.getsizeof()` and `memory_usage()` from your article.
filprofiler==2023.1.0 Python 3.11.0