pberkes / big_O

Python module to estimate big-O time complexity from execution time
BSD 3-Clause "New" or "Revised" License

Output time complexity changes in different calls #37

Closed · fcrperes closed this issue 3 years ago

fcrperes commented 3 years ago

Hey everyone,

So I came across the big_o library and found it super useful and very cool. While doing some tests to understand how it works, I ran into the following problem, using the same code snippets as in https://pypi.org/project/big-O/:

When I write `big_o.big_o(np.zeros, big_o.datagen.n_, max_n=100000, n_repeats=100)`, I get a different class almost every run: Linear, Quadratic, Linearithmic, Cubic, and so on. This is quite disturbing, because it means the result is unstable: it might be right, but it might just as well be wrong.
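
For context, here is a minimal, self-contained reconstruction of what I ran (the loop over trials is mine; the underlying call is the one from the PyPI docs, which returns the best-fit class as the first element of its result):

```python
import numpy as np
import big_o

# Call big_o repeatedly on the same function and print the best-fit
# class each time; across trials the reported class keeps changing.
for trial in range(5):
    best, _ = big_o.big_o(np.zeros, big_o.datagen.n_,
                          max_n=100000, n_repeats=100)
    print(trial, best)
```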

I noticed this happens especially when I have already assessed the time complexity of some other method: previous calls to big_o seem to make subsequent calls less stable. I don't see why that would happen, but it is a pattern I noticed (though it might have been a coincidence).

Has anyone experienced similar variations of the result?

See this snippet as an example of getting wrong results:

(screenshot of the snippet and its output)

pberkes commented 3 years ago

Thanks for the report!

Timing measurements are bound to be noisy, partly because your machine is running other tasks at the same time, and partly because the architecture of your machine and OS adds several layers of complexity: multiple levels of hardware caching, memory page allocation and deallocation, and many other optimizations.
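
You can see this noise even without big_O, by timing the identical workload a few times with timeit (a quick illustrative sketch):

```python
import timeit
import numpy as np

# Time the identical workload several times: the spread between runs is
# pure measurement noise (scheduler, caches, background processes).
times = [timeit.timeit(lambda: np.zeros(100000), number=1000)
         for _ in range(5)]
print(["%.4f s" % t for t in times])
```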

I think the np.zeros and np.empty examples I added to the documentation are not so good, for two reasons: 1) the functions complete so fast that any other process running at the same time will mess up the measurements; 2) they depend on memory allocation, which the OS heavily optimizes, so the timing as a function of size shows discontinuities that are due to the optimization rather than to the function itself.
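
A rough sketch of point 2 (the exact numbers depend on your allocator and OS, but the gap should be large):

```python
import timeit
import numpy as np

# np.zeros for large arrays can be served from copy-on-write zero pages
# without touching memory, while np.ones must actually write every
# element; the gap shows how much of the cost is OS memory management.
print(timeit.timeit(lambda: np.zeros(10**7), number=100))
print(timeit.timeit(lambda: np.ones(10**7), number=100))
```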

In the vast majority of cases, these issues are negligible if you are using big_O to test a function that takes more than trivial time to run.
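
For instance, a CPU-bound function like sorted gives much more stable results; here is a sketch using the integers generator from the README (expect a Linearithmic fit in most runs):

```python
import big_o

# sorted() does non-trivial, CPU-bound work, so scheduler noise and
# allocator effects matter far less than for np.zeros/np.empty.
best, _ = big_o.big_o(sorted,
                      lambda n: big_o.datagen.integers(n, 0, 10000),
                      n_repeats=20)
print(best)  # typically reports Linearithmic
```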

Occasionally, though, what in these examples is an annoyance might reveal an unwanted interaction between the algorithm and the OS or hardware. There is in fact a recent branch of computer science that designs algorithms in a way that takes modern computer architectures into account rather than ignoring them.
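
As a toy illustration of such an interaction (assuming a C-ordered NumPy array; both loops do the same amount of arithmetic):

```python
import timeit
import numpy as np

a = np.zeros((4000, 4000))  # C-ordered: rows are contiguous in memory

# The two sums do the same O(n^2) arithmetic, but row-wise traversal
# reads contiguous memory while column-wise traversal (through the
# transposed view) strides across it, defeating the CPU cache.
by_rows = timeit.timeit(lambda: sum(row.sum() for row in a), number=3)
by_cols = timeit.timeit(lambda: sum(col.sum() for col in a.T), number=3)
print(by_rows, by_cols)
```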

pberkes commented 3 years ago

I'm going to close this issue and open a new one for better examples in the documentation, and a section about the potential issues discussed above.

pberkes commented 3 years ago

Francesc Alted has given some interesting presentations on the subject; see e.g. https://github.com/FrancescAlted/MemoryBoundComputations .