smarr / are-we-fast-yet

Are We Fast Yet? Comparing Language Implementations with Objects, Closures, and Arrays

Port AWFY Completely to Python 3 #50

Closed smarr closed 2 years ago

smarr commented 4 years ago

This PR is an early work in progress.

The benchmarks were ported to Python by @raphaelvigee as part of his Risotto project.

More work needed, but a good start.

Makes progress on #4.

Status

Macro Benchmarks

Micro Benchmarks

tekknolagi commented 3 years ago

There are also the pyperformance benchmarks, which include richards, deltablue, fannkuch, nbody, etc

smarr commented 3 years ago

Thanks. Though, last time I checked, there are too many differences to simply adapt them. It seems easier to translate from what we have already to ensure the results allow me to draw conclusions from a comparison.

It's quite mechanical work. It just takes a bit of doing.

eregon commented 3 years ago

I added Queens in https://github.com/eregon/are-we-fast-yet/commit/17e8f37681529c7790f8675358c98ff40f464f2e Translated mostly from the Ruby version since it's quite similar.

tekknolagi commented 3 years ago

https://github.com/python/pyperformance/blob/main/pyperformance/benchmarks/bm_nqueens.py is pretty nice

eregon commented 3 years ago

@tekknolagi The point of are-we-fast-yet is that benchmarks in different languages must match as closely as possible so we can compare their performance as fairly as possible. So making the benchmark code nicer or smarter is not a goal for are-we-fast-yet.

smarr commented 3 years ago

@tekknolagi https://github.com/smarr/are-we-fast-yet#goal and https://github.com/smarr/are-we-fast-yet/blob/master/docs/guidelines.md hopefully give a bit of background and an idea of how Are We Fast Yet is different from other benchmarking "games".

There's also the long form paper here: https://stefan-marr.de/papers/dls-marr-et-al-cross-language-compiler-benchmarking-are-we-fast-yet/

smarr commented 3 years ago

@eregon Thanks, I added your Queens to the PR

smarr commented 3 years ago

@cfbolz @timfel could I ask you guys to have a look at a few things for the AreWeFastYet Python 3 port?

A general code review is of course welcome. Though, there are a few specific questions that I would like a comment on, too.

Just for context, key goals of these specific benchmarks are that they are as comparable as possible between different languages. Thus, they rely only on a widely available subset of the language, following a set of guidelines. This language subset is an OO core that's portable to other languages. But, it would be nice to be as idiomatic/Pythonic in this subset as possible.

I am done porting the micro benchmarks, and halfway through the macro benchmarks.

When working on those I noticed the following things:

I guess, those are the bits I noticed so far, but there are probably other bits here and there, too. Any comment, suggestion, opinion welcome.

timfel commented 3 years ago

What's the best way to port a Java pointer comparison?

Use the is operator. It's quite commonly used. == is a call to __eq__; if x: is common, but it is a call to __bool__.
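A minimal sketch of the distinction, using a hypothetical Node class that is not taken from the AWFY sources: is compares object identity, which is the closest match to a Java reference comparison, while == dispatches to __eq__ and a truth test dispatches to __bool__.

```python
class Node:
    """Hypothetical class, used only to illustrate the three operations."""

    def __init__(self, value):
        self.value = value
        self.calls = []  # records which special methods were invoked

    def __eq__(self, other):
        self.calls.append("__eq__")
        return isinstance(other, Node) and self.value == other.value

    def __bool__(self):
        self.calls.append("__bool__")
        return self.value != 0


a = Node(1)
b = Node(1)

same_object = a is b   # identity check, no method call on a or b
equal_value = a == b   # calls a.__eq__(b)
truthy = bool(a)       # calls a.__bool__(), as `if a:` would
is_none = a is None    # the usual way to port a Java `x == null`
```

After this runs, a.calls contains only the two entries from == and bool(); neither is check shows up in it.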

when writing variables in an outer scope, what's the Pythonic way

Honestly, that entire file looks completely un-pythonic to me, so I wouldn't worry about details :D

smarr commented 3 years ago

Honestly, that entire file looks completely un-pythonic to me, so I wouldn't worry about details :D

Yeah, well, I don't know what pythonic looks like, so... and the other tradeoff is comparability of course :)

smarr commented 3 years ago

The Python 3 way to do this is "nonlocal":

thanks, fixed.
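For readers following along, a small sketch of what nonlocal does. The make_counter function is a made-up example, not code from the port.

```python
def make_counter():
    count = 0

    def increment():
        # Without this declaration, `count += 1` would treat count as a
        # new local variable and raise UnboundLocalError.
        nonlocal count
        count += 1
        return count

    return increment


counter = make_counter()
counter()
counter()
third = counter()  # third call on the same closure
```

Each call to make_counter() creates a separate count, so two counters do not share state.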

timfel commented 3 years ago

Honestly, that entire file looks completely un-pythonic to me, so I wouldn't worry about details :D

Yeah, well, I don't know what pythonic looks like, so... and the other tradeoff is comparability of course :)

What I mean is just - the entire endeavor to write a vector class seems pointless on Python, and I doubt it can be called comparable when really you're layering this implementation over the internal vector (list), rather than a fixed size array like in other languages.

smarr commented 3 years ago

What I mean is just - the entire endeavor to write a vector class seems pointless on Python, and I doubt it can be called comparable when really you're layering this implementation over the internal vector (list), rather than a fixed size array like in other languages.

Yes, that's the general issue with cross-language comparisons. Python isn't the only language with this issue. JavaScript and Ruby are in the same boat. Their arrays aren't fixed sized either.

Though, as long as Python doesn't have something other than the list builtin I could use, I don't see how to avoid the issue. Not using the same collection (not just Vector, but also Dictionary, and Set) is simply not an option if I want to ensure comparability.

I agree, one can argue whether to use built in collections instead of a custom collection library, but in the end when it comes to comparing the compiler effectiveness, the only way to actually compare that aspect is by keeping the code as similar as possible.

One could have other possible goals of course, especially when solely comparing between Python implementations. For that case, it may make sense to have a variant that uses lists directly. Though, that's perhaps something for later.
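To make the trade-off concrete, here is a rough sketch of the kind of layering being discussed. This Vector is hypothetical and much smaller than the one in the port; it only assumes that a list is used as if it were a fixed-size array and grown by hand, the way a Java version would grow its backing array.

```python
class Vector:
    """Hypothetical vector that treats a list as fixed-size storage."""

    def __init__(self, capacity=4):
        self._storage = [None] * capacity  # never resized in place
        self._size = 0

    def append(self, element):
        if self._size == len(self._storage):
            # Grow manually by copying into a storage list twice as large
            new_storage = [None] * (2 * len(self._storage))
            for i in range(self._size):
                new_storage[i] = self._storage[i]
            self._storage = new_storage
        self._storage[self._size] = element
        self._size += 1

    def at(self, index):
        if index >= self._size:
            return None
        return self._storage[index]

    def size(self):
        return self._size

    def for_each(self, fn):
        for i in range(self._size):
            fn(self._storage[i])


v = Vector(capacity=2)
for n in range(5):
    v.append(n)

collected = []
v.for_each(collected.append)

builtin = list(range(5))  # the idiomatic equivalent: the list grows itself
```

The custom class keeps the code structure close to the other language ports, at the cost of doing in Python what the builtin list already does internally.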

smarr commented 3 years ago

What's the Pythonic way with getters and setters? Looking at https://stackoverflow.com/a/36943813, I get the impression that I should avoid using them.

The reason I am asking is that the languages differ quite substantially in the handling of those. Smalltalk and Ruby do need them, since fields are always private. Though, Ruby has things like attr_accessor to generate them, and presumably optimizes them. In Java, getters and setters are idiomatic, and optimized, too.

In my Python port, I did generally use getters and setters for consistency, and for what I am most interested in (jit-compiled performance) I would expect them to be properly optimized, and if they are not, that seems worth detecting with a benchmark. However, now with Python in the mix, comparability of interpreter performance seems more desirable, and reading up on things, there may be a good case for Python of not using getters and setters. It seems @property with getter, setter, and deleter options might be the "more recommended" approach if one would want to modify behavior.

Looking at the simple List benchmark, using getters causes a slowdown of more than a factor of 2 on CPython. As one would hope, it doesn't make a difference for PyPy though.
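A rough way to see this kind of effect in isolation is a timeit comparison like the one below. It is only a sketch: the WithGetter and PlainField classes are made up for illustration and this is not the List benchmark itself, so the measured ratio will differ from the numbers above and will vary by machine and Python version.

```python
import timeit


class WithGetter:
    def __init__(self, val):
        self._val = val

    def get_val(self):
        return self._val


class PlainField:
    def __init__(self, val):
        self.val = val


def sum_with_getter(obj, n):
    total = 0
    for _ in range(n):
        total += obj.get_val()  # one method call per read
    return total


def sum_plain(obj, n):
    total = 0
    for _ in range(n):
        total += obj.val  # direct attribute read
    return total


g = WithGetter(1)
p = PlainField(1)
n = 10_000

t_getter = timeit.timeit(lambda: sum_with_getter(g, n), number=10)
t_plain = timeit.timeit(lambda: sum_plain(p, n), number=10)
print(f"getter: {t_getter:.4f}s  plain field: {t_plain:.4f}s")
```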

Do you guys have any opinions on the issue?

I have not systematically thought about these benchmarks for comparing interpreters. There's something to be said about exposing each implementation to the same challenges, and optimizing getters/setters may be a useful thing to do. Though, if the language considers getters/setters a bad thing, it seems unproductive to ask language implementers to optimize them. Hmm.....

cfbolz commented 3 years ago

"Trivial" getters and setters are typically not used in Python, because it's always possible to add logic to attribute reads/writes later, if needed (either via property or some other means). So having them would make the code less representative of "normal" Python code.

But yes, I don't think there are good answers to the philosophical questions here...
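The point about adding logic later can be sketched as follows. Both classes are hypothetical; the second shows how a plain field can be turned into a property without changing any code that reads or writes the attribute.

```python
class Element:
    """First version: a plain field, no accessor methods."""

    def __init__(self, val):
        self.val = val


class CheckedElement:
    """Later version: same attribute syntax for callers, now with logic."""

    def __init__(self, val):
        self.val = val  # goes through the setter below

    @property
    def val(self):
        return self._val

    @val.setter
    def val(self, new_val):
        if new_val is None:
            raise ValueError("val must not be None")
        self._val = new_val


e = Element(3)
c = CheckedElement(3)
e.val = 4
c.val = 4  # caller code is identical for both classes

try:
    c.val = None
    rejected = False
except ValueError:
    rejected = True
```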

eregon commented 3 years ago

IMHO it seems better without (explicit) getters/setters for Python, because Python already has so much flexibility built-in when accessing fields/properties. It feels a bit like the getters/setters are somehow always built-in for every field (but if the field starts with _ then one shouldn't use those by convention). So I'd say no explicit getters/setters for Python is more idiomatic, and already represents challenges to the VM as they need various checks for field/property accesses due to Python semantics.

smarr commented 3 years ago

Yeah, things like __setattr__ and seem good reasons to just go with plain fields. Thanks.
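As an illustration of why plain fields are already non-trivial for a VM, here is a sketch of __setattr__ intercepting every attribute write. The Traced class is hypothetical and exists only for this example.

```python
class Traced:
    """Hypothetical class that logs every attribute write."""

    def __init__(self):
        # Bypass our own hook while creating the log itself
        object.__setattr__(self, "writes", [])
        self.x = 0  # already goes through __setattr__ below

    def __setattr__(self, name, value):
        self.writes.append((name, value))
        object.__setattr__(self, name, value)


t = Traced()
t.x = 1
t.y = 2
```

Every assignment of the form t.name = value ends up in t.writes, even though the calling code looks like a plain field write.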

smarr commented 3 years ago

As part of this port, I started an AWFY variant that is using only builtin collections. Code here for JavaScript, Ruby, and Python: https://github.com/smarr/are-we-fast-yet/tree/awfy-dynamic-languages

Interestingly, it's triggering a few issues.

@timfel first one might be interesting to you.

Using plain lists, and clearing them here: https://github.com/smarr/are-we-fast-yet/blob/awfy-dynamic-languages/benchmarks/Python/havlak.py#L368-L369 causes a compilation issue for GraalPython, with an overall slowdown on Havlak of >10x. My current understanding is that there's an uncommon trap, or transfer to interpreter, triggering it here: https://github.com/oracle/graalpython/blob/master/graalpython/com.oracle.graal.python/src/com/oracle/graal/python/nodes/builtins/ListNodes.java#L283

A guess, without knowing the details, is that list.clear() might discard relevant information and trigger code invalidation, probably indirectly. Just a guess...
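For reference, a sketch of what clear() does at the language level, compared with rebinding the name to a new list. It says nothing about GraalPython internals; the variable names are made up.

```python
nodes = [1, 2, 3]
alias = nodes

nodes.clear()  # empties the existing list object in place
after_clear = (nodes, alias, nodes is alias)

nodes = [1, 2, 3]
alias = nodes
nodes = []  # binds the name to a fresh list; the old one is untouched
after_rebind = (nodes, alias, nodes is alias)
```

With clear(), every reference to the list sees it become empty; with rebinding, other references keep the old contents.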

smarr commented 3 years ago

@timfel DeltaBlue (with builtin collections) is behaving oddly, too. graalpython harness.py DeltaBlue 1000 12000 runs out of heap.

eregon commented 2 years ago

@smarr I'm really interested in the variant using built-in collections, what's left for it, can I help? Is it something we could link to in the README or somehow make more prominent? I think it would be very useful to compare performance of languages & implementations on reasonably idiomatic yet structurally similar classic benchmarks.

smarr commented 2 years ago

@eregon I can't think of any work to be done on it off the top of my head.

It's here: https://github.com/smarr/are-we-fast-yet/tree/awfy-dynamic-languages and should be "as good" as the normal one, of course with various performance differences.

I am not sure how to handle it in terms of the project. It shrinks down the scope of supported languages to JavaScript, Python, and Ruby, while at the same time adding more variables into the comparison. So, I'd be hesitant to just "add" it to the project. You know how it goes. Every new variant just makes things harder to explain. It's there if someone wants to use it for something specific.

smarr commented 2 years ago

Merged as "good enough" for the moment