pyston / pyston_v1

The previous version of Pyston, a faster implementation of the Python programming language. Please use this link for the new repository:
https://github.com/pyston/pyston/
4.89k stars 289 forks source link

Use tagged pointers for efficient ints #181

Closed kmod closed 4 years ago

kmod commented 10 years ago

Currently we box all of our objects, including integers. Using tagged pointers could make integer handling much faster + more memory efficient, and is something we should definitely investigate and probably switch to. We could also use this to efficiently represent singletons like None, True, and False.

Ideally we would abstract out the object representation, so that we can easily switch between different ones. This would be useful for testing tagged pointers vs boxing-everything, but I think also would be handy for when we want to add 32-bit support.

One big wrinkle in this is the C API: it represents that all objects are actually pointers, and you are allowed to directly access the members -- there are 46 occurrences of ->ob_type in the CPython Modules directory -- so passing a non-pointer could be quite bad. The ideas I can think of:

  1. take any tagged pointers and box them when passing them to C API code
  2. most of ob_type references don't apply to integers, and maybe if we're lucky we can pass tagged pointers through directly
  3. we could add a custom compilation pass that identifies references to ob_type and maps them to our object-representation-handling function

I like option 3, and I think we'll want to do things like that for other issues as well so it's not a bad thing to start doing, but is probably the most work.

ghost commented 10 years ago

Hi, just to let you know that I am willing to implement this. I am a student at University of Cambridge and we have to conduct a mini project hacking around llvm related stuff for our modern compiler design course. I am proposing to implement this as my mini project. The deadline for submitting the report is on Jan. So I will be working on this for next two months and hopefully I will get it working (although I have only a very vague idea of how pyston works internally right now). Anyway I will keep you guys updated with my progress.

kmod commented 10 years ago

Ok! I would suggest replacing (almost) all occurrences of Box* with some new PyValue type, since not all values will be boxes any more. Then we'll have to add macros that extract a Box* or other type from the PyValue.

I would check out JavaScriptCore/runtime/JSCJSValue.h (or other projects) to see how they do it.

ghost commented 10 years ago

Cool thanks for the suggestions.

gvanrossum commented 10 years ago

I am skeptical of tagged pointers. Long ago (in the early '80s, before Python) I co-wrote an entire language implementation + environment using this approach, and we had an inordinate number of crashes due to code that was expecting a pointer but was passed one of those disguised unboxed integers. Hopefully your type checker will catch more of those at compile time, but it still feels iffy to me.

On Thu, Nov 13, 2014 at 12:24 AM, Bob Fang notifications@github.com wrote:

Cool thanks for the suggestions.

— Reply to this email directly or view it on GitHub https://github.com/dropbox/pyston/issues/181#issuecomment-62856376.

--Guido van Rossum (python.org/~guido)

ghost commented 9 years ago

Hi, I am now starting working on this and before I actually go down to implement it, I want do some initial benchmarking so after I have made some change I can see wether this change has a performance impact.

I think tagged pointers have potentially two benefits: first, it might use less memory and second it might be faster, especially for integer operations. Do you have some specific benchmarks that I should pay special attention to? Also I have looked at the benchmark, they are more focused on the timing rather than the memory usage right? Should I try to write some benchmark myself?

Thanks.

kmod commented 9 years ago

@gvanrossum hmm good to know. I think most current VMs do use some sort of tagged pointers, and some JS engines use the even more crazy nan boxing :P

@dorafmon benchmark choices are definitely an important and divisive topic. We only support a limited set of benchmarks for now; I'd try running the ones in the minibenchmarks/ directory as you get things working. We're actively working on getting more benchmarks defined and running under Pyston; ultimately the goal is "real world applications" such as the Dropbox webserver, or a benchmark suite that covers the same performance characteristics. Until then, I'd say just run on our minibenchmarks and make sure to leave the choice of object model toggleable, so that we can run the more comprehensive tests later.

We definitely focus on speed as opposed to memory usage for now. I think memory usage is also important and it'd definitely be interesting to see how that changes, though I don't think any of our current benchmarks are particularly memory intensive.

undingen commented 9 years ago

I thought the JS engine guys have it slightly easier with JS in this division because there is AFAIK only one type for numbers: double. But to my surprise they manage to squeeze into 64bit of memory one of the following [1]:

     * This range of NaN space is represented by 64-bit numbers begining with the 16-bit
     * hex patterns 0xFFFE and 0xFFFF - we rely on the fact that no valid double-precision
     * numbers will fall in these ranges.
     *
     * The top 16-bits denote the type of the encoded JSValue:
     *
     *     Pointer {  0000:PPPP:PPPP:PPPP
     *              / 0001:****:****:****
     *     Double  {         ...
     *              \ FFFE:****:****:****
     *     Integer {  FFFF:0000:IIII:IIII

So maybe we should keep this in mind when implementing TP and try to make it extensible to use nan boxing later, in order to not favor int over double so much as I suspect most numbers in python programs fit in 32bit?!?.

[1] https://github.com/WebKit/webkit/blob/18fbee95c3a5779e8a43215164a91c57997c3315/Source/JavaScriptCore/runtime/JSCJSValue.h

ghost commented 9 years ago

Okay I drifted away a few days to write this report since the deadline is tomorrow. I haven't finish the work so this is just a progress report, but some premiliray results shows tagged pointers can really be helpful. Any comment/advice is welcomed. Thanks a lot for the help, and I will keep working on this. :smile:

alexeykudinkin commented 9 years ago

Any progress?

gvanrossum commented 4 years ago

FYI, mypyc uses them.