Closed kmod closed 4 years ago
Hi, just to let you know that I am willing to implement this. I am a student at University of Cambridge and we have to conduct a mini project hacking around llvm related stuff for our modern compiler design course. I am proposing to implement this as my mini project. The deadline for submitting the report is on Jan. So I will be working on this for next two months and hopefully I will get it working (although I have only a very vague idea of how pyston works internally right now). Anyway I will keep you guys updated with my progress.
Ok! I would suggest replacing (almost) all occurrences of Box* with some new PyValue type, since not all values will be boxes any more. Then we'll have to add macros that extract a Box* or other type from the PyValue.
I would check out JavaScriptCore/runtime/JSCJSValue.h (or other projects) to see how they do it.
Cool thanks for the suggestions.
I am skeptical of tagged pointers. Long ago (in the early '80s, before Python) I co-wrote an entire language implementation + environment using this approach, and we had an inordinate number of crashes due to code that was expecting a pointer but was passed one of those disguised unboxed integers. Hopefully your type checker will catch more of those at compile time, but it still feels iffy to me.
On Thu, Nov 13, 2014 at 12:24 AM, Bob Fang notifications@github.com wrote:
Cool thanks for the suggestions.
— Reply to this email directly or view it on GitHub https://github.com/dropbox/pyston/issues/181#issuecomment-62856376.
--Guido van Rossum (python.org/~guido)
Hi, I am now starting working on this and before I actually go down to implement it, I want do some initial benchmarking so after I have made some change I can see wether this change has a performance impact.
I think tagged pointers have potentially two benefits: first, it might use less memory and second it might be faster, especially for integer operations. Do you have some specific benchmarks that I should pay special attention to? Also I have looked at the benchmark, they are more focused on the timing rather than the memory usage right? Should I try to write some benchmark myself?
Thanks.
@gvanrossum hmm good to know. I think most current VMs do use some sort of tagged pointers, and some JS engines use the even more crazy nan boxing :P
@dorafmon benchmark choices are definitely an important and divisive topic. We only support a limited set of benchmarks for now; I'd try running the ones in the minibenchmarks/ directory as you get things working. We're actively working on getting more benchmarks defined and running under Pyston; ultimately the goal is "real world applications" such as the Dropbox webserver, or a benchmark suite that covers the same performance characteristics. Until then, I'd say just run on our minibenchmarks and make sure to leave the choice of object model toggleable, so that we can run the more comprehensive tests later.
We definitely focus on speed as opposed to memory usage for now. I think memory usage is also important and it'd definitely be interesting to see how that changes, though I don't think any of our current benchmarks are particularly memory intensive.
I thought the JS engine guys have it slightly easier with JS in this division because there is AFAIK only one type for numbers: double. But to my surprise they manage to squeeze into 64bit of memory one of the following [1]:
* This range of NaN space is represented by 64-bit numbers begining with the 16-bit
* hex patterns 0xFFFE and 0xFFFF - we rely on the fact that no valid double-precision
* numbers will fall in these ranges.
*
* The top 16-bits denote the type of the encoded JSValue:
*
* Pointer { 0000:PPPP:PPPP:PPPP
* / 0001:****:****:****
* Double { ...
* \ FFFE:****:****:****
* Integer { FFFF:0000:IIII:IIII
So maybe we should keep this in mind when implementing TP and try to make it extensible to use nan boxing later, in order to not favor int over double so much as I suspect most numbers in python programs fit in 32bit?!?.
Okay I drifted away a few days to write this report since the deadline is tomorrow. I haven't finish the work so this is just a progress report, but some premiliray results shows tagged pointers can really be helpful. Any comment/advice is welcomed. Thanks a lot for the help, and I will keep working on this. :smile:
Any progress?
FYI, mypyc uses them.
Currently we box all of our objects, including integers. Using tagged pointers could make integer handling much faster + more memory efficient, and is something we should definitely investigate and probably switch to. We could also use this to efficiently represent singletons like None, True, and False.
Ideally we would abstract out the object representation, so that we can easily switch between different ones. This would be useful for testing tagged pointers vs boxing-everything, but I think also would be handy for when we want to add 32-bit support.
One big wrinkle in this is the C API: it represents that all objects are actually pointers, and you are allowed to directly access the members -- there are 46 occurrences of
->ob_type
in the CPython Modules directory -- so passing a non-pointer could be quite bad. The ideas I can think of:I like option 3, and I think we'll want to do things like that for other issues as well so it's not a bad thing to start doing, but is probably the most work.