oils-for-unix / oils

Oils is our upgrade path from bash to a better language and runtime. It's also for Python and JavaScript users who avoid shell!
http://www.oilshell.org/

Hello from MicroPython #221

Closed: pfalcon closed this issue 5 years ago

pfalcon commented 5 years ago

Thanks for your detailed blog. I'm not registered on discussion sites you use/threads there are expired, so please bear with me responding with some comments here.

From http://www.oilshell.org/blog/2018/03/04.html:

Why not use MicroPython? I answered this on lobste.rs. I mentioned MicroPython in this June 2017 post.

From https://lobste.rs/s/qtbxqf/oil_dev_log_7_hollowing_out_python#c_psfojx :

But it’s more than Oil needs and less than it needs at the same time… more because it implements a lot of the dynamism in Python that Oil doesn’t use (and which makes things slower)

Well, MicroPython faithfully implements a large subset of the Python 3 language. Next, "a lot of the dynamism in Python that Oil doesn’t use" might not even be true. Remember the metaprogramming you use during startup; you don't want to lose it, do you? And beyond that, we have just the same thing in mind: now that the foundation is up, it's time to think about optimizations. That's why I'm working on https://github.com/oilshell/oil/issues/220, for the same reasons as you.

and less because it doesn’t have the same bindings (POSIX, readline).

readline? Heh, we had building-with-readline support, but removed it, as MicroPython has its own line editing module. Of course, there are no "bindings"; it's just used in the REPL. If you need "bindings", it's mostly trivial to make them using FFI.

About POSIX, it's even less clear. MicroPython has a POSIX port. Otherwise, it's a bit strange to hear complaints like that, knowing that you work on tearing away parts from CPython ;-). MicroPython just never adds that stuff in the first place. The core language and standard library are separate projects. (And the standard library is individual installable packages, not a few monolithic megabytes of code-drop). So, what exactly do you have in mind by "POSIX bindings"? Fancy os.read()? https://github.com/pfalcon/micropython-lib/blob/master/os/os/__init__.py#L185 . Fancy select.epoll? https://github.com/pfalcon/micropython-lib/blob/master/select/select.py#L49 . Fancy executing kernel syscalls directly? https://github.com/pfalcon/micropython-projs/blob/master/syscalls/syscall.py . All the same good old FFI. Want the real thing with "int 0x80"? Here's a template: https://github.com/pfalcon/micropython-jitgen/blob/master/example1.py , you'll need to code up the "int" and calling convention yourself.

It’s also more than 50K lines of code if I remember correctly, so to meet my goals of “one-person maintainability” I’d have to strip it down just like I did CPython. In particular MicroPython has an entire front end that I don’t need, because I have “OPy”.

Just specially for guys like you, MicroPython has 250+ config options: https://github.com/micropython/micropython/blob/master/py/mpconfig.h . Sorry for ruining fun of going over 130K lines and figuring out how to cut them with magic coverage scissors, reading descriptions of config options is much more boring, I know! There's for example MICROPY_ENABLE_COMPILER https://github.com/micropython/micropython/blob/master/py/mpconfig.h#L337

Btw, I really mean "for guys like you", because I have just the same weird needs, and I coded up a good third of MicroPython (and more than half if counting micropython-lib as a part of project).

andychu commented 5 years ago

Hey, thanks for reaching out. Good to hear from others interested in making Python smaller and faster :)

My judgement at the time was that MicroPython was impressive (as I wrote on the blog), but it had different goals. There's a fairly big difference between:

  • a language used to implement another language, and
  • a language meant to be interpreted on a microcontroller

My impression was that MicroPython is supposed to be smaller, but not necessarily faster. (The OSH parser needs to be a lot faster than it is now.) But it sounds like you're starting to work more on speed optimizations? Are there any benchmarks comparing it to CPython?

The list of modules/functions I use is here:

http://www.oilshell.org/release/0.6.pre12/metrics.wwz/cpython-defs/overview.txt

Probably the biggest issues impacting the speed of the OSH interpreter are allocation overhead and function/method call overhead.

And there is the "double interpretation" problem. Porting to MicroPython wouldn't solve that.


I have done a bunch of thinking about this problem since writing those posts, and right now I think I can't avoid making Oil a typed program. So I plan to add Python 3 type annotations.

I will just have to expand all the metaprogramming as text. I didn't want to do that 2 years ago, but now I'm fine with that since the code's architecture is more stable.
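
To make the trade-off concrete, here is a minimal sketch of what "expanding the metaprogramming as text" could look like. The names `make_node_class`, `Token`, and `TokenExpanded` are hypothetical and not taken from Oil's code. The only point is that a class built at startup with `type()` is opaque to a static type checker, while the written-out, annotated equivalent is not.

```python
from typing import List


def make_node_class(name: str, fields: List[str]) -> type:
    """Build a record-like class at startup (dynamic metaprogramming)."""
    def __init__(self, *args):
        if len(args) != len(fields):
            raise TypeError("%s expects %d arguments" % (name, len(fields)))
        for field, value in zip(fields, args):
            setattr(self, field, value)
    return type(name, (object,), {"__init__": __init__})


# Dynamic version: a type checker cannot see the attributes of Token.
Token = make_node_class("Token", ["kind", "text"])


# Expanded-as-text version: the same class written out with annotations.
class TokenExpanded:
    def __init__(self, kind: int, text: str) -> None:
        self.kind = kind
        self.text = text


dynamic_tok = Token(1, "echo")
static_tok = TokenExpanded(1, "echo")
```

Both objects behave the same at runtime; only the second form gives a checker (or a compiler built on one) field names and types to work with.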

I don't need the dynamism after startup. In fact I wanted to do exactly what PyPy does -- allow arbitrary Python at startup, but only allow restricted RPython at runtime. I was going to have the restricted, more static, non-JITTed "OPy" at runtime.

However, now I think that's too much work, and I should lean on the work of others. Most likely I will use MyPy and the new mypyc compiler:

https://github.com/mypyc/mypyc

I think they did a nice job with the intermediate language:

https://github.com/mypyc/mypyc/tree/master/test-data

In other words, all the operations are typed instead of working on PyObject, which should be a big speedup. They also say they compile each Python function into TWO functions -- one that respects the Python API and then one that can be called directly. So making direct calls should speed things up a lot as well, since Oil has many small methods and functions.
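
A rough Python-level picture of that two-function idea, as described above. This is not mypyc's actual output (which is C); `add_native`, `add_wrapper`, and `sum_pairs` are hypothetical names used only to show where the checks go.

```python
def add_native(x: int, y: int) -> int:
    """Direct entry point: callers that already know the types call this."""
    return x + y


def add_wrapper(*args: object) -> object:
    """Generic entry point: checks arity and types, then delegates."""
    if len(args) != 2:
        raise TypeError("add() takes exactly 2 arguments")
    x, y = args
    if not isinstance(x, int) or not isinstance(y, int):
        raise TypeError("add() arguments must be int")
    return add_native(x, y)


def sum_pairs(pairs: list) -> int:
    """Typed caller: skips the wrapper and its checks entirely."""
    total = 0
    for a, b in pairs:
        total += add_native(a, b)
    return total
```

Untyped code would go through `add_wrapper`, while typed code calls `add_native` directly, which is where the saving for many small functions would come from.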

I looked at the Shed Skin Python to C++ compiler, as well as Nuitka. They both are doing some kind of type inference but I don't think it's appropriate for OSH. Explicit types seem more promising.


Anyway, I think there's definitely some relation with what we're doing, but also significant differences.

So yeah I don't think MicroPython is a good fit, but let me know if there's something I'm not seeing.

The Python 3 port of the compiler package is interesting too. But what I realized is that adding explicit types and type checking is a HUGE amount of work that I don't want to do myself. The compiler itself is only 4K-8K lines of code -- I expect adding types is going to be 20K+ lines of code like MyPy is. I did not realize that when drafting the original "OPy" plan.


I also fixed a few bugs in the compiler package if you're interested, and I know of a few that are left. There was a bug I described here related to name analysis of generator expressions:

http://www.oilshell.org/blog/2018/12/16.html#toc_1

There is also a bug here when "OPy" generates 30,000 names in the bytecode, compared with 17,000 names for CPython:

http://www.oilshell.org/release/0.6.pre12/metrics.wwz/bytecode/overview.txt

I'm interested to see where you go with the port, but as I said I'm leaning toward types via MyPy. Originally I did not like that MyPy doesn't understand certain kinds of metaprogramming, but I'm willing to "give" on that to avoid writing my own Python type checker! Now that mypyc exists, MyPy becomes more interesting to me. I think it's still very early, but I'm going to poke around in that direction, and if I have correct types, it will be easier to generate a C++ program like Shed Skin does.

I chatted with somebody who uses Shed Skin in production. When it works it is pretty impressive! It should work in more places if it has explicit types rather than having to infer.

andychu commented 5 years ago

I guess the tl;dr is:

I certainly could port Oil to MicroPython, with some effort. But I don't think it would be fast enough if I did, simply because MicroPython is trying to be compatible, whereas I want to take advantage of incompatibility (and types) to speed things up.

It would be essentially "lateral" work and not forward work.

pfalcon commented 5 years ago

Thanks for the detailed reply! Let me go over it in chunks.

My judgement at the time was that MicroPython was impressive (as I wrote on the blog)

Btw, would you consider adding MicroPython to your famous cross-ref page: http://www.oilshell.org/cross-ref.html ? It seems to contain references to even more remote stuff than MicroPython would be ;-).

pfalcon commented 5 years ago

MicroPython was impressive (as I wrote on the blog), but it had different goals. There's a fairly big difference between:

  • a language used to implement another language, and
  • a language meant to be interpreted on a microcontroller

MicroPython has a very simple and obvious goal: to be an implementation of the Python programming language (which is a generic programming language) which scales down. Note that the majority of people and entities nowadays are concerned with scaling up, ways to burn more and more resources, etc. That has made that area a very boring place. Again, MicroPython is concerned with scaling down, and does it quite well, as proven by its ability to run even "on a microcontroller".

All that doesn't mean that MicroPython can't scale up - sure it can, and as a very configurable project (as direly needed for scaling-down), it's in better shape to do that than many other projects. It's just it's very boring, crowded area, sweated out by likes of google, facebook, apple. Who'd want to spend their life in such company, to be just another incomprehensible voice in crowded cacophony? A crying voice in a desert is definitely better ;-).

pfalcon commented 5 years ago

My impression was that MicroPython is supposed to be smaller, but not necessarily faster.

MicroPython is supposed to be smaller, use orders of magnitude less memory (it can start up in a kilobyte of heap), and then avoid being slower than necessary (all Python features on that route are configurable, some are disabled by default, and their use in MicroPython programs is discouraged), then give users the means to make their programs as fast as they want (by spending some effort on that ;-) ).

But it sounds like you're starting to work more on speed optimizations?

MicroPython has had things like native code generators since its inception in 2013: https://www.kickstarter.com/projects/214379695/micro-python-python-for-microcontrollers/posts/665145 . So yeah, we're continuing ;-).

Are there any benchmarks comparing it to CPython?

There were times when MicroPython, with suitable config options, ran pystone faster than CPython 3.4. Of course, CPython didn't stand still, and was fixing their shame, so with 3.6, it's no longer the case. They have infinite resources (as belonging to you) to waste on that, remember. E.g. 3.6 switched to wordcode, making bytecode density probably 80-100% worse. In their target area, the only thing people noticed is the performance improvement. In MicroPython's scaled-down world, that would be an awful waste of memory, with low performance improvements (5-10%? phew, need frameworks shooting for 10x improvements.)

There's no "sustainably running" benchmarks re: MicroPython vs CPython. But that's why I'm writing to you ;-). (You surely don't think I want to persuade you to port Oil to MicroPython - what would the benefit to world of that? ;-) Nope, I want to persuade you to do with MicroPython what you did with CPython, as unlike CPython, MicroPython is intended for that kind of hacking).

What we have is e.g. http://micropython.org/resources/code-dashboard/ (red line).

pfalcon commented 5 years ago

Probably the biggest issues impacting the speed of the OSH interpreter are allocation overhead and function/method call overhead.

You were concerned with implications of CPython's refcounting on forking processes? We got you covered - MicroPython uses faithful GC.
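
For context, the fork concern with reference counting is that merely taking a reference to an object writes to that object's header, which dirties copy-on-write pages shared with a child process. A small sketch of that write follows. It relies on `sys.getrefcount`, which is a CPython implementation detail, and the variable names are made up for illustration.

```python
import sys

data = ["some", "shared", "state"]   # a fresh, ordinary heap object

before = sys.getrefcount(data)
alias = data                         # "read-only" use: just another reference
after = sys.getrefcount(data)

# The count stored inside the object itself changed, so the memory holding
# the object header was written to, even though the list contents were not.
print(before, after)
```

A tracing collector that keeps its bookkeeping outside the objects does not need this per-reference write.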

You've read up that CPy3.6 has got LOAD_METHOD/CALL_METHOD opcodes? Well, congrats, what CPython had in 2016, after ~25 years of its evolution, MicroPython had in 2013, after 0 years of its evolution.
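
The point of a method-call opcode is to avoid allocating a temporary bound-method object on every `obj.method()` call. A minimal sketch of the object that gets skipped (the `Greeter` class is a made-up example, and the sketch shows the Python-visible behaviour, not any interpreter's bytecode):

```python
class Greeter:
    def hello(self) -> str:
        return "hello"


g = Greeter()

# Plain attribute access builds a new bound-method object each time.
m1 = g.hello
m2 = g.hello
distinct_objects = m1 is not m2      # two separate temporaries
same_target = m1 == m2               # both wrap the same function and instance

# A call written as g.hello() can look up the function and pass g directly,
# without materializing m1/m2; that is what a method-call opcode enables.
result = g.hello()
```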

Simple calculations show that MicroPython evolves infinitely faster than CPython. And just imagine, if you'd join the fun, it would evolve even 1.57 times faster than infinitely! ;-)

pfalcon commented 5 years ago

I'm more or less trying to run and speed up ONE Python program -- Oil.

I agree with you, it's terrible waste of approach and effort ;-).

I'm willing to modify this program in weird ways to make the code faster or the compiler easier to write.

MicroPython implements a subset of the Python language. MicroPython's response to folks whining "I miss a feature from CPython!!111" is "Feel free to use CPython instead. If not, embrace the enlightenment of writing unbloated programs" (https://github.com/micropython/micropython/wiki/ContributorGuidelines). You'd feel at home with MicroPython ;-).

In contrast, MicroPython is trying to run many unmodified Python programs as far as I can tell.

That's a secondary goal at best. Python is a good language, but the majority of software is crap, even when written in Python. Why bother with running it? There's CPython, PyPy, Nuitka for that. MicroPython rather concentrates on allowing people to develop un-bloated, un-crappy software, promoting a view of Python as a generic programming language, not a bag of ad hoc warts.

pfalcon commented 5 years ago

I don't need the dynamism after startup. In fact I wanted to do exactly what PyPy does -- allow arbitrary Python at startup, but only allow restricted RPython at runtime. I was going to have the restricted, more static, non-JITTed "OPy" at runtime.

Every schoolkid and their grandma figured out by now that it's what they want, modulo replacing "RPython" with something else per their likes. People just started talking to each other, and figured out that what was "novel approach of Magpie of 2010", turned out to be an obvious idea everyone had in mind for decade(s), but which everyone procrastinated all this time. Will it change now? Or will be the same as usual, people sitting under each's rock doing advanced things in adhoc way, barely known, largely not usable by anyone else, until they got tired with it, and all that work flushed down the /dev/null? Time to grab popcorn ;-).

andychu commented 5 years ago

Eh, your last comment doesn't make much sense, but let's not dwell on that.

Are MicroPython's runtime objects written in a reusable fashion? My experience is that embedded code is not very reusable. For efficiency reasons, the code tends to get pretty tightly coupled.

I would consider Lua's codebase to be portable and reusable C, but there is an efficiency penalty. e.g. there are zero globals in the codebase since it's meant to be embeddable. (FWIW I tried Lua but the VM is tightly coupled to the semantics of the Lua language, e.g. dicts/tables can't distinguish nil from missing, etc.)

What I'm looking for is implementations of:

But NOT:

Maybe:

Is there a way to build a binary with MicroPython code to, for example:

  1. build a tuple ('a', 1)
  2. build a list [1, 2, 3]
  3. create a dict {('a', 1): [1,2,3] }
  4. garbage collect the whole thing

And do it say WITHOUT the VM loop? And without the compiler, import mechanism, etc. ?
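
For reference, the four steps look like this at the Python level. This only illustrates the object graph in question. The question above is whether the same thing can be driven from C against MicroPython's runtime objects without the bytecode VM, and this sketch does not answer that. Note also that CPython frees these objects by reference counting as soon as they are deleted, whereas a tracing collector would reclaim them at collection time.

```python
import gc

key = ("a", 1)            # 1. build a tuple
value = [1, 2, 3]         # 2. build a list
table = {key: value}      # 3. create a dict keyed by the tuple

# Tuples hash by value, so an equal (not identical) key finds the same list.
found = table[("a", 1)] is value

# 4. drop every reference, then run a full collection
del key, value, table
unreachable = gc.collect()
```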

I was thinking of writing my own for Oil, but I realize this is a large amount of work. When I look at the files, it looks like it's all localized to a small number of source files. But I'm not sure what else these depend on.

If I could do some surgery on these, it might be interesting. You can sort of do it with CPython, but they're tangled up in global variables and the ref counting pervades the entire codebase. As mentioned I do want a fork-friendly GC, and it's nice that MicroPython apparently has that.

$ wc -l obj*.c | sort -n
...
   422 objint_mpz.c
   464 objint.c
   529 objlist.c
   531 obj.c
   532 objfun.c
   558 objexcept.c
   598 objset.c
   606 objdict.c
   636 objarray.c
  1405 objtype.c
  2224 objstr.c
 12819 total

pfalcon commented 5 years ago

Eh, your last comment doesn't make much sense, but let's not dwell on that.

Well, yeah, sorry, "everyone and their grandma" is of course just a common idiom and "schoolkid" is overboard ;-).

But otherwise driven by many occurrences of "oh, looks like I'm doing what X was doing before", where X is PyPy, FAT Python, Wren, Magpie, etc., etc. Before that, by reading python-dev archives - people discussed all that stuff 15 years ago, see e.g. https://www.python.org/dev/peps/pep-0266/ , https://www.python.org/dev/peps/pep-0267/ , https://www.python.org/dev/peps/pep-0280/ (those PEPs are of course specific points on how not to suck at runtime), before that my own thoughts on that (along the lines of "as soon as you start to look at how bytecode works, it becomes obvious that it needs to be fixed, why the heck I have to do that now, and nobody did/finished it to be usable before?" See e.g. https://github.com/pfalcon/micropython/issues/26)

So yep, in my list, the idea is oh-so-obvious and in the air.

pfalcon commented 5 years ago

Are MicroPython's runtime objects written in a reusable fashion? My experience is that embedded code is not very reusable. For efficiency reasons, the code tends to get pretty tightly coupled.

Again, there's nothing especially "embedded code" in MicroPython. MicroPython by now has clearly found its niche in that "embedded" stuff, but it has become its curse (all the arduino people storming the forum, etc.) That's why https://github.com/pfalcon/awesome-micropython starts with "MicroPython came known as a "Python for microcontrollers". It's far more than that.".

e.g. there are zero globals in the codebase since it's meant to be embeddable.

Well, MicroPython of course supports threading, so there can't be (many of) raw globals. But we don't pass VM* to every func either, exactly to let folks either define VM struct to a global or thread-local var and get what they want.

FWIW I tried Lua but the VM is tightly coupled to the semantics of the Lua language, e.g. dicts/tables can't distinguish nil from missing, etc.

Everyone tried Lua. Some swallowed it, but some failed to acquire Stockholm syndrome. I can formulate it this way: MicroPython's aim is to achieve what Lua/LuaJIT achieved, but with the bliss of Python instead of the ugliness of Lua.

unicode (Oil will use UTF-8 semantics like Go, not like Python)

We would need to talk about that (you and ESR are 2 folks I know who can't get over Python3 unicode), but in the meantime, the usual "we got you covered": before I got over that stuff myself, I made sure that MicroPython has a "non unicode" mode for strings (i.e. Py2-style). Nobody really uses it, so it's in need of love from some "I can't get Py3 unicode" zealot. Hint, hint ;-).

Maybe: exceptions

Well, if it's "maybe", then you could just use RPython, tinypy, PyMite, etc. Support for exceptions is effectively a man-or-boy test for small Python implementations.

And do it say WITHOUT the VM loop?

Excuse me, what? ;-) Python remains bytecode-interpreted language, it just acquires native-compilation support which is used just as widely as bytecode. You probably should formulate the question more directly. Do you ask "Can mypyc/shedskin/whatever be ported to MicroPython"? I don't know which answer you expect ;-). It should be! And heck, I don't know when we get to it, unless folks like you drop by! ;-)

You can sort of do it with CPython, but they're tangled up in global variables and the ref counting pervades the entire codebase.

The message I'm trying to convey: MicroPython is intended for the kinds of hacks you're doing, while CPython of course has a different purpose. Doing that on CPython is an end in itself; nobody's going to be interested in the result. Doing it in MicroPython can actually be useful to its ecosystem (which is part of the big Python ecosystem). You for sure don't risk much: if you wanted to end up with an ad hoc and non-reusable CPython2 (2!) port, no sweat if it ends up like that with MicroPython in the worst case. The only risk for you is being locked up with scissors and CPython code for a few months vs actually implementing the missing parts for MicroPython.

So, overall: feel free to look in more detail at the uPy source, I'd be happy to answer any questions.

andychu commented 5 years ago

Yeah basically I'm thinking of whether mypyc could be targeted at a different "object space" (in PyPy's terms). But that work is far off. And it's possible that doing something like Shed Skin, which has its own object space that is quite different, would be a better use of time and provide more speedup.

I'm not really interested in working on Python interpreters in general... my real goal is to add some Python semantics to shell with Oil, not make a smaller Python implementation. If I were to use Micro Python I suspect it would be a fork anyway, like I forked CPython.

I think you would be more successful in your efforts if you communicated in a more pleasant fashion.