heynemann / pyvows

Python implementation of Vows.js
http://pyvows.org
MIT License

Replace gevent dep. for parallelism #55

Open heynemann opened 11 years ago

heynemann commented 11 years ago

Gevent is too big of a dependency. We need a somewhat softer dependency.

Not sure what to use here, but we need something.


Edited by @Zearin (2013-01-28):
Tweaked phrasing (for clarity)

Zearin commented 11 years ago

Can you elaborate on what you mean by…

heynemann commented 11 years ago

Installing gevent is hard. It has a dependency on libevent (or libev) depending on the version.

It is also not accomplishing what I wanted it for (parallel execution). If we just want to pretend to be parallel, we should be using plain old Python queues with threads.
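For readers unfamiliar with the "plain old Python queues with threads" idea, here is a minimal sketch using only the standard library. The function name `run_in_threads` and the notion of a "task" as a zero-argument callable are assumptions made for illustration; nothing here is taken from the PyVows code base.

```python
import queue
import threading


def run_in_threads(tasks, num_workers=4):
    """Run zero-argument callables on worker threads; return results in task order.

    Hypothetical helper for illustration only, not part of PyVows.
    """
    todo = queue.Queue()
    results = [None] * len(tasks)

    def worker():
        while True:
            item = todo.get()
            if item is None:           # sentinel: no more work for this worker
                todo.task_done()
                return
            index, task = item
            try:
                results[index] = task()
            except Exception as exc:   # record the failure instead of killing the worker
                results[index] = exc
            finally:
                todo.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for thread in threads:
        thread.start()
    for item in enumerate(tasks):
        todo.put(item)
    for _ in threads:
        todo.put(None)                 # one sentinel per worker
    todo.join()                        # block until every queued item is processed
    for thread in threads:
        thread.join()
    return results
```

Because of the GIL, this only overlaps work that blocks (I/O, sleeps, C extensions that release the lock); CPU-bound tasks still run one at a time.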

I'm thinking of a strategy using forks, but haven't quite got there.

Cheers, Bernardo Heynemann


Edited by @Zearin (2013-01-28):
Tweaked markup

Zearin commented 11 years ago

I do know that parallel execution involves minimal blocking of I/O, and that it means fast, fast, fast!

However, I’m afraid that I don’t have a thorough understanding of the problems with depending on libevent/libev, or the number and nature of tradeoffs between true parallelism and “fake” parallelism.

In the interest of helping PyVows towards this end, I’ve written up a quick list of candidates. I’m not qualified to judge whether any of these are a good fit; but hopefully the summaries will help you to decide whether any of them are worth further investigation.


Asynchronous Python Module Search: Round 1

async

https://github.com/gitpython-developers/async

Async aims to make writing asynchronous processing easier. It provides a task-graph with interdependent tasks that communicate using blocking channels, which allows actual computations to be delayed until items are requested. Tasks will automatically be distributed among 0 or more threads for the actual computation.

Even though the GIL effectively prevents true concurrency, operations which block, such as file IO, can be sped up with it already. In conjunction with custom C extensions which release the GIL, true concurrency can be obtained as well.

async_subprocess

http://pypi.python.org/pypi/async_subprocess/

Cross-platform wrapper around subprocess.Popen to provide an asynchronous version of Popen.communicate().

cogen

http://code.google.com/p/cogen/

cogen is a crossplatform library for network oriented, coroutine based programming using the enhanced generators from Python 2.5. The project aims to provide a simple straightforward programming model similar to threads but without all the problems and costs.

Features

  • wsgi server with coroutine extensions - enabling asynchronous wsgi apps in a regular wsgi stack
  • fast network multiplexing with epoll, kqueue, select, poll or io completion ports (on windows)
  • epoll/kqueue support via the wrappers in the python 2.6's stdlib or separate modules py-kqueue, py-epoll
  • iocp support via ctypes wrappers or pywin32
  • sendfile/TransmitFile support (the wsgi server also uses this for wsgi.file_wrapper)
  • timeouts for socket calls, signal waits etc
  • various mechanisms to work with (signals, joins, a Queue with the same features as the stdlib one) and some other stuff you can find in the docs :)

bluelet

https://github.com/sampsyo/bluelet

Bluelet is a simple, pure-Python solution for writing intelligible asynchronous socket applications. It uses PEP 342 coroutines to make concurrent I/O look and act like sequential programming.

In this way, it is similar to the Greenlet green-threads library and its associated packages Eventlet and Gevent. Bluelet has a simpler, 100% Python implementation that comes at the cost of flexibility and performance when compared to Greenlet-based solutions. However, it should be sufficient for many applications that don't need serious scalability; it can be thought of as a less-horrible alternative to asyncore or an asynchronous replacement for SocketServer (and more).

desync

https://github.com/bgilmore/desync

Decouple your asynchronous code from your event loop implementation.

The intended function of this framework is to allow developers to write async applications and components that will run without modification on a wide range of mainstream async/evented frameworks (Twisted and Tornado being the two initially targeted frameworks for support).

This is currently a pre-alpha experiment and shouldn't be used by anyone.

monocle

https://github.com/saucelabs/monocle

An async programming framework with a blocking look-alike syntax.

monocle straightens out event-driven code using Python's generators. It aims to be portable between event-driven I/O frameworks, and currently supports Twisted and Tornado.

It's for Python 2.5 and up; the syntax it uses isn't supported in older versions of Python. (Versions before 2.7 require the ordereddict module.)

teena

https://github.com/zacharyvoase/teena

Python ports of useful syscalls, using asynchronous I/O.

Teena aims to be a collection of ports of UNIX and Linux syscalls to pure Python, with an emphasis on performance and correctness. Windows support is not a primary concern; I’m initially targeting only POSIX-compliant operating systems. The library uses Tornado to do efficient asynchronous I/O.

The first version of this library will contain implementations of tee and splice which operate on files, sockets, and file descriptors. There’s also a Capture class which behaves like StringIO, but it has a fileno() and so can be used where a real file descriptor is needed.

heynemann commented 11 years ago

Thanks a lot, man! As soon as I get some spare time, I'll check these.

Cheers,

Bernardo Heynemann Developer @ globo.com

heynemann commented 11 years ago

I'm thinking about trying a simple threaded approach to see how it goes, even knowing that Python does not have true parallelism.
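As a rough illustration of what threads can and cannot buy under the GIL, the sketch below compares running a few blocking calls one after another with running them on threads. `fake_io` is a made-up stand-in for a blocking operation (it just sleeps), so treat the measured timings as indicative only.

```python
import threading
import time


def fake_io(seconds):
    """Stand-in for a blocking call; time.sleep releases the GIL while waiting."""
    time.sleep(seconds)


def timed(fn):
    """Return how long fn() takes to run, in seconds."""
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start


def sequential(n, seconds):
    for _ in range(n):
        fake_io(seconds)


def threaded(n, seconds):
    threads = [threading.Thread(target=fake_io, args=(seconds,)) for _ in range(n)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()


seq_time = timed(lambda: sequential(4, 0.1))   # waits happen one after another
thr_time = timed(lambda: threaded(4, 0.1))     # waits overlap
print("sequential: %.2fs, threaded: %.2fs" % (seq_time, thr_time))
```

The threaded version finishes sooner only because the work is waiting rather than computing; pure-Python CPU-bound work would see little or no benefit.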

Zearin commented 11 years ago

Even knowing that Python does not have true parallelism.

I KNOW!!!

Considering all the attention Node.js is getting these days, you’d think Python would at least do something to overcome the limitations of the GIL (Global Interpreter Lock).

Don’t get me wrong. I love what Node.js is doing. I really admire the nonblocking I/O built right into the language itself.

I’m just not a fan of JavaScript’s syntax. Node.js has done a lot to make it more palatable, but it’s still JavaScript at heart. (I’ve toyed with CoffeeScript. I like it. A lot. But, I still like it less than Python.)

I really really really, really want to have Python’s syntax with Node.js’s über-async performance.

Sigh. Maybe Python is finally succumbing to old age. ☹ The future is asynchronous and parallel.

(At least PyVows still outperforms other testing tools by a lot. :P)

Zearin commented 11 years ago

@heynemann:

The futures module (http://pypi.python.org/pypi/futures/2.1.3) looks promising!

Description:

Backport of the concurrent.futures package from Python 3.2

For documentation, it simply refers you to the official Python 3 docs. I take that as a good sign; they are aiming to make the backport so true-to-the-original that it doesn’t require its own list of caveats and warnings about its feature set.

Does this have Gevent-replacing potential?


Update

Oops! Looks like it actually does have its own documentation.

Still, it looks promising…

heynemann commented 11 years ago

It does look good! I'll test it as soon as I get some time.

If you want to go ahead and try to use it, I'd be happy to evaluate a pull request.

Cheers, Bernardo Heynemann

pplante commented 11 years ago

I am curious why gevent was even chosen for test parallelization. I find it to be a clumsy dependency that leads to some really difficult-to-debug issues.

For instance, we sank a few hours the other week trying to figure out why a test was broken only sometimes. We ended up finding it was because we forgot to tell our Vows.Context subclass to mark a method as ignored. The error we were experiencing was so buried in the spaghetti gevent stuff that it literally took 2 hours to track down. When we finally found the solution, it was only because we were using the spaghetti method of debugging (throw something at the wall until it sticks).

We really love the test organization that pyVows offers since it closely mirrors our CoffeeScript/JavaScript test suites. However, this dependency choice makes using pyVows a headache at times. How difficult would it be to rip out gevent, or make an optional non-gevent test runner that runs each context and test sequentially? I think you came up with a great way to handle testing in Python without crazy bytecode or VM hacks, so I really want to see pyVows development furthered.

Thanks!

Zearin commented 11 years ago

Well, check out the early parts of this thread. Although I don't know why Gevent was chosen in the first place (aside from parallel execution), it's not here to stay.

Actually, I’ve been trying to refactor PyVows to work with concurrent.futures (which is Python 2–3 compatible). The only reason I haven’t yet is that I’m completely inexperienced in this kind of programming. ☺ That’s not stopping me from trying, but it does slow me down.

Near as I can tell, using concurrent.futures is a good choice, but it will require a significant reorganization of the code in the runner module. From articles I’ve read and videos I’ve watched in order to learn more about this subject, some of runner’s giant methods need to be broken down into smaller bits of execution if we’re going to keep execution time fast.

If I’ve understood everything correctly, Gevent uses coroutines, which is an entirely different concurrency strategy than threads. I think that’s why PyVows performs so fast with Gevent with the execution code structured as is, whereas the same structure would be slow/broken using threads.
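To make the coroutine-versus-thread distinction concrete, here is a toy cooperative scheduler. It uses plain Python generators rather than the greenlets gevent is actually built on, so it is an analogy for the scheduling model, not a picture of gevent's internals; `task` and `run_round_robin` are names invented for this sketch.

```python
from collections import deque


def task(name, steps, log):
    """A cooperative task: performs one step, then hands control back."""
    for i in range(steps):
        log.append((name, i))
        yield  # voluntarily give up control; nothing preempts us


def run_round_robin(tasks):
    """Resume each generator in turn, on a single thread, until all finish."""
    ready = deque(tasks)
    while ready:
        current = ready.popleft()
        try:
            next(current)
        except StopIteration:
            continue            # this task is done; do not reschedule it
        ready.append(current)   # otherwise put it at the back of the line


log = []
run_round_robin([task("a", 2, log), task("b", 2, log)])
print(log)  # [('a', 0), ('b', 0), ('a', 1), ('b', 1)]
```

Tasks switch only at the points where they yield, which is why code written for this model can behave very differently when it is moved onto preemptive threads unchanged.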

If you have experience with concurrent.futures, I’d love to learn more about it. My early attempts to use it resulted in the runtime only executing a small subset of the tests (i.e. it said “I’m done testing!” after only a few tests had completed…way too early). After that, I spent a couple days reading and learning more about this stuff, but that’s when I realized I needed to use callbacks…

Which requires all that breaking down into smaller bits of execution and stuff. ☺

Fear not. Gevent is not here to stay.
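One common way a run gets reported as finished too early with concurrent.futures is reporting before every submitted future has completed. Whether that was the cause here is only a guess. The sketch below shows one way to wait for everything using `as_completed`; `run_all` and the dict of named test callables are hypothetical and not PyVows APIs.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_all(tests, max_workers=4):
    """Run each zero-argument test callable; return only after all have finished.

    `tests` maps a test name to a callable. Hypothetical helper for illustration.
    """
    outcomes = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Keep a handle on every future so none is forgotten.
        futures = {pool.submit(test): name for name, test in tests.items()}
        # as_completed yields each future as it finishes and stops only
        # once all of them are done.
        for future in as_completed(futures):
            name = futures[future]
            error = future.exception()
            outcomes[name] = "failed: %r" % (error,) if error else "passed"
    return outcomes
```

This style avoids explicit callbacks: the main thread simply blocks on the collection of futures, which may be enough for a test runner that only needs a final report.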

heynemann commented 11 years ago

I think a good solution would be for us to implement different runners, instead of messing with the gevent one. That way we can do "best-case" runners:

  • Is gevent available? Use it
  • Is futures available? Use it
  • Use sequential

What do you guys think?
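A minimal sketch of this "best available runner" idea, trying gevent first, then concurrent.futures, then plain sequential execution. All function names are invented for illustration, and a "test" is assumed to be a zero-argument callable; the real PyVows runner interface is not shown in this thread.

```python
import importlib.util


def run_sequential(tests):
    """Fallback: run the tests one after another."""
    return [test() for test in tests]


def run_with_futures(tests):
    """Run the tests on a thread pool; results keep the input order."""
    from concurrent.futures import ThreadPoolExecutor
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda test: test(), tests))


def run_with_gevent(tests):
    """Run the tests as greenlets (only used when gevent is installed)."""
    import gevent
    jobs = [gevent.spawn(test) for test in tests]
    gevent.joinall(jobs)
    return [job.value for job in jobs]


def is_available(module_name):
    """Report whether a module can be imported, without fully importing it."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        return False


def pick_runner():
    """Choose the best runner the current environment supports."""
    if is_available("gevent"):
        return run_with_gevent
    if is_available("concurrent.futures"):
        return run_with_futures
    return run_sequential


runner = pick_runner()
print(runner.__name__, runner([lambda: 1, lambda: 2]))
```

Keeping each strategy behind the same call signature means the choice can be made once at startup, and gevent becomes an optional extra rather than a hard requirement.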

Cheers,

Bernardo Heynemann Developer @ globo.com


pplante commented 11 years ago

That sounds perfect!

Zearin commented 11 years ago

I think a good solution would be for us to implement different runners, instead of messing with the gevent one. That way we can do "best-case" runners:

  • Is gevent available? Use it
  • Is futures available? Use it
  • Use sequential

What do you guys think?

Agreed!

(That thought had actually occurred to me…but I’m not having a lot of success, so I decided to keep my big mouth shut. ☺)