pygments / pygments.rb

💎 Ruby wrapper for Pygments syntax highlighter
MIT License

Add Pygments::Popen #19

Closed tmm1 closed 12 years ago

tmm1 commented 12 years ago

Present

Running a python VM inside the current ruby process continues to be problematic. There are reports of segfaults, problems with FFI, problems rubypython has finding libpython, and open bugs with multi-vm signal handling while inside python code.

Some of these issues are specific to Pygments::FFI and rubypython. But the alternative, Pygments::C, is too immature to use in production and would at a minimum require added exception handling code.

Past

pygments.rb's predecessors, albino and multipygmentize, suffered from a limited API and poor performance.

multipygmentize somewhat improved performance, but required additional work by the caller to make batch calls.

The benchmark isolates this performance problem to python startup and pygments library loading cost. In pygments.rb, we pay this startup cost only once in Pygments.start.

Future

The ideal implementation of pygments.rb, then, is an API-compatible interface that pays the startup cost only once but also provides isolation from the python code. Thus, Pygments::Popen.

Instead of spawning a new process per invocation (as albino does), we keep a long-running python child and communicate with it over a pipe. To maintain the existing API and allow for future expansion, the protocol over the pipe can be simple BERT-style RPC.
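To make the pipe idea concrete, here is a minimal sketch of the parent side. It is not the actual Pygments::Popen code: `PopenSketch` and `CHILD_SOURCE` are hypothetical names, the child is a small Ruby stand-in rather than a python process running Pygments, and the framing is a one-line JSON header followed by a length-prefixed payload rather than BERT. The point is only that the child is started once and then reused for every request.

```ruby
require 'json'
require 'rbconfig'

# Hypothetical stand-in for the python worker. It loops forever, reading a
# one-line JSON header and then exactly `bytes` bytes of source code.
CHILD_SOURCE = <<~'RUBY'
  require 'json'
  $stdout.sync = true
  while (line = $stdin.gets)
    header = JSON.parse(line)
    code   = $stdin.read(header['bytes']).force_encoding('UTF-8')
    # A real worker would call Pygments here; this one only wraps the input.
    result = "<pre class=\"#{header['lexer']}\">#{code}</pre>"
    $stdout.write(JSON.generate('bytes' => result.bytesize) + "\n" + result)
  end
RUBY

# Parent side: one child for the lifetime of the object, one request/response
# round trip per call.
class PopenSketch
  def initialize
    # Array form avoids the shell; 'r+' gives us both ends of the pipe.
    @io = IO.popen([RbConfig.ruby, '-e', CHILD_SOURCE], 'r+')
    @io.sync = true
  end

  def pid
    @io.pid
  end

  def highlight(code, lexer:)
    # Request: JSON header line, then the raw payload.
    @io.write(JSON.generate('lexer' => lexer, 'bytes' => code.bytesize) + "\n")
    @io.write(code)
    @io.flush
    # Response uses the same framing.
    header = JSON.parse(@io.gets)
    @io.read(header['bytes']).force_encoding('UTF-8')
  end

  def close
    # Closing the pipe sends EOF to the child, which ends its loop.
    @io.close
  end
end

worker = PopenSketch.new
first  = worker.highlight("puts 1", lexer: 'ruby')
second = worker.highlight("print(2)", lexer: 'python')
worker.close
```

Both calls above go to the same child process, so any startup cost is paid once in `PopenSketch.new`. Length-prefixed framing is used so that payloads containing newlines pass through unchanged, and a crash in the child surfaces in the parent as a broken pipe or EOF instead of taking the parent down with it.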

Alternatively, we could add Pygments::Socket and talk to a single pygments service over a TCP or Unix socket. The advantages of this approach are limited, however, compared to the added complexity of packaging, scaling and deployment.

baoshan commented 12 years ago

+1 for Pygments::Socket.

rtomayko commented 12 years ago

Love this so much. :heart:

rtomayko commented 12 years ago

Way to not create a separate HTTP service for this btw. It's the cool way to solve these problems right now in case you didn't know.

vmg commented 12 years ago

Just to clarify, are you saying that you'd rather see the subprocess + pipe approach than external service + socket? I'm leaning towards that too.

rtomayko commented 12 years ago

@tanoku Yeah definitely.

baoshan commented 12 years ago

@tanoku and @rtomayko Do you think an external service over a Unix socket or TCP offers more flexibility for deployment and load balancing? GitHub may have more features that depend on Pygments in the future. Homogeneous workers are more diligent, aren't they?

scottjg commented 12 years ago

tearing out the ffi usage of python should also fix github/github#4187

tnm commented 12 years ago

Full version of this has been implemented.