abingham opened this issue 3 years ago
+1 to this idea. In the current pyte implementation, the stream and the screen are tightly coupled, so it is difficult to inject a custom processor between them.
Factory functions can help here, and this should have the same performance as the current implementation.
Concept for a "reusable parser":
import re
import types

class Stream:
    def __init__(
        self,
        handle_text=None,
        parse_controls=None,
    ):
        class mock_listener:
            def draw(self, string):
                print("listener.draw", string)
        self.listener = mock_listener()
        self._text_pattern = re.compile("[a-z]+")
        self._taking_plain_text = True
        feed = self.make_feed(handle_text, parse_controls)
        self.feed = types.MethodType(feed, self)  # bind method to instance

    def _send_to_parser(self, char):
        print("_send_to_parser", char)

    def make_feed(self, handle_text, parse_controls):
        """make a feed function for stream.feed(data)"""
        if not handle_text:
            handle_text = self.listener.draw
        if not parse_controls:
            parse_controls = self._send_to_parser

        def feed(self, data):
            """Consume some data and advances the state as necessary.
            :param str data: a blob of data to feed from.
            """
            send = parse_controls  # send = self._send_to_parser
            draw = handle_text  # draw = self.listener.draw
            match_text = self._text_pattern.match
            taking_plain_text = self._taking_plain_text
            length = len(data)
            offset = 0
            while offset < length:
                if taking_plain_text:
                    match = match_text(data, offset)
                    if match:
                        start, offset = match.span()
                        draw(data[start:offset])
                    else:
                        taking_plain_text = False
                else:
                    taking_plain_text = send(data[offset:offset + 1])
                    offset += 1
            self._taking_plain_text = taking_plain_text
        return feed

stream = Stream()
stream.feed("asdf 0123")
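A compact variant of the same factory idea, written as a plain closure instead of a method bound with types.MethodType. This is only a sketch: make_feed, TEXT_PATTERN and the toy send() in the usage lines are illustrative names, not pyte API, and the text pattern (everything except C0 controls and DEL) is an assumption. The handlers are captured once when the feed function is built, so the per-character loop only touches closure and local variables instead of attribute lookups. That should keep it on par with the current implementation, though this version has not been benchmarked.

```python
import re
from typing import Callable, Optional

# Hypothetical text pattern: anything except C0 control characters and DEL.
TEXT_PATTERN = re.compile(r"[^\x00-\x1f\x7f]+")


def make_feed(
    draw: Callable[[str], None],
    send: Callable[[str], Optional[bool]],
    text_pattern: "re.Pattern[str]" = TEXT_PATTERN,
) -> Callable[[str], None]:
    """Return a feed(data) function with both handlers captured in its closure."""
    match_text = text_pattern.match
    taking_plain_text = True  # parser state that persists between feed() calls

    def feed(data: str) -> None:
        nonlocal taking_plain_text
        plain = taking_plain_text  # copy to a local for the hot loop
        offset, length = 0, len(data)
        while offset < length:
            if plain:
                match = match_text(data, offset)
                if match:
                    start, offset = match.span()
                    draw(data[start:offset])
                else:
                    plain = False
            else:
                # send() returns truthy once the control parser accepts text again
                plain = bool(send(data[offset:offset + 1]))
                offset += 1
        taking_plain_text = plain

    return feed


# Usage with custom handlers; the toy send() treats every control
# character as a complete sequence and immediately returns to text mode.
texts, controls = [], []
feed = make_feed(texts.append, lambda char: controls.append(char) or True)
feed("abc\x07def")
```

Because each call to make_feed captures its own handlers and state, several streams with different back ends can coexist without subclassing Stream.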
The parser is already generated by self._parser = self._parser_fsm(), so we just need to modify _parser_fsm to accept custom handler functions.
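A rough sketch of what "accept custom handler functions" could look like, assuming a generator-based FSM driven with send(), in the spirit of the one pyte builds in _parser_fsm. The function name parser_fsm, the handlers dict and its keys are hypothetical, and only BEL and SGR (ESC [ params m) are handled. The generator yields True when the caller may go back to drawing plain text, matching the truthy return value the feed loop above expects from send().

```python
from typing import Callable, Dict, Generator, Optional

Handlers = Dict[str, Callable[..., None]]


def parser_fsm(handlers: Handlers) -> Generator[Optional[bool], str, None]:
    """Toy control-character FSM that calls caller-supplied handlers.

    Understands only BEL and "ESC [ params m" (SGR); any other CSI final
    byte is ignored, and an ESC not followed by "[" is dropped.
    """
    draw = handlers["draw"]
    bell = handlers["bell"]
    sgr = handlers["select_graphic_rendition"]
    while True:
        # Yielding True tells the caller it may hand plain text to draw() again.
        char = yield True
        if char == "\x07":
            bell()
        elif char == "\x1b":
            char = yield None
            if char != "[":
                draw(char)
                continue
            params = ""
            char = yield None
            while char in "0123456789;":
                params += char
                char = yield None
            if char == "m":
                sgr(*(int(p) for p in params.split(";") if p))
        else:
            draw(char)


# Usage: the handlers are plain functions, so no Screen subclass is needed.
events = []
parser = parser_fsm({
    "draw": lambda text: events.append(("draw", text)),
    "bell": lambda: events.append(("bell",)),
    "select_graphic_rendition": lambda *attrs: events.append(("sgr", attrs)),
})
next(parser)  # prime the generator up to its first `yield True`
for char in "\x1b[0;31m\x07x":
    parser.send(char)
```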
Already possible: use a custom screen object.
# pyte/test_parser.py
# python3 -m pyte.test_parser
# ansi color codes https://gist.github.com/Prakasaka/219fe5695beeb4d6311583e79933a009
#from pyte.screens import Screen, DiffScreen, HistoryScreen, DebugScreen
from .screens import Screen, DiffScreen, HistoryScreen, DebugScreen
#from pyte.streams import Stream, ByteStream
from .streams import Stream, ByteStream

terminal_width = 40
terminal_height = 4

class CustomScreen(Screen):
    def draw(self, *args):
        print("custom listener: draw", repr(args))
        super().draw(*args)

    def set_title(self, *args):
        print("custom listener: set_title", repr(args))

    def select_graphic_rendition(self, *args):
        print("custom listener: select_graphic_rendition", repr(args))

screen = CustomScreen(terminal_width, terminal_height)
stream = ByteStream(screen)
stream.feed(b"".join([
    b"\x1b",           # esc = \e
    b"]",              # osc
    b"2;new title",    # params: 2, "new title"
    b"\x07",           # bel = \a -> end of string
    b"\x1b",           # esc
    b"[",              # csi
    b"0;31",           # params: 0, 31 -> red
    b"m",              # select_graphic_rendition
    b"red",            # text
    b"\x1b[0;32m",     # esc csi green
    b"green",          # text
    b"\x1b[0m",        # reset style
    b"default",        # text
]))

term_lines = screen.display[:]  # copy array
for line_idx, line in enumerate(term_lines):
    print(f"{line_idx:4d} {line} ¶")
Output:
custom listener: set_title ('new title',)
custom listener: select_graphic_rendition (0, 31)
custom listener: draw ('red',)
custom listener: select_graphic_rendition (0, 32)
custom listener: draw ('green',)
custom listener: select_graphic_rendition (0,)
custom listener: draw ('default',)
0 redgreendefault ¶
1 ¶
2 ¶
3 ¶
As @milahu points out, this should be doable without any changes to pyte.
The coupling between Stream and Screen is tight in the sense that the names of the event handlers are fixed, but Stream does not assume anything about the implementation of Screen. So, you could have a custom Screen class which emits IR instructions instead of doing buffer manipulations. pyte.DebugScreen already does something like that, except that it logs the intercepted events to stderr.
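A minimal sketch of such an IR-emitting screen, assuming only that the stream calls handlers on its listener by name (draw, set_title, select_graphic_rendition, as in the output above). RecordingListener and replay are hypothetical names, and attaching this class to pyte's ByteStream has not been tested here. Instead of editing a buffer, each call is stored as a (name, args, kwargs) tuple that can later be replayed into a real screen.

```python
from typing import Any, Callable, Dict, List, Tuple

Event = Tuple[str, Tuple[Any, ...], Dict[str, Any]]


class RecordingListener:
    """Screen stand-in that turns every handler call into an IR event.

    Any public attribute lookup (draw, set_title, select_graphic_rendition,
    ...) returns a function that records the call instead of editing a buffer.
    """

    def __init__(self) -> None:
        self.events: List[Event] = []

    def __getattr__(self, name: str) -> Callable[..., None]:
        # Only called for names not found normally; keep dunder/private
        # lookups (copy, pickle, ...) failing as usual.
        if name.startswith("_"):
            raise AttributeError(name)

        def record(*args: Any, **kwargs: Any) -> None:
            self.events.append((name, args, kwargs))

        return record


def replay(events: List[Event], listener: Any) -> None:
    """Send recorded events to another listener, e.g. a real Screen."""
    for name, args, kwargs in events:
        getattr(listener, name)(*args, **kwargs)


recorder = RecordingListener()
# These calls mimic the handler calls shown in the custom-screen output.
recorder.set_title("new title")
recorder.select_graphic_rendition(0, 31)
recorder.draw("red")
```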
> So, you could have a custom Screen class
This was exactly the approach I took at first. It turned out that it didn't give me everything I needed, though. In particular, the information about precisely which bytes were parsed for each call to a Screen method was lost. I suspect that pyte itself wouldn't benefit greatly from providing this kind of information, though, so there may not be a compelling argument for making the change here.
> precisely which bytes were parsed for each call to a Screen method
This is doable with near-zero overhead:
https://github.com/milahu/pyte/tree/parser-pass-token-source
Edit: fixed an edge case where a token spans two data buffers.
$ git checkout master
$ BENCHMARK=tests/captured/htop.input python benchmark.py
htop.input->Screen: Mean +- std dev: 144 ms +- 5 ms
htop.input->DiffScreen: Mean +- std dev: 145 ms +- 5 ms
htop.input->HistoryScreen: Mean +- std dev: 378 ms +- 9 ms
$ git checkout parser-pass-token-source
$ BENCHMARK=tests/captured/htop.input python benchmark.py
htop.input->Screen: Mean +- std dev: 144 ms +- 5 ms
htop.input->DiffScreen: Mean +- std dev: 145 ms +- 4 ms
htop.input->HistoryScreen: Mean +- std dev: 379 ms +- 11 ms
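This is not the code from the parser-pass-token-source branch; it is an independent toy showing the general idea of attaching the exact source bytes to every event, including carrying an unfinished escape sequence over to the next feed() call. The regular-expression grammar, SourceTrackingTokenizer and the event tuple layout are assumptions for illustration, and the grammar only covers OSC titles, CSI sequences and plain text.

```python
import re
from typing import List, Tuple

# Toy grammar (an assumption, far smaller than what pyte handles):
# OSC "ESC ] body BEL", CSI "ESC [ params final", and runs of plain text.
TOKEN = re.compile(
    rb"(?P<osc>\x1b\](?P<osc_body>[^\x07]*)\x07)"
    rb"|(?P<csi>\x1b\[(?P<params>[0-9;]*)(?P<final>[@-~]))"
    rb"|(?P<text>[^\x1b]+)"
)

Event = Tuple[str, tuple, bytes]  # (handler name, args, exact source bytes)


class SourceTrackingTokenizer:
    def __init__(self) -> None:
        self._pending = b""  # incomplete escape sequence carried between feeds

    def feed(self, data: bytes) -> List[Event]:
        buf = self._pending + data
        events: List[Event] = []
        offset = 0
        while offset < len(buf):
            match = TOKEN.match(buf, offset)
            if match is None:
                # Assumed to be an unfinished escape sequence. A malformed one
                # would stay pending forever; a real parser must handle that.
                break
            events.append(self._to_event(match))
            offset = match.end()
        self._pending = buf[offset:]
        return events

    @staticmethod
    def _to_event(match: "re.Match[bytes]") -> Event:
        source = match.group(0)
        if match.group("osc") is not None:
            code, _, text = match.group("osc_body").partition(b";")
            name = "set_title" if code in (b"0", b"2") else "osc"
            return (name, (text.decode("utf-8", "replace"),), source)
        if match.group("csi") is not None:
            params = tuple(int(p) for p in match.group("params").split(b";") if p)
            name = "select_graphic_rendition" if match.group("final") == b"m" else "csi"
            return (name, params, source)
        # "replace" avoids crashing on a UTF-8 character split across feeds.
        return ("draw", (match.group("text").decode("utf-8", "replace"),), source)


# Usage: the CSI sequence is split across two feed() calls.
tokenizer = SourceTrackingTokenizer()
first = tokenizer.feed(b"\x1b]2;new title\x07\x1b[0;3")
second = tokenizer.feed(b"1mred")
```

Once nothing is left pending, concatenating the source fields of all events gives back exactly the bytes that were fed, which is what makes byte-accurate debugging possible.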
> I suspect that pyte itself wouldn't benefit greatly from providing this kind of information, though, so there may not be a compelling argument for making it here.
Yep, for pyte this is just wasted CPU time, but it would be nice to use the pyte source to compile such a parser: https://stackoverflow.com/questions/56487216/how-can-i-convert-python-code-into-a-parse-tree-and-back-into-the-original-code
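One way to read "use the pyte source to compile such a parser": keep a single parser source that includes source-tracking calls, and derive a lean variant by rewriting its AST. The feed function and the track_source hook below are hypothetical stand-ins, not pyte code; ast.NodeTransformer and ast.unparse (Python 3.9+) are standard library.

```python
import ast
import textwrap

# Hypothetical "full" parser source: track_source(...) calls record which
# bytes produced each event, and only some consumers want them.
FULL_SOURCE = textwrap.dedent("""
    def feed(data, draw, track_source):
        for offset, char in enumerate(data):
            track_source(offset, offset + 1)
            draw(char)
""")


class StripTracking(ast.NodeTransformer):
    """Drop every statement of the form track_source(...)."""

    def visit_Expr(self, node: ast.Expr):
        call = node.value
        if (
            isinstance(call, ast.Call)
            and isinstance(call.func, ast.Name)
            and call.func.id == "track_source"
        ):
            return None  # removed; a body left empty would need a `pass`
        return node


tree = StripTracking().visit(ast.parse(FULL_SOURCE))
ast.fix_missing_locations(tree)
LEAN_SOURCE = ast.unparse(tree)  # ast.unparse needs Python 3.9+

# Compile the lean variant into a callable function.
namespace = {}
exec(compile(LEAN_SOURCE, "<lean-parser>", "exec"), namespace)
lean_feed = namespace["feed"]
```

The same transformer could be run at build time, so pyte users would pay nothing for a feature only some consumers need.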
The core of this proposal is to introduce an intermediate representation (IR) of parsed data between the stream and the screen. Rather than the stream feeding its parsed results directly to the screen, it would generate a stream of objects representing the parsed data, and these could be forwarded to the Screen API and potentially other clients. This IR could also be stored, analyzed, replayed, etc.
This idea came out of some work I was doing to learn more about control codes. In particular, I borrowed heavily from (stole from) pyte's Stream class in my Parser implementation. I think this kind of thing could be introduced to pyte with full backwards compatibility, and it would mean I wouldn't need to duplicate Stream. I know this by itself isn't a very compelling argument for modifying pyte, but it might be useful in pyte as well (e.g. I saw some issues related to improving debugging).
In any event, I thought I'd float the idea and see what you think. I should be able to do most of the coding, though of course I'd appreciate any guidance you've got.
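To make the proposal concrete, here is one possible shape for the IR: a small record per parsed item, a JSON-lines format for storing sessions, and a fan-out that forwards the records to several Screen-like clients. Instruction, dump, load, forward and TextOnly are all hypothetical names, and the field set (handler name, positional args, raw source bytes) is an assumption rather than a settled design.

```python
import json
from dataclasses import dataclass
from typing import Any, Iterable, Iterator, Tuple


@dataclass(frozen=True)
class Instruction:
    """One parsed item in a hypothetical IR between Stream and Screen."""

    handler: str                # Screen method name, e.g. "draw"
    args: Tuple[Any, ...] = ()  # positional arguments for that method
    source: bytes = b""         # raw bytes the parser consumed for it


def dump(instructions: Iterable[Instruction]) -> str:
    """Serialize to JSON lines (ASCII-only) so a session can be stored."""
    return "\n".join(
        json.dumps({
            "handler": instr.handler,
            "args": list(instr.args),
            # latin-1 maps every byte to one code point, so it round-trips
            "source": instr.source.decode("latin-1"),
        })
        for instr in instructions
    )


def load(text: str) -> Iterator[Instruction]:
    """Inverse of dump(); nested args come back as lists, not tuples."""
    for line in text.splitlines():
        record = json.loads(line)
        yield Instruction(
            record["handler"],
            tuple(record["args"]),
            record["source"].encode("latin-1"),
        )


def forward(instructions: Iterable[Instruction], *clients: Any) -> None:
    """Replay the IR into any number of Screen-like clients."""
    for instr in instructions:
        for client in clients:
            getattr(client, instr.handler)(*instr.args)


class TextOnly:
    """Example client that keeps drawn text and ignores everything else."""

    def __init__(self) -> None:
        self.text = []

    def draw(self, data: str) -> None:
        self.text.append(data)

    def __getattr__(self, name: str):
        if name.startswith("_"):
            raise AttributeError(name)
        return lambda *args: None  # ignore set_title, SGR, ...


session = [
    Instruction("set_title", ("new title",), b"\x1b]2;new title\x07"),
    Instruction("select_graphic_rendition", (0, 31), b"\x1b[0;31m"),
    Instruction("draw", ("red",), b"red"),
]
```

Keeping the raw bytes next to each instruction is what would let a consumer map every Screen call back to its input, the information that the custom-Screen approach loses.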