Open pazz opened 12 years ago
There are libs that can parse VT output, e.g. https://github.com/helgefmi/ansiterm. But how do we get that output from a subprocess?
Since we are using twisted, we should not really concern ourselves with the subprocess module. There is reactor.spawnProcess, to which we can give our implementation derived from ProcessProtocol that handles the events from that process asynchronously. outReceived will be fired with each chunk of incoming data. If the parser library handles push-parsing, we can feed it directly; otherwise we will need to store that data (the most efficient way in the stdlib is, afaik, writing to cStringIO) and parse it when done.
It would probably be nice to display an indicator somewhere that the process is running, in case it takes a long time.
Some more details: the default childFDs should connect std{in,out,err} properly in the spawnProcess call. We only have to make sure to handle out, log err and take note of the exit code when done. The protocol instance will have a transport attribute assigned that conforms to IProcessTransport; we can start writing inside the connectionMade callback, since by then everything is set up. Writes are buffered in twisted, so we can write everything at once and it will be passed to the process when it can read.
Also note that spawnProcess execs the file directly, so there is no intermediate shell, which is probably desirable in this case.
My problem was rather how to get the VT escape characters from the process. I know that if you use subprocess, for example, and read a process's stdout, you get a string that does not contain any color info (I tried with ls -C).
Is this different in twisted?
ls -C is per default ls --color=auto, which disables color when the output is not a terminal. If I pass --color=yes I get color escapes from ls in subprocess.
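The autodetection mentioned above can be reproduced without ls. This small sketch (stdlib only, nothing alot-specific) lets a child process report whether its stdout is a terminal; through a pipe it is not, which is why "auto" color modes switch themselves off.

```python
import subprocess
import sys

# The child prints whether its own stdout is a terminal.
child = [sys.executable, "-c", "import sys; print(sys.stdout.isatty())"]

# Run through a pipe, as subprocess and spawnProcess do by default.
piped = subprocess.run(child, stdout=subprocess.PIPE, check=True)
print(piped.stdout.decode().strip())  # False: stdout is a pipe, not a tty
```

So either the tool is told to force color (as with --color=yes), or the child has to be given a pty; spawnProcess has a usePTY argument for the latter.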
brilliant, thanks!
For most apps it's a toggle with a default of autodetection of pty. For some interesting apps try pygmentize -f console (pygments code highlighter) and img2txt (part of libcaca).
It would be really neat if we could get some text browser render text dump in color, but I don't know of any that does that.
@ccxcz Ohhh! I'm starting to see some interesting options...
ELinks does color output with its -dump-color-mode option.
tiv can display images better than libcaca IMHO.
you make me want that feature NOW! :)
a few things we need to do beforehand:
If we concern ourselves only with basic colors / text styles, the parsing is quite trivial and can be done by a simple state machine. The style escape sequence is "\x1b[" followed by a semicolon-delimited list of styles and terminated by the character "m". Numbers 30-37 are foreground colors (and are expanded to full 16 colors by the bold attribute) and 40-47 are background colors. 39 and 49 are the "default" colors of the terminal.
With xterm-style 88/256 colors there is a special form of the above, written as "\x1b[38;5;Xm", where X is the code of the foreground color. "48;5;X" is used for backgrounds. xterm actually implements more than that, but I'm not sure if anybody uses it.
Full specs:
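As a rough illustration of the state machine described above, here is a stdlib-only sketch that splits text into (style, chunk) pairs. The names parse_ansi and apply_sgr and the style dict layout are assumptions made for this example; only the codes named in this thread (plus 0, 1 and 22 for reset, bold and normal intensity) are handled, everything else is ignored.

```python
import re

# "\x1b[" + semicolon-delimited numbers + "m"
SGR = re.compile(r"\x1b\[([0-9;]*)m")
DEFAULT = {"fg": None, "bg": None, "bold": False}


def apply_sgr(style, params):
    """Return a copy of `style` updated by one list of SGR numbers."""
    style = dict(style)
    codes = [int(p) if p else 0 for p in params.split(";")]
    i = 0
    while i < len(codes):
        c = codes[i]
        if c == 0:
            style = dict(DEFAULT)          # reset everything
        elif c == 1:
            style["bold"] = True
        elif c == 22:
            style["bold"] = False
        elif 30 <= c <= 37:
            style["fg"] = c - 30
        elif 40 <= c <= 47:
            style["bg"] = c - 40
        elif c == 39:
            style["fg"] = None             # terminal default
        elif c == 49:
            style["bg"] = None
        elif c in (38, 48) and codes[i + 1:i + 2] == [5] and i + 2 < len(codes):
            # xterm 88/256 form: 38;5;X (foreground) or 48;5;X (background)
            style["fg" if c == 38 else "bg"] = codes[i + 2]
            i += 2
        i += 1
    return style


def parse_ansi(text):
    """Split text into (style, chunk) pairs, dropping the escape sequences."""
    style = dict(DEFAULT)
    pairs, pos = [], 0
    for m in SGR.finditer(text):
        if m.start() > pos:
            pairs.append((style, text[pos:m.start()]))
        style = apply_sgr(style, m.group(1))
        pos = m.end()
    if pos < len(text):
        pairs.append((style, text[pos:]))
    return pairs
```

For example, parse_ansi("\x1b[1;31mred\x1b[0m plain") yields a bold chunk "red" with fg 1 followed by " plain" in the default style.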
The widget (I'd rather call it AnsiColorWidget and make it generic) could be easily updated as we receive the data (no deferreds involved here if I'm remembering the api clearly, they are for fire-once events only). I can imagine this being useful when you display text and then start rendering a huge attached photograph. The question is: do the ansi codes map well to urwid attributes, or will we have to do some translations? If we can specify the color just by its code it would be great. Though I don't really know what to do when alot is invoked in 16color mode and a script outputs xterm-256 style colors. (It would do so in violation of $TERM, but that's only a rough pointer.)
Mailcap certainly is not designed for this, it was designed only for opening attachments and it even fails to provide important information from mime to the application. In this case we would like to have at least the same amount of information ranger gets via returncode, which is an indication whether the text is free-flowing or fixed width.
I would be in favour of using a single script like ranger does and letting it decide, using case statements or whatever means available, what to call. The interface could be like this (I'm using shell semantics, but we would call spawnProcess instead):
./script "text/plain; charset=utf-8" <content >colored_text 2>error_messages
This should make it simple for scripts to just call the desired application. It eliminates the need for a temporary file, as required with mailcap, and where necessary scripts can still do cat >$(tempfile).
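To make the proposed interface more concrete, here is a sketch of what such a script could look like if written in Python rather than shell. Everything here is hypothetical: the function names, the single text/plain branch and the exit codes are placeholders for whatever real tools and conventions a user would plug in.

```python
import sys
from email.message import Message


def parse_content_type(value):
    """Split 'text/plain; charset=utf-8' into ('text/plain', 'utf-8')."""
    msg = Message()
    msg["Content-Type"] = value
    return msg.get_content_type(), msg.get_content_charset()


def render(content_type, data):
    """Dispatch on the MIME type, the equivalent of a shell case statement."""
    mimetype, charset = parse_content_type(content_type)
    if mimetype == "text/plain":
        # Placeholder handler: decode only; a real one might add colors.
        return data.decode(charset or "ascii", "replace")
    raise LookupError("no handler for %s" % mimetype)


def main(argv, stdin, stdout, stderr):
    """Content type in argv[1], content on stdin, result on stdout."""
    try:
        stdout.write(render(argv[1], stdin.read()))
    except LookupError as exc:
        stderr.write("%s\n" % exc)
        return 1
    return 0
```

A real script would end with sys.exit(main(sys.argv, sys.stdin.buffer, sys.stdout, sys.stderr)), so the return code is available to the caller.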
To make matters more interesting, we can even let users decide which one of the alternatives in multipart/alternative we want to show:
./script multipart/alternative "text/plain; charset=utf-8" "text/html; charset=utf-8" \
<text_content 3<html_content >colored_text 2>error_messages
This is a bit more tricky to handle by a script, but one of the RFCs tells us they should be ordered from least representative to the richest format, so we can try to display them in reverse order, but I can imagine doing some advanced checking on whether the text message is reasonable and if not using html instead.
Body parts whose representation fails should be displayed as attachments regardless of their content-disposition, so they can be saved and examined easily. OTOH I guess there should be a keybinding to make alot render the current attachment (eg. unified diff w/ content-disposition: attachment) if desired.
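The reverse-order idea can be sketched with the stdlib email package. The ordering rule is the one from RFC 2046 (alternatives go from plainest to richest); pick_alternative and the sample message are made up for illustration.

```python
from email import message_from_string

RAW = """\
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="b"

--b
Content-Type: text/plain; charset=utf-8

plain body
--b
Content-Type: text/html; charset=utf-8

<p>html body</p>
--b--
"""


def pick_alternative(part, renderable):
    """Return the richest alternative we can render, or None."""
    # Alternatives are ordered from plainest to richest, so scan backwards.
    for candidate in reversed(part.get_payload()):
        if candidate.get_content_type() in renderable:
            return candidate
    return None


msg = message_from_string(RAW)
print(pick_alternative(msg, {"text/plain", "text/html"}).get_content_type())
print(pick_alternative(msg, {"text/plain"}).get_content_type())
```

The first call picks text/html, the second falls back to text/plain. This works for any number of alternatives, and the "is the text part reasonable" check could be added inside the loop.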
Quoting ccxcz (2012-01-27 02:12:18)
If we concern ourselves only with basic colors / text styles, the parsing is quite trivial and can be done by a simple state machine. The style escape sequence is "\x1b[" followed by a semicolon-delimited list of styles and terminated by the character "m". Numbers 30-37 are foreground colors (and are expanded to full 16 colors by the bold attribute) and 40-47 are background colors. 39 and 49 are the "default" colors of the terminal.
With xterm-style 88/256 colors there is a special form of the above, written as "\x1b[38;5;Xm", where X is the code of the foreground color. "48;5;X" is used for backgrounds. xterm actually implements more than that, but I'm not sure if anybody uses it.
Full specs:
I would prefer letting someone else™ do the parsing here rather than implementing it in alot, but if it's so easy and we don't find any code that fits our needs we might peek into urwid's vterm widget code: http://excess.org/urwid/browser/urwid/vterm.py Obviously, it also does some keystroke handling which isn't necessary for us, but we should find something in there.
The widget (I'd rather call it AnsiColorWidget and make it generic) could be easily updated as we receive the data (no deferreds involved here if I'm remembering the api clearly, they are for fire-once events only). I can imagine this being useful when you display text and then start rendering a huge attached photograph. The question is: do the ansi codes map well to urwid attributes, or will we have to do some translations? If we can specify the color just by its code it would be great. Though I don't really know what to do when alot is invoked in 16color mode and a script outputs xterm-256 style colors. (It would do so in violation of $TERM, but that's only a rough pointer.)
Urwid's attributes are documented here: http://excess.org/urwid/wiki/DisplayAttributes The 256 colour mode also provides names for the 16 basic term colours, although I'm unsure whether they are also available as numbers. I vaguely remember having read that urwid, if in 16c mode and having to display a highres colour, will scale down to the nearest (whatever that means) basic colour.
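On the question of translating ANSI color numbers to urwid attributes, a small lookup seems to be all that is needed. The sketch below does not import urwid; it only produces the color strings urwid uses (the 16 basic names and the "hN" notation for 256-color palette entries), which could then be handed to urwid.AttrSpec. The function name urwid_color is an assumption for this example.

```python
# urwid's names for the 16 basic colors, indexed by ANSI color number 0-15.
URWID_BASIC = [
    "black", "dark red", "dark green", "brown",
    "dark blue", "dark magenta", "dark cyan", "light gray",
    "dark gray", "light red", "light green", "yellow",
    "light blue", "light magenta", "light cyan", "white",
]


def urwid_color(code, bold=False):
    """Map an ANSI color number (or None for default) to an urwid color string."""
    if code is None:
        return "default"
    if code < 8 and bold:
        code += 8  # bold expands the 8 basic colors to the full 16
    if code < 16:
        return URWID_BASIC[code]
    return "h%d" % code  # urwid's notation for xterm-256 palette entries


print(urwid_color(1), "/", urwid_color(1, bold=True), "/", urwid_color(200))
```

What urwid does with an "hN" color on a 16-color terminal is left to urwid here; this sketch does no downscaling of its own.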
Mailcap certainly is not designed for this, it was designed only for opening attachments and it even fails to provide important information from mime to the application. In this case we would like to have at least the same amount of information ranger gets via returncode, which is an indication whether the text is free-flowing or fixed width.
Well, mailcap also provides "copiousoutput" = inline = non-interactive handlers, and is the quasi-standard for this kind of thing. I'd say we should at the very least fall back to mailcap entries in case the user hasn't specified a handler. Also, we cannot provide a default set of handlers hardcoded into a config.
I would be in favour of using a single script like ranger does and letting it decide, using case statements or whatever means available, what to call. The interface could be like this (I'm using shell semantics, but we would call spawnProcess instead):
./script "text/plain; charset=utf-8" <content >colored_text 2>error_messages
This should make it simple for scripts to just call the desired application. It eliminates the need for a temporary file, as required with mailcap, and where necessary scripts can still do cat >$(tempfile).
Maybe as first choice, gracefully falling back to mailcap, which falls back to an error notification.
To make matters more interesting, we can even let users decide which one of the alternatives in multipart/alternative we want to show:
./script multipart/alternative "text/plain; charset=utf-8" "text/html; charset=utf-8" \
<text_content 3<html_content >colored_text 2>error_messages
This is a bit more tricky to handle by a script, but one of the RFCs tells us they should be ordered from least representative to the richest format, so we can try to display them in reverse order, but I can imagine doing some advanced checking on whether the text message is reasonable and if not using html instead.
I am not sure that we can simply assume multipart/alternative contains exactly an html and an alternative plaintext part.
Body parts whose representation fails should be displayed as attachments regardless of their content-disposition, so they can be saved and examined easily.
agreed.
OTOH I guess there should be a keybinding to make alot render the current attachment (eg. unified diff w/ content-disposition: attachment) if desired.
yes, neat idea.
I thought a bit about how we should parse email in general and am still kind of puzzled how to do this:
My current state of mind is that we should let Message have a method read_mail that is not called upon construction but later on demand.
This method then recursively (BFS) goes through the mail, and sets Message attributes:
Only afterwards, in the BodyWidget, when we actually know the order of the inline-parts, we should go through the inline parts, pick first-choice or alternative parts depending on a widget internal property and for each one:
Now we can display the list of strings using a pimped-up urwid.Text.
comments?
Quoting Patrick Totzke (2012-01-27 11:20:23)
I would prefer letting someone else™ do the parsing here rather than implementing it in alot, but if it's so easy and we don't find any code that fits our needs we might peek into urwid's vterm widget code: http://excess.org/urwid/browser/urwid/vterm.py Obviously, it also does some keystroke handling which isn't necessary for us, but we should find something in there.
The parsing, as in picking the escape codes out of the text, is indeed trivial. What is not trivial is interpreting them, and in this case that part is urwid-specific.
The interesting functions here are csi_set_attr (on how attrs are updated) and sgi_to_attrspec (on how they are represented) http://excess.org/urwid/browser/urwid/vterm.py#L1004
It does seem though that this is implemented using the low-level Canvas interface instead of high-level widgets, which is the right approach in this case I guess.
Also, we cannot provide a default set of handlers hardcoded into a config.
We can provide default script:
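#!/bin/sh
# Default handler: save stdin to a temporary file and let run-mailcap view it.
FILE="$( mktemp )" || exit 1
cat > "$FILE"
# Strip MIME parameters ("; charset=...") from the content type in $1.
run-mailcap --action=view "${1%%;*}:$FILE"
RETURN=$?
rm -f "$FILE"
exit $RETURN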
Or we can implement that using python if you don't like relying on run-mailcap. Either way, I'm not saying we should scrap mailcap entirely, just that we can be more flexible with simpler interface.
To make matters more interesting, we can even let users decide which one of the alternatives in multipart/alternative we want to show:
./script multipart/alternative "text/plain; charset=utf-8" "text/html; charset=utf-8" \
<text_content 3<html_content >colored_text 2>error_messages
This is a bit more tricky to handle by a script, but one of the RFCs tells us they should be ordered from least representative to the richest format, so we can try to display them in reverse order, but I can imagine doing some advanced checking on whether the text message is reasonable and if not using html instead.
I am not sure that we can simply assume multipart/alternative contains exactly an html and an alternative plaintext part.
I was not assuming that. You can see that the full content-types are passed for each part, and there is no limit on them except the number of file descriptors available or the length of argv, which is pretty theoretical. :-)
But maybe this is indeed an overkill and complication, I would still love to see at least a customizable python hook for which parts of multipart/alternative should be displayed.
If we do remove the handling of multiple parts at once, we can simplify the call to be just: ./script mimetype charset_if_present
Another question is what charset to expect from the command, the default system encoding might be the sanest choice.
I thought a bit about how we should parse email in general and am still kind of puzzled how to do this: My current state of mind is that we should let Message have a method read_mail that is not called upon construction but later on demand.
And probably cache the results in some way.
This method then recursively (BFS) goes through the mail, and sets Message attributes:
BFS sounds weird to me. If you want the parts in the order they appear in the file, that would be DFS.
- signature and signed-part
- encrypted part/ crypto meta (in this case the method should stop recursing until someone replaces the underlying mail with a decrypted one)
- if multipart/alternative and plaintext and html part in the payload -> add the pair (html, alternative) to an inline-accumulator
- if unknown parts, wrap the part in an Attachment object and add it to the attachments accumulator
- recurse into multiparts
Only afterwards, in the BodyWidget, when we actually know the order of the inline-parts, we should go through the inline parts, pick first-choice or alternative parts depending on a widget internal property and for each one:
- reserve a 'Loading' Text in the list of displayed strings,
- call a handler, set a callback that updates the string in the list,
Why do we need to wait? Or more precisely, why should the order of inline parts be different from DFS?
Now we can display the list of strings using a pimped-up urwid.Text.
My idea would be to assemble the list of parts in the file using DFS, so we would get the order we would see in a text editor. Optionally we could call some kind of stable sort on them so we move attachments to the beginning if desirable. Every part then would have two toggleable representations (like headers do atm): a one-line one that is used for attachments by default and an expanded one, with content rendered by script. This representation would be rendered on demand when expanded and cached in memory.
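The DFS ordering plus optional stable sort can be sketched with the stdlib email package, whose Message.walk() already visits parts depth-first. The sample message and the is_attachment helper are made up for this example; this is not the code from alot.

```python
from email import message_from_string

RAW = """\
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="outer"

--outer
Content-Type: multipart/alternative; boundary="inner"

--inner
Content-Type: text/plain

hi
--inner
Content-Type: text/html

<p>hi</p>
--inner--
--outer
Content-Type: text/x-diff
Content-Disposition: attachment; filename="fix.patch"

--- a
+++ b
--outer--
"""

msg = message_from_string(RAW)

# walk() is depth-first, so leaves come out in the order they appear in the file.
leaves = [p for p in msg.walk() if not p.is_multipart()]
print([p.get_content_type() for p in leaves])


def is_attachment(part):
    return part.get_content_disposition() == "attachment"


# sorted() is stable: attachments move to the front, the rest keep DFS order.
ordered = sorted(leaves, key=lambda p: not is_attachment(p))
print([p.get_content_type() for p in ordered])
```

The first list is text/plain, text/html, text/x-diff; after the sort the diff comes first and the two text parts keep their relative order.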
comments?
Here you go :-)
Quoting ccxcz (2012-01-29 12:20:39)
The interesting functions here are csi_set_attr (on how attrs are updated) and sgi_to_attrspec (on how they are represented)
what about parse_csi, parse_osc, parse_escape and parse_noncsi?
It does seem though that the this is implemented using the low-level Canvas interface instead of high-level widgets, which is the right approach in this case I guess.
Yes, but once we have the urwid.AttrSpec objects for the text blobs, we can simply use urwid.Text, which accepts a list of pairs: [(attribute, text),..]
Also, we cannot provide a default set of handlers hardcoded into a config. We can provide default script:
good point. I'd prefer doing it in python but that's a minor issue. But what exactly would these formatter scripts need apart from the input? Because the size of the terminal is something they can read from the term themselves..
To make matters more interesting, we can even let users decide which one of the alternatives in multipart/alternative we want to show: But maybe this is indeed an overkill and complication, I would still love to see at least a customizable python hook for which parts of multipart/alternative should be displayed.
I intend to store a pair (content, plaintext-alternative) for all inline-parts in the message. this would allow us to easily swap the displayed variant in the widget, similar to toggleheaders atm.
Another question is what charset to expect from the command, the default system encoding might be the sanest choice.
I guess so. this can somehow be read from the urwid mainloop iirc.
My current state of mind is that we should let Message have a method read_mail that is not called upon construction but later on demand. And probably cache the results in some way.
sure. the message will store the resulting strings internally; my current WIP code uses self._attachments (stores Attachment objects) and self._inlines (stores triples (content-type, content, alternative), the latter two as unicode strings). And the read_mail method will only do the work once and fall back to the cached stuff. There should be some override though: e.g. for newly decrypted cryptomails.
BFS sounds weird to me, if you want the order the parts how they are in the file that would be DFS.
you're right, DFS it is.
Only afterwards, in the BodyWidget, when we actually know the order of the inline-parts, we should go through the inline parts... Why do we need to wait?
I want to separate the parsing of the email in a Message from the code that interprets the content (via external handlers). I think the interpretation stuff should be done in the widget, using deferred-magic and whatnot. This keeps Message more generic and would possibly make things easy for someone who just wants to use alot's database wrappers only.
Now we can display the list of strings using a pimped-up urwid.Text. My idea would be to assemble the list of parts in the file using DFS, so we would get the order we would see in a text editor.
agreed.
Optionally we could call some kind of stable sort on them so we move attachments to the beginning if desirable.
Not necessary if we have parsed the Attachments beforehand
Every part then would have two toggleable representations (like headers do atm): a one-line one that is used for attachments by default and an expanded one, with content rendered by script.
This would mean a toggle command could look up the currently selected part and only work on this. I was more thinking of a (message) global toggle that flips all parts to alternate. I'd prefer to implement the simple solution first :)
This representation would be rendered on demand when expanded and cached in memory.
Right. by the widget, not the Message.
note the preliminary code in branch rewrite-mimeparse.
Maybe you're right and the body text interpretation should be done in the Message after all: I'm thinking of ReplyCommand, that reads Message.bodytext atm. It would feel strange to let that extract the parsed texts from the selected widget would it not? still that might be cleaner than letting Message depend on twisted and the mimehandler infrastructure..
Quoting Patrick Totzke (2012-01-29 14:56:44)
Quoting ccxcz (2012-01-29 12:20:39)
The interesting functions here are csi_set_attr (on how attrs are updated) and sgi_to_attrspec (on how they are represented)
what about parse_csi, parse_osc, parse_escape and parse_noncsi?
I don't think other types of escape sequences concern us.
It does seem though that this is implemented using the low-level Canvas interface instead of high-level widgets, which is the right approach in this case I guess.
Yes, but once we have the urwid.AttrSpec objects for the text blobs, we can simply use urwid.Text, which accepts a list of pairs: [(attribute, text),..]
I'll leave that decision to you. Canvas is probably better when we have fixed layout output (like an image); sometimes we'll want flowing text though, and for that widgets are certainly more useful.
WRT attachments, message parts and alternatives: I would very much prefer if the distinction between attachments and non-attachments weren't hardcoded. After all, it's just the default requested representation. I can imagine several cases where the value chosen in adherence to the standards would be just wrong. Images can't really be displayed in full quality inline. Textual parts like patches can be sent with either disposition and we might want to display them as either, depending on the user's wish. Multipart/alternative does not necessarily have only two parts, and none of them has to be plaintext.
We can either go down the usual route mail readers go, which is implementing the usual patterns in which mails are formed, or we can go the simple way and present the parts as they are in the message, with a toggleable representation set to a reasonable value by default. I need to quote my friend: the first thing he asked after I recommended alot to him was "Does it display MIME structure properly?". I have to say I don't know of any reasonable mail reader that does just the simple thing and doesn't try to be smart about it.
Displaying each part in separate widget seems to me as flexible and simple as it gets, though the code might need some time/manpower to get there.
As for the information passed to the display script: ranger seems to pass filename, window width and window height as parameters (the window is non-scrollable there). We certainly should pass at least the width of the widget there, and the caching mechanism should be smart enough to refresh it only when the output is not indicated as fixed-width by its returncode.
1) Canvas vs Flowwidgets: totally minor point atm, If it comes to it, i'll stick to Texts for the moment. later improvements are of course possible.
2) Having the flexibility of not hardcoding the distinction between attachments and inlines is a really nice idea but mostly theoretical and a feature request for the mid-distant future i believe. Problems:
So ATM, "being smart" about what's inline and what's not is the only option i see really. You'd also have to do a similar thing to determine the default folding in the flexible approach.
3) passing widget width to handlers: I don't see how this is supposed to work: If the width changes, urwid calls widget.render(). This is about the only time you'd ever know the size of the widget yourself. But render needs to return immediately, so no deferreds here! (thus no calling of handlers).
I forgot to mention: We seem to be about the only ones interested in urwid's Tree widgets. So if you'd really like to help out here and don't want to do email stuff: fix those :)
Quoting Patrick Totzke (2012-01-30 10:31:40)
2) Having the flexibility of not hardcoding the distinction between attachments and inlines is a really nice idea but mostly theoretical and a feature request for the mid-distant future i believe.
As long we don't do anything that makes it hard having it in the future, I don't mind not having it right now. :-)
3) passing widget width to handlers: I don't see how this is supposed to work: If the width changes, urwid calls widget.render(). This is about the only time you'd ever know the size of the widget yourself. But render needs to return immediately, so no deferreds here! (thus no calling of handlers).
I don't see how this prevents calling renderer, we need to wait for it when viewing it for the first time too. In this case we can display clipped/smaller previous version until the proper one is generated. Now let's see what types of content we can get:
1) plain text: the regular text messages, attachments with source code, patches et cetera. We can handle line wrapping ourselves, so it is actually independent of the size of the display widget. The script only adds some fancy colors.
2) content rendered "to fit": HTML, troff, other markups where the layout is set to some fixed column count. This includes ansi-vt representation of images.
3) fixed size content: There might be cases where we don't want to wrap the lines, but the content doesn't change with regard to allocated width.
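The three kinds of content above suggest a cache that only re-renders "to fit" output when the width changes. The sketch below is synchronous and purely illustrative: Layout, RenderCache and the render callable are hypothetical names, and in the proposal the layout kind would come from the script's return code rather than from Python.

```python
from enum import Enum


class Layout(Enum):
    PLAIN = 1   # we wrap the lines ourselves
    TO_FIT = 2  # the renderer lays out for a given column count
    FIXED = 3   # never wrapped, does not change with the width


class RenderCache:
    """Cache handler output; only TO_FIT content depends on the width."""

    def __init__(self, render):
        self.render = render  # hypothetical callable: (part, width) -> (Layout, text)
        self.cache = {}

    def get(self, part, width):
        layout, text, cached_width = self.cache.get(part, (None, None, None))
        stale = layout is None or (
            layout is Layout.TO_FIT and cached_width != width)
        if stale:
            layout, text = self.render(part, width)
            self.cache[part] = (layout, text, width)
        return text
```

In an asynchronous version the stale branch would presumably keep returning the old (clipped) text until the new render arrives.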
The problem is that urwid breaks if you change the size of a widget while it is being rendered, i.e. in w.render.