tylerlong / google-code-prettify

Automatically exported from code.google.com/p/google-code-prettify
Apache License 2.0

Highlighting for unified diff #62

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
It would be nice if the unified diff format was correctly highlighted.

Please let me know if you need more information on this.

Original issue reported on code.google.com by alexkon on 26 Dec 2008 at 7:55

GoogleCodeExporter commented 9 years ago
Are you talking about the output of diff -u?

Original comment by mikesamuel@gmail.com on 6 Jan 2009 at 10:59

GoogleCodeExporter commented 9 years ago
Yes, that's right.

Original comment by alexkon on 7 Jan 2009 at 8:21

GoogleCodeExporter commented 9 years ago
Hmm.

I can probably guess the language semi-reliably by looking at the file
extension on the first two lines of the diff, but chunks are harder.
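
To illustrate the extension idea, here is a minimal sketch, not part of prettify. The name guessDiffExtension is hypothetical, and the sketch assumes the header lines look like "--- path" or "+++ path", optionally followed by a tab and a timestamp.

```javascript
// Hypothetical helper (not part of prettify): guess a language from the file
// extension found on the "---" / "+++" header lines of a unified diff.
function guessDiffExtension(diffText) {
  var lines = diffText.split(/\r?\n/);
  for (var i = 0; i < lines.length; i++) {
    // Stop at the first hunk so removed lines that start with "-- " are not
    // mistaken for file headers.
    if (/^@@/.test(lines[i])) { break; }
    // Assumption: the path ends at a tab (the timestamp separator) or at EOL.
    var header = /^(?:---|\+\+\+) ([^\t\n]+)/.exec(lines[i]);
    if (!header) { continue; }
    var path = header[1].trim();
    if (path === '/dev/null') { continue; }  // added or deleted file
    var ext = /\.([A-Za-z0-9_+-]+)$/.exec(path);
    if (ext) { return ext[1].toLowerCase(); }
  }
  return null;  // no usable header found
}
```

The result would still have to be mapped to one of the registered language handlers, and a diff covering several files would need this applied per file section.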

Trying to highlight a chunk when you might be in the middle of a multi-line
token is tough.

I might be able to use some heuristics, such as
(1) If there is a '*/' or '-->' before the first of any '/*' or '<!--', assume
you're in a C-style or SGML-style block comment.
(2) If the first token is of the form > or xml:identifier= then assume you're
inside a tag.
(3) If there is a </script or </style before any <script or <style, then assume
you're in an embedded source block.

I don't know how well those would work in practice, and they don't fit into the
current language handler scheme.  I could extend language handlers to provide
for a heuristic which returns a piece of invisible text to process to set up
the state properly, so the heuristic for C might look like:
  return /^(?:[^/]|\/+[^*/])*\*\//.test(chunk) ? '/*' : '';
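
As a rough sketch of how heuristics (1) and (3) could each return such invisible text, assuming plain substring searches are good enough: the names closesBeforeOpens and guessInvisiblePrefix are hypothetical and not part of prettify, each comment style is checked against its own opener only, and heuristic (2) is left out.

```javascript
// Hypothetical helpers (not part of prettify's API).
// True when `close` appears in `text` before any `open`.
function closesBeforeOpens(text, open, close) {
  var closeAt = text.indexOf(close);
  var openAt = text.indexOf(open);
  return closeAt >= 0 && (openAt < 0 || closeAt < openAt);
}

// Returns invisible text to lex before the chunk so the lexer starts in a
// plausible state, or '' when nothing suggests a special state.
function guessInvisiblePrefix(chunk) {
  // (1) A comment terminator with no earlier opener: assume a block comment.
  if (closesBeforeOpens(chunk, '/*', '*/')) { return '/*'; }
  if (closesBeforeOpens(chunk, '<!--', '-->')) { return '<!--'; }
  // (3) A closing script/style tag with no earlier opening tag: assume an
  // embedded source block. Tag names are matched case-insensitively.
  var lower = chunk.toLowerCase();
  var tags = ['script', 'style'];
  for (var i = 0; i < tags.length; i++) {
    if (closesBeforeOpens(lower, '<' + tags[i], '</' + tags[i])) {
      return '<' + tags[i] + '>';
    }
  }
  return '';
}
```

For example, guessInvisiblePrefix(' * end of comment */\nint x;') returns '/*'. Like the regex above, this can be fooled by comment delimiters inside string literals or line comments.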

The problem of actually processing a chunk is harder.
I think the best way to do that would probably be to reconstitute the original
and changed versions, apply decorators, and then use the changed version for
all text not in a subtracted line, and use the original version for the
subtracted lines.
I think it's appropriate to bias towards added lines when the tokenization
after a run of additions and subtractions would differ since, hopefully, a
change is more likely to fix syntactic errors in code than to introduce them.
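
The reconstitution step could be sketched roughly as follows. This is only an illustration under assumptions: reconstituteHunk is a hypothetical name, the input is the body lines of a single hunk without the @@ header, and the decoration step itself is not shown.

```javascript
// Hypothetical helper (not part of prettify): rebuild the original and changed
// texts of one hunk, and record which version each displayed line comes from.
function reconstituteHunk(hunkLines) {
  var original = [], changed = [], map = [];
  for (var i = 0; i < hunkLines.length; i++) {
    var marker = hunkLines[i].charAt(0);
    var text = hunkLines[i].substring(1);
    if (marker === '\\') { continue; }  // "\ No newline at end of file"
    if (marker === '-') {
      // Subtracted lines are rendered from the original version.
      map.push({ side: 'original', index: original.length });
      original.push(text);
    } else if (marker === '+') {
      map.push({ side: 'changed', index: changed.length });
      changed.push(text);
    } else {
      // Context lines exist in both versions; bias towards the changed one.
      map.push({ side: 'changed', index: changed.length });
      original.push(text);
      changed.push(text);
    }
  }
  return {
    original: original.join('\n'),
    changed: changed.join('\n'),
    map: map
  };
}
```

Each version would then be decorated on its own, and every map entry says which decorated version and which line index to take the tokens from when rendering that diff line. The "\ No newline" marker lines are skipped, so map can be shorter than the input.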

Original comment by mikesamuel@gmail.com on 9 Jan 2009 at 8:02

GoogleCodeExporter commented 9 years ago
Hi. I don't know exactly how it works, but maybe this could be done more easily
with something like "s|^\([+].*\)|<span style=\"color: green;\">&</span>|",
using red for ^[-] lines, instead of tokenizing.

There are also the @@ lines and the --- lines that split a patch into files,
which could be harder.
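
A line-based approach along these lines might look like the sketch below, written in JavaScript rather than sed. The function names and the colors for header and @@ lines are made up for illustration, and nothing here uses prettify itself.

```javascript
// Hypothetical sketch: color a unified diff line by line, without tokenizing.
function escapeHtml(s) {
  return s.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

function colorizeDiff(diffText) {
  return diffText.split('\n').map(function (line) {
    var color = null;
    if (/^(?:---|\+\+\+) /.test(line)) {
      color = 'gray';    // file header lines (color is an arbitrary choice)
    } else if (/^@@/.test(line)) {
      color = 'purple';  // hunk header (color is an arbitrary choice)
    } else if (line.charAt(0) === '+') {
      color = 'green';   // added line
    } else if (line.charAt(0) === '-') {
      color = 'red';     // removed line
    }
    var html = escapeHtml(line);
    return color
        ? '<span style="color: ' + color + ';">' + html + '</span>'
        : html;
  }).join('\n');
}
```

One known weakness: a removed line whose own text starts with "-- " shows up in the diff as "--- ..." and would be colored as a file header. Tracking whether a hunk has started would avoid that.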

So, my point: wouldn't it be easier to write a parser from scratch, or as a
post_filter() after auto-highlighting?

It would be great to have diff support.

Thank you, Jakub V.

Original comment by Main...@gmail.com on 18 Jun 2010 at 3:31