Open GreyCat opened 2 years ago
@GreyCat:
on the other hand, it makes it harder to parse such logs with crude command-line tools such as grep(1) or PowerShell's Select-String.
Yes, I'm aware of that. My idea was that the default should be whatever is most readable to the naked human eye. I felt that the readability of the old output format became terrible once warnings and the ability not to stop after the first problem were added in 0.10: the 400-character lines were soft-wrapped by the terminal window and stacked right on top of each other, so it seemed almost impossible to see anything there, and you didn't even know whether something was an error or a warning.
I think the example in https://github.com/kaitai-io/kaitai_struct_compiler/commit/977df8fa shows well how much better the new output is (and of course in real-world specifications of complex formats, using imports and having lots of "style guide" naming violations, the difference would be even more striking).
For machine-readable purposes, there is the --ksc-json-output option. Before releasing 0.10, I made sure that even warnings get written here as well (and also fixed some other bugs), so it should be no worse than the default human-readable text output. In my opinion, there shouldn't be any reason not to use it when you need to process problems reported by KSC programmatically.
JSON also has very good support among programming languages, usually right in the standard library.
For even easier processing of the JSON output (if one doesn't want to write these few lines of JSON parsing code themselves), https://jqlang.github.io/jq/ may be very useful. I'd say jq is no more difficult to use than crude tools like grep and Select-String, but the difference is that jq is actually a reliable and solid option. Grepping the human-readable output gives you no guarantee that it will work in all cases or in any future KSC version (I don't feel that the human-readable output should be treated as a public API for which we should maintain compatibility), and that is just wrong from the start.
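As a rough illustration of "these few lines of JSON parsing code", here is a minimal Python sketch. The shape of the document printed by --ksc-json-output isn't described in this thread, so the sketch makes a loud assumption: it walks whatever JSON it gets and collects every object that has a "message" key. The sample payload and the collect_problems name are invented for the example and are not real KSC output.

```python
import json


def collect_problems(node, found=None):
    """Recursively collect every JSON object that carries a "message" key.

    ASSUMPTION: problems are objects with a "message" field; the real
    --ksc-json-output schema may differ, so adjust the key as needed.
    """
    if found is None:
        found = []
    if isinstance(node, dict):
        if "message" in node:
            found.append(node)
        for value in node.values():
            collect_problems(value, found)
    elif isinstance(node, list):
        for item in node:
            collect_problems(item, found)
    return found


# Hypothetical payload, invented for this example (not real KSC output).
SAMPLE = """
{
  "example.ksy": {
    "errors": [
      {"file": "example.ksy", "path": ["seq", "0", "id"], "message": "invalid attribute ID"}
    ]
  }
}
"""

problems = collect_problems(json.loads(SAMPLE))
for problem in problems:
    print(problem.get("file", "?"), "/".join(problem.get("path", [])), problem["message"])
```

In real use, SAMPLE would be replaced by the compiler's actual JSON output (for instance read from standard input), and the key names would need to be checked against what KSC really emits.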
For machine-readable purposes, there is the --ksc-json-output option. Before releasing 0.10, I made sure that even warnings get written here as well (and also fixed some other bugs), so it should be no worse than the default human-readable text output. In my opinion, there shouldn't be any reason not to use it when you need to process problems reported by KSC programmatically.
I mostly agree, but there is at least one reason. Some text editors, like Kate, have modules for generic building. They just run user-specified commands, then parse errors out of their output and let you jump to the error locations with a click. Of course, they can parse errors only in the formats of some widespread software, like gcc. For example, while rustc errors are detected, navigation doesn't work.
Question: which one of these do you like the most and, most importantly, why?
IMHO: we need two very generic libraries, one for formatting and one for parsing error messages, written in a language intended for transpilation into other languages, like Haxe. That way a user can choose any format the generic implementation supports. The library should also support popular machine-readable outputs.
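To make the formatting half of that idea a bit more concrete, here is a small sketch of one problem record rendered through interchangeable templates. It is written in Python purely for brevity (not in Haxe, as suggested above), and every name in it (Diagnostic, format_gcc_style, format_csc_style, FORMATTERS) is hypothetical. The two templates are the gcc-style and csc-style ones listed in the original post.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Diagnostic:
    """One reported problem, independent of any output format."""
    file: str
    line: int
    severity: str
    text: str
    column: Optional[int] = None
    code: Optional[str] = None


def format_gcc_style(d):
    # $FILE:$LINE:$COLUMN: $SEVERITY: $TEXT (column omitted when unknown)
    location = f"{d.file}:{d.line}"
    if d.column is not None:
        location += f":{d.column}"
    return f"{location}: {d.severity}: {d.text}"


def format_csc_style(d):
    # $FILE($LINE,$COLUMN): $SEVERITY $CODE: $TEXT
    position = f"{d.line},{d.column}" if d.column is not None else str(d.line)
    label = f"{d.severity} {d.code}" if d.code else d.severity
    return f"{d.file}({position}): {label}: {d.text}"


# The user picks a format by name; more templates can be registered here.
FORMATTERS = {"gcc": format_gcc_style, "csc": format_csc_style}

# Hypothetical problem, invented for the example.
diag = Diagnostic(file="example.ksy", line=12, column=7, severity="warning",
                  text="hypothetical message", code="W0001")
for name, formatter in FORMATTERS.items():
    print(f"{name}: {formatter(diag)}")
```

Keeping the record separate from its rendering is what would let an editor such as Kate be served with a format it already understands, while a JSON renderer could be added alongside the text ones.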
Hey, I just wanted to get back to the question of error messages formatting for ksc, specifically for console output.
Around 2022-07-01, in https://github.com/kaitai-io/kaitai_struct_compiler/commit/977df8fa a change was suggested and implemented which split messages into 2 lines for readability and added an extra empty line between messages.
There are both pros and cons to such a change: on one hand, it makes it obviously more readable to the human eye without any highlighting/parsing, on the other hand, it makes it harder to parse such logs with crude command-line tools such as grep(1) or PowerShell's Select-String.
I just wanted to take this as an opportunity to sit down and see how error message formatting is done in modern compilers.
gcc
gcc (the C++ compiler, or to be more precise, cc1plus) formats core messages as $FILE:$LINE:$COLUMN: $SEVERITY: $TEXT. However, in addition to that, gcc also adds extra lines of clarification:
a ^~~~~~ sign on the line below, aligned to the alleged place of the problem;
a $FILE: In member function ...: line.
In all fairness, modern gcc can also output structured messages in JSON and SARIF.
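For the one-line core template, a crude parser can be sketched with a single regular expression. The snippet below is only an illustration: the sample log is invented, the parse_gcc_style name is hypothetical, and the pattern makes simplifying assumptions (no colons in file names, so Windows drive letters would break it; lower-case severity words). The extra clarification lines are skipped because they don't match the template.

```python
import re

# $FILE:$LINE:$COLUMN: $SEVERITY: $TEXT
GCC_LINE = re.compile(
    r"^(?P<file>[^:\n]+):(?P<line>\d+):(?P<column>\d+): "
    r"(?P<severity>[a-z ]+): (?P<text>.*)$"
)


def parse_gcc_style(log):
    """Return one dict per line matching the one-line template; skip the rest."""
    problems = []
    for raw in log.splitlines():
        match = GCC_LINE.match(raw)
        if match:
            item = match.groupdict()
            item["line"] = int(item["line"])
            item["column"] = int(item["column"])
            problems.append(item)
    return problems


# Hypothetical log, invented for the example.
SAMPLE_LOG = """\
demo.cpp: In member function 'void Foo::bar()':
demo.cpp:10:5: error: 'x' was not declared in this scope
     x = 1;
     ^~~~~~
demo.cpp:14:9: warning: unused variable 'y'
"""

parsed = parse_gcc_style(SAMPLE_LOG)
for p in parsed:
    print(p["severity"], p["file"], p["line"], p["column"], p["text"])
```

This works only because each problem's essential data sits on one line. Once a message is split across lines, a line-oriented matcher like this one has to start tracking state, which is the trade-off discussed above.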
ld
ld (linker) messages are much more cryptic:
$FILE:$LINE: $TEXT template.
error or something like that.
/usr/bin/ld, along with grouping in function $FUNCTION message.
first defined here
Still, summarizing, at least IMHO, it's a huge mess which makes it very hard to parse and to understand how many problems there are.
clang
Similar to gcc, but actually even briefer, without function-level grouping.
The template is $FILE:$LINE:$COLUMN: $SEVERITY: $TEXT.
For warnings, there's also an indication of how to control that specific warning, e.g. [-Wtautological-compare]. Suggestions are emitted as extra messages with "note"-level severity.
Mach-O linker (Apple's ld)
Not very consistent: it looks like a list of affected items rather than a list of error messages. The error message itself appears only once at the very top (Undefined symbols), and it's not very obvious that it's an error. Elements in the list seem to be marked by 2 spaces at the beginning. Then, every element can have a sublist of "referenced from" entries, which are distinguished by 6 spaces.
IMHO, rather hard to read and parse.
javac
All error messages are always one-line: $FILE:$LINE: $SEVERITY: $TEXT. However, they are augmented with a copy of the problematic line itself plus a caret on an extra line pointing to the column.
csc (Microsoft C# compiler)
The C# compiler is normally run with msbuild, which produces pretty noisy and hard-to-read text output, so the recommended way to tackle it is actually using binary structured logs and msbuild-structured-log-viewer.
However, if you get to the messages themselves, they're pretty brief: $FILE($LINE,$COLUMN): $SEVERITY $CODE: $TEXT. No extra lines of context or copies of problematic lines are present, at least in the default output.
cl (Microsoft C++ compiler)
The same story with msbuild applies. The messages themselves seem to be formatted most of the time using a simple $FILE($LINE): $SEVERITY: $TEXT template. However, when additional information/suggestions are provided, these start as a note severity message, but then occupy a random number of randomly formatted lines with extra information.
I'll keep adding more examples to get a better view of how this is approached by different compilers.
Question: which one of these do you like the most and, most importantly, why?