Error messages formatting

GreyCat commented 2 years ago

Hey, I just wanted to get back to the question of error messages formatting for ksc, specifically for console output.

Around 2022-07-01, in https://github.com/kaitai-io/kaitai_struct_compiler/commit/977df8fa there was a change suggested and implemented which split messages into 2 lines for readability + adding extra empty line between messages.

There are both pros and cons to such a change: on one hand, it makes it obviously more readable to human eye without any highlighting/parsing, on the other hand, it makes it harder to parse such logs with crude command-line tools such as grep(1) or PowerShell's Select-String.

I just wanted to take this as opportunity to sit down and see how error message formatting is done in modern compilers.

gcc

/home/greycat/git/kaitai_struct/tests/compiled/cpp_stl_11/io_local_var.cpp: In member function ‘void io_local_var_t::_read()’:
/home/greycat/git/kaitai_struct/tests/compiled/cpp_stl_11/io_local_var.cpp:21:5: error: ‘m__io__raw_for_io’ was not declared in this scope
     m__io__raw_for_io = new kaitai::kstream(m__raw_for_io);
     ^~~~~~~~~~~~~~~~~
/home/greycat/git/kaitai_struct/tests/compiled/cpp_stl_11/io_local_var.cpp:21:5: note: suggested alternative: ‘m__raw_for_io’
     m__io__raw_for_io = new kaitai::kstream(m__raw_for_io);
     ^~~~~~~~~~~~~~~~~
     m__raw_for_io
/home/greycat/git/kaitai_struct/tests/compiled/cpp_stl_11/io_local_var.cpp:22:53: error: ‘L’ was not declared in this scope
     m_for_io = std::unique_ptr<dummy_t>(new dummy_t(L, i, s, t, (, ), m, _, _, i, o, _, _, r, a, w, _, f, o, r, _, i, o, this, m__root));
                                                     ^
/home/greycat/git/kaitai_struct/tests/compiled/cpp_stl_11/io_local_var.cpp:22:56: error: ‘i’ was not declared in this scope
     m_for_io = std::unique_ptr<dummy_t>(new dummy_t(L, i, s, t, (, ), m, _, _, i, o, _, _, r, a, w, _, f, o, r, _, i, o, this, m__root));

gcc (C++ compiler — to be more precise, cc1plus) formats core messages as $FILE:$LINE:$COLUMN: $SEVERITY: $TEXT.

However, in addition to that gcc also adds extra lines of clarification:

For every accused line, it outputs the line contents, plus it outputs ^~~~~~ sign on the line below, aligned to alleged place of the problem.
gcc tends to group problem reports in groups (in the example above, they're grouped by function), thus adding extra lines pointing just to the file + giving grouping context, such as $FILE: In member function ...:
gcc tends to give suggestions, adding extra lines with severity = "note", normally for what it considers typos, offering possible suggestions.

In all fairness, modern gcc can also output structured messages in JSON and SARIF.

ld

/usr/bin/ld: CMakeFiles/ks_tests.dir/CMakeFiles/3.13.4/CompilerIdCXX/CMakeCXXCompilerId.cpp.o: in function `main':
/home/greycat/git/kaitai_struct/tests/compiled/cpp_stl_11/bin/CMakeFiles/3.13.4/CompilerIdCXX/CMakeCXXCompilerId.cpp:580: multiple definition of `main'; CMakeFiles/ks_tests.dir/main.cpp.o:/usr/include/boost/test/unit_test.hpp:62: first defined here
/usr/bin/ld: CMakeFiles/ks_tests.dir/test_io_local_var.cpp.o: in function `test_io_local_var::test_method()':
/home/greycat/git/kaitai_struct/tests/spec/cpp_stl_11/test_io_local_var.cpp:12: undefined reference to `io_local_var_t::io_local_var_t(kaitai::kstream*, kaitai::kstruct*, io_local_var_t*, bool)'

ld (linker) messages are much more cryptic:

ld in theory sticks with $FILE:$LINE: $TEXT template.
ld doesn't really provide any indication of severity, so it's impossible to grep by error or something like that.
ld tries to add some extra context on separate lines, adding .o file reference and, for some reason, the fact that it's a message from /usr/bin/ld, along with grouping in function $FUNCTION message.
ld tries to add some extra pointers and suggestions such as first defined here

Still, summarizing, at least IMHO, it's a huge mess which makes it very hard to parse and understand how many problems are there.

clang

/home/travis/build/kaitai-io/ci_targets/tests/compiled/cpp_stl_98/io_local_var.cpp:19:72: warning: comparison of unsigned expression < 0 is always false [-Wtautological-compare]
    if (static_cast<io_local_var_t::dummy_t*>(mess_up())->_io()->pos() < 0) {
        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^ ~
/home/travis/build/kaitai-io/ci_targets/tests/compiled/cpp_stl_98/io_local_var.cpp:29:23: error: use of undeclared identifier 'n_mess_up'
    if (f_mess_up && !n_mess_up) {
                      ^
/Users/travis/build/kaitai-io/ci_targets/tests/compiled/cpp_stl_11/enum_to_i_class_border_2.h:14:7: note: 'enum_to_i_class_border_2_t' declared here
class enum_to_i_class_border_2_t : public kaitai::kstruct {
      ^

Similar to gcc, but actually even briefer without function-level grouping.

Template is $FILE:$LINE:$COLUMN: $SEVERITY: $TEXT

For warnings, there's also an indication of how to control that specific warning — [-Wtautological-compare]. Suggestions are emitted as "note"-level severity extra messages.

Mach-O linker (Apple's ld)

Undefined symbols for architecture x86_64:
  "io_local_var_t::io_local_var_t(kaitai::kstream*, kaitai::kstruct*, io_local_var_t*)", referenced from:
      test_io_local_var::test_method() in test_io_local_var.cpp.o
  "enum_to_i_class_border_2_t::enum_to_i_class_border_2_t(enum_to_i_class_border_1_t*, kaitai::kstream*, kaitai::kstruct*, enum_to_i_class_border_2_t*)", referenced from:
      enum_to_i_class_border_1_t::checker() in enum_to_i_class_border_1.cpp.o
ld: symbol(s) not found for architecture x86_64

Not very consistent, it looks like a list of affected items rather than a list of error messages. The error message itself is only repeated once at the very top (Undefined symbols), and it's not very obvious that it's an error. Elements in the list seem to be having 2 spaces at the beginning as a mark of the element. Then, every element can have a sublist of "referenced from", which are distinguished by 6 spaces.

IMHO, rather hard to read and parse.

javac

compiled/java/src/io/kaitai/struct/testformats/ExprMod.java:26: error: illegal start of expression
        this._root = _root == null ? this : _root +;
                                                   ^

All error messages are always one-line $FILE:$LINE: $SEVERITY: $TEXT. However, they are augmented with the copy of problematic line itself plus a caret on the extra line pointing to a column.

csc (Microsoft C# compiler)

/home/greycat/git/kaitai_struct/tests/compiled/csharp/IfInstances.cs(33,21): warning CS0162: Unreachable code detected
/home/greycat/git/kaitai_struct/tests/compiled/csharp/IoLocalVar.cs(62,17): warning CS0162: Unreachable code detected
/home/greycat/git/kaitai_struct/tests/compiled/csharp/IoLocalVar.cs(74,17): warning CS0162: Unreachable code detected
/home/greycat/git/kaitai_struct/tests/spec/csharp/kaitai_struct_csharp_tests/tests/SpecDocstrings.cs(13,17): warning CS0219: The variable `r' is assigned but its value is never used
/home/greycat/git/kaitai_struct/tests/spec/csharp/kaitai_struct_csharp_tests/tests/SpecDocstringsDocref.cs(13,17): warning CS0219: The variable `r' is assigned but its value is never used
/home/greycat/git/kaitai_struct/tests/spec/csharp/kaitai_struct_csharp_tests/tests/SpecFixedContents.cs(13,17): warning CS0219: The variable `r' is assigned but its value is never used
/home/greycat/git/kaitai_struct/tests/spec/csharp/kaitai_struct_csharp_tests/tests/SpecMetaXref.cs(13,17): warning CS0219: The variable `r' is assigned but its value is never used
/home/greycat/git/kaitai_struct/tests/spec/csharp/kaitai_struct_csharp_tests/tests/SpecProcessCustom.cs(54,21): error CS0103: The name `ProcessCustom' does not exist in the current context

C# compiler is normally ran with msbuild, which produces pretty noisy and hard to read text output, so the recommended way to tackle it is actually using binary structured logs and msbuild-structured-log-viewer.

However, if you get to the messages themselves, they're pretty brief: $FILE($LINE,$COLUMN): $SEVERITY $CODE: $TEXT. No extra lines of context or copies of problematic lines are present, at least in the default output.

cl (Microsoft C++ compiler)

C:\projects\ci-targets\runtime\cpp_stl\kaitai\kaitaistream.cpp(492): warning C4267: '=': conversion from 'size_t' to 'uInt', possible loss of data [C:\projects\ci-targets\tests\compiled\cpp_stl_98\bin\ks_tests.vcxproj]
C:\projects\ci-targets\tests\spec\cpp_stl_98\test_expr_calc_array_ops.cpp(4): fatal error C1083: Cannot open include file: 'expr_calc_array_ops.h': No such file or directory [C:\projects\ci-targets\tests\compiled\cpp_stl_98\bin\ks_tests.vcxproj]
c:\projects\ci-targets\tests\compiled\cpp_stl_98\imported_and_abs.h(8): fatal error C1083: Cannot open include file: '/imported_root.h': No such file or directory [C:\projects\ci-targets\tests\compiled\cpp_stl_98\bin\ks_tests.vcxproj]
C:\Tools\vcpkg\installed\x64-windows\include\boost/test/tools/old/impl.hpp(107): warning C4805: '==': unsafe mix of type 'const Left' and type 'const Right' in operation [C:\projects\ci-targets\tests\compiled\cpp_stl_98\bin\ks_tests.vcxproj]
          with
          [
              Left=bool
          ]
          and
          [
              Right=int
          ]
  C:\Tools\vcpkg\installed\x64-windows\include\boost/test/tools/old/impl.hpp(130): note: see reference to function template instantiation 'boost::test_tools::assertion_result boost::test_tools::tt_detail::equal_impl<Left,Right>(const Left &,const Right &)' being compiled
          with
          [
              Left=bool,
              Right=int
          ]
  C:\Tools\vcpkg\installed\x64-windows\include\boost/test/tools/old/impl.hpp(145): note: see reference to function template instantiation 'boost::test_tools::assertion_result boost::test_tools::tt_detail::equal_impl_frwd::call_impl<Left,Right>(const Left &,const Right &,boost::mpl::false_) const' being compil
ed
          with
          [
              Left=bool,
              Right=int
          ]
  C:\Tools\vcpkg\installed\x64-windows\include\boost/test/tools/old/impl.hpp(92): note: see reference to function template instantiation 'boost::test_tools::assertion_result boost::test_tools::tt_detail::equal_impl_frwd::operator ()<Arg0,Arg1>(const Left &,const Right &) const' being compiled
          with
          [
              Arg0=bool,
              Arg1=int,
              Left=bool,
              Right=int
          ]
  C:\projects\ci-targets\tests\spec\cpp_stl_98\test_bits_simple.cpp(19): note: see reference to function template instantiation 'bool boost::test_tools::tt_detail::check_frwd<boost::test_tools::tt_detail::equal_impl_frwd,bool,int>(Pred,const boost::unit_test::lazy_ostream &,boost::unit_test::const_string,size
_t,boost::test_tools::tt_detail::tool_level,boost::test_tools::tt_detail::check_type,const Arg0 &,const char *,const Arg1 &,const char *)' being compiled
          with
          [
              Pred=boost::test_tools::tt_detail::equal_impl_frwd,
              Arg0=bool,
              Arg1=int
          ]

Same story with msbuild applies. The messages themselves seem to be most of the time formatted using simple $FILE($LINE): $SEVERITY: $TEXT template. However, in case of additional information/suggestions provided, these start as note severity message, but then occupy a random number of randomly formatted lines with extra information.

I'll keep adding more examples to get a better view on how this is approached by different compilers.

Question: which one of these do you like the most and, most importantly, why?

generalmimon commented 2 years ago

@GreyCat:

on the other hand, it makes it harder to parse such logs with crude command-line tools such as grep(1) or PowerShell's Select-String.

Yes, I'm aware of that. My idea was that the default should be what is best readable by a naked human eye (because I felt that the readability of the old output format was terrible when warnings and the ability not to stop after the first problem were added in 0.10, since the 400-character lines were soft-wrapped by the terminal window, they were right above each other and it seemed almost impossible to see anything there, you didn't even know if it's an error or a warning).

I think the example in https://github.com/kaitai-io/kaitai_struct_compiler/commit/977df8fa shows well how much better the new output is (and of course in real-world specifications of complex formats, using imports and having lots of "style guide" naming violations, the difference would be even more striking).

For machine-readable purposes, there is the --ksc-json-output option. Before releasing 0.10, I made sure that even warnings get written here as well (and also fixed some other bugs), so it should be no worse than the default human-readable text output. In my opinion, there shouldn't be any reason not to use it when you need to process problems reported by KSC programmatically.

JSON also has a very good support among programming languages, usually right in the standard library.

For even easier processing of the JSON output (if one doesn't want to write these few lines of JSON parsing code themselves), https://jqlang.github.io/jq/ may be very useful. I'd say it's no more difficult to use jq than those crude tools like grep and Select-String, but the difference is that jq is actually reliable and solid option, and it's not like you're grepping the human-readable output where you have no guarantee that it will work in all cases (and in any future KSC versions, because I don't feel like the human-readable output should be treated as public API where we should maintain any compatibility) and that is just wrong from the start.

KOLANICH commented 1 year ago

For machine-readable purposes, there is the --ksc-json-output option. Before releasing 0.10, I made sure that even warnings get written here as well (and also fixed some other bugs), so it should be no worse than the default human-readable text output. In my opinion, there shouldn't be any reason not to use it when you need to process problems reported by KSC programmatically.

I mostly aggree, but there is at least 1 reason. Some text editors like Kate have modules for generic building. They just run userspecified commands, then parse errors out of them and provide jums to the places of errors by a click. Of course they can parse errors only for some formats of some widespread software, like gcc. For example while rustc errors are detected, navigation doesn't work.

Question: which one of these do you like the most and, most importantly, why?

IMHO: we need two very generic libraries, one is for formating and one is for parsing error messages, written in a language intended for transpilation into other languages, like Haxe. So a user can choose the format the generic implementation supports. Also the lib should support popular machine-readable outputs.

kaitai-io / kaitai_struct