microsoft / Trieste

A term rewriting system for experimental programming language development.
MIT License
38 stars 20 forks source link

Let Location::str() handle `\r` correctly #127

Closed fhackett closed 2 months ago

fhackett commented 3 months ago

This PR is split off from #126, because really it's an orthogonal fix.

The observed problem was that, on Windows, error messages had a strange \r in reported source locations that was inconsistent with both other platforms, and other parts of the same output string (you would see \n once then \r\n further along the same string).

The trouble was traced back to source.h, which was separating lines based on only the \n character, and assuming that line breaks were only 1 char wide. This PR rewrites the 3 line/columns handling functions that did this, so that they will accept \r\n, \n, and \r as line delimiters while reporting correct line lengths (and so forth) on all platforms.

The main change of substance is that the lines field of SourceDef changes from a vector of size_t to a vector of pair<size_t, size_t>, because it is no longer possible to derive where a line ends from where another line begins (we might have to add/subtract either 1 or 2 chars). Rather than try to detect some file-wide line break width (and possibly have problems due to corrupted files mixing line ending types), this rewrite just looks at which line ending variant each line had and records span information to match.

Additionally, this PR adds exhaustive generative testing of this changed feature. The test code is commented to explain logically how it builds all possible permutations of line endings and line contents up to a certain length (albeit using just a as contents since this code is not sensitive to string contents). As configured in the CMakeLists.txt, it runs 729 test cases up to length 6 (6 chars, 6 line endings, and everything in between).

This fine-grained testing should help keep this code robust if it ever needs to change again - in fact it already helped catch errors when implementing some review recommendations from the other PR this was split from.

fhackett commented 2 months ago

Based on meeting feedback, this new version hardens Location::str() against carriage returns instead.

Because I was exhaustively testing and found a lot of edge cases, I rewrote most of Location::str() in order more fundamentally satisfy the testing requirements. That said, I am relatively sure that, if you don't use \r, the old and new implementation are exhaustively equivalent in their behavior. It can be double checked by reverting my source.h changes and running the tester with --skip-carriage-returns.

Let me know if there are any questions/clarifications/requests.