Open IridescentRose opened 3 years ago
fn myConcat(allocator: *std.mem.Allocator, comptime T: type, slices: []const []const T) ![]T {
if (slices.len == 0) return &[0]T{};
var total_len: usize = 0;
for (slices) |slice| {
total_len += slice.len;
}
const buf = try allocator.alloc(T, total_len);
errdefer allocator.free(buf);
var buf_index: usize = 0;
var sliceIdx: usize = 0;
while(sliceIdx < slices.len): (sliceIdx += 1) {
var i: usize = 0;
while(i < slices[sliceIdx].len): (i += 1){
buf[buf_index] = slices[sliceIdx][i];
buf_index += 1;
std.debug.print("{c}\n",.{buf[buf_index - 1]});
}
}
return buf[0..];
}
This was my attempt to solve it. The debug print writes everything correctly and obviously buf_index is never reset to the beginning for this bug to occur.
Odd bug
It's not a bug, it's a Windows feature :) readUntilDelimiterAlloc
is reading until a \n
is found... but Windows uses \r\n
for its newlines! When you read in myPackage.zig
what you really get is myPackage.zig\r
and when the full path is built (ps. use fs.path.joinSep
) you get something like packages/myPackage.zig\r.json
.
Whenever you print the string to a terminal the \r
shifts the cursor back to the beginning of the line, that's why the .json
part is misplaced.
That both explains it and also makes it more confusing. Perhaps we could create a new method like readUntilNewLineAlloc
or even more simply readLineAlloc
that handled it on a per-platform basis for simplicity?
This is why Zig needs a {q}
format specifier. 😁
Cross-reference to https://github.com/ziglang/zig/issues/6756.
Perhaps we could create a new method like readUntilNewLineAlloc or even more simply readLineAlloc that handled it on a per-platform basis for simplicity?
I like the idea of a readLineAlloc
(what about readLineArrayList
? and readLineOrEof
?) but it should either recognize and chop away both CRLF and LF sequences or take an explicit parameter to control this behaviour, you can have files with LF endings on Windows and files with CRLF endings on Linux.
I think detection should be the main feature since you could make it explicit with the existing untilDelimiter call
I think detection should be the main feature since you could make it explicit with the existing untilDelimiter call
That's why I suggested it should remove both CRLF and LF endings, this way you get a simple method that reads a platform-independent line.
And readUntilDelimiterAlloc
accepts a single byte as delimiter, reading CRLF requires you to read the line and cut off the trailing CR.
There's already a comptime delimiter constant in the standard library that could be used to drive a readLineAlloc or whatever
40+ years of experience with handling text files cross-platform speaking here: "On all platforms, the standard read-a-line functions should always consider CR, LF, and CRLF to be line terminators (and yes, that means that if you see a CR, also eat a single LF if one appears immediately after it). Also, keep in mind that there might not be any of these characters at the end of the last line in a text file." Doing it this way means that all programs will automatically handle text files that have been transferred from disparate OSes by disparate means.
Now, get off my lawn.
should always consider CR, LF, and CRLF to be line terminators
It was before Unicode era. :) From https://en.wikipedia.org/wiki/Newline#Unicode:
The Unicode standard defines a number of characters that conforming applications should recognize as line terminators: LF: Line Feed, U+000A VT: Vertical Tab, U+000B FF: Form Feed, U+000C CR: Carriage Return, U+000D CR+LF: CR (U+000D) followed by LF (U+000A) NEL: Next Line, U+0085 LS: Line Separator, U+2028 PS: Paragraph Separator, U+2029
Indeed! Note that the Unicode list of line terminators is a strict superset of mine, including the CR/LF/CRLF complication. And, hey, even back in the day we had FF (which was seldom seen, and typically caused a "clear screen" operation when displayed, and an "eject page" when printed), and even VT (even more infrequent in the wild, and nobody knew what it was really supposed to do, particularly as differentiated from LF). But NEL, LS, PS? You kids these days, with your fancy emo-gees.
But NEL, LS, PS? You kids these days, with your fancy emo-gees.
E.g., D supports them.
I found this out while trying to use ZPM to add a package. This is not a ZPM bug as I found, but rather a bug in Zig's std.mem.concat.
Below is a simplified version of the error
Input:
myPackage
Output on Ubuntu 20.04:packages/myPackage.zig
Output on Windows:.jsonges/myPackage.zig
This means that the ".json" is overwriting the beginning of the buffer. What's weirder is that when iterating through the entire buffer when copying, every character is printed as the correct string
packages/myPackage.zig
but the returned result is the other bugged version.