thestr4ng3r closed this 3 years ago.
What about not processing the last line if it doesn't end with a newline? IMHO all lines in a text file should end with a newline, so we could consider this case invalid and avoid the malloc+1 and memcpy.
I think not having a newline after the last line is a valid case; many text editors won't add one unless explicitly configured to. I also wouldn't want to make the format unnecessarily strict just because our parser code can't handle it.
ping @trufae
Don't stress, @thestr4ng3r. Only 3 days have passed since you sent the PR; I have waited months for a single review, and I have been (and still am) very busy at work and at home. Now I have some time to check this.
I needed this time to try your code and understand the ctx.path loop, because it seems to me that just registering a line to be processed takes a lot of code. So here are some comments:
The code works, but it looks a bit overengineered: instead of processing line by line, it processes char by char, so at the end, if it finds out that the buffer has no final newline byte, it needs to reshift the whole line context. But I can't think of a better way, unless the ctx stores the shifting information itself so that it doesn't need to touch all the pointers when this happens. As long as this stays an exceptional case, I think it is better to stick with this solution.
So, summarizing:
Then it looks good to merge to me.
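The alternative mentioned in the review, where the ctx holds the shifting information itself, could look roughly like the following C sketch. It assumes a simplified context with a single key/value separator; `line_ctx`, `ctx_scan()`, `ctx_relocate()` and `ctx_value()` are hypothetical names, not the PR's actual `ctx.path` code. Positions inside a line are stored as offsets from the line start, so moving the line into another buffer only updates `base` and `start` instead of every pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical line context: the line's location is one (base, start) pair,
 * and everything inside the line is an offset relative to its start. */
typedef struct {
	char *base;   /* buffer the line currently lives in */
	size_t start; /* offset of the line within base */
	size_t len;   /* length of the line */
	size_t sep;   /* offset of the first '=' relative to the line start, or len if none */
} line_ctx;

/* Record the line buf[start..end) and find its key/value separator. */
static void ctx_scan(line_ctx *ctx, char *buf, size_t start, size_t end) {
	assert(start <= end);
	ctx->base = buf;
	ctx->start = start;
	ctx->len = end - start;
	ctx->sep = ctx->len;
	for (size_t i = 0; i < ctx->len; i++) {
		if (buf[start + i] == '=') {
			ctx->sep = i;
			break;
		}
	}
}

/* Move the line into a fresh buffer with room for a terminator. Only base and
 * start change; len and sep are line-relative and stay valid untouched.
 * Returns the new buffer (owned by the caller) or NULL on allocation failure. */
static char *ctx_relocate(line_ctx *ctx) {
	char *copy = malloc(ctx->len + 1);
	if (!copy) {
		return NULL;
	}
	memcpy(copy, ctx->base + ctx->start, ctx->len);
	copy[ctx->len] = '\0';
	ctx->base = copy;
	ctx->start = 0;
	return copy;
}

/* Pointer to the value part of the line, or NULL if there is no '='. It is
 * only a proper C string once the line has been NUL-terminated. */
static const char *ctx_value(const line_ctx *ctx) {
	if (ctx->sep >= ctx->len) {
		return NULL;
	}
	return ctx->base + ctx->start + ctx->sep + 1;
}
```

The price of this layout is an extra addition on each access instead of a plain pointer, in exchange for a relocation that touches two fields no matter how much per-line state the context keeps.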
Oh! Forgot to add this to the summary:
This code only makes sense when building sdb with USE_MMAN, right?
No. I tried to explain this in the PR description, but the short version is this: the problem occurs in `sdb_text_load_buf(Sdb *s, char *buf, size_t sz)` whenever `buf` is of size exactly `sz`, which can happen without mmap too. Hence, the following change would be wrong:
"wrap this code inside USE_MMAN"
For the other three points, I pushed commits.
This happens because `load_flush_line()` writes a zero byte just after the line. Most of the time, this just overwrites the `\n`, but at the very end of the file, it writes one byte past the allocated buffer. With mmap this usually goes undetected because the whole page is allocated, but with mmap disabled and ASan it shows up even in the current tests. There would be different ways to fix this:

- Require the buffer passed to `sdb_text_load_buf()` to have one extra byte after `sz`: if done directly like this, a user of the function would have to be aware of that, and it goes against everything we are used to when there is an additional size argument. Alternatively, the string could always be forced to be zero-terminated and the size argument removed, but that would imply a lot of additional checks and issues, or a strlen over the possibly huge buffer, so this isn't a nice solution either.
- Make `load_flush_line()` not write this zero byte there: in this case, it would have to do a strdup for every single line to feed the value into `sdb_set()`, which would be a huge additional overhead.
- Copy the last line into a separate buffer with one extra byte and let `load_flush_line()` work on that copy. This is what I did here. It's a bit ugly, but it is the solution with the fewest problems, and it is well covered by tests.
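A minimal C sketch of the copy-the-last-line approach, assuming a simplified loader: `load_lines()` and the `line_cb` callback are hypothetical stand-ins for sdb's loader and `load_flush_line()`, not the real sdb code. Lines ending in `\n` are terminated in place by overwriting the newline; only a final line without a newline is copied into a `len + 1` buffer, so nothing is ever written at `buf[sz]`.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for load_flush_line(): receives one NUL-terminated
 * line. The pointer is only valid during the call. */
typedef void (*line_cb)(const char *line, void *user);

/* Split buf[0..sz) into lines without ever writing to buf[sz].
 * Returns 0 on success, -1 if the copy of the last line cannot be allocated. */
static int load_lines(char *buf, size_t sz, line_cb cb, void *user) {
	size_t start = 0;
	for (size_t i = 0; i < sz; i++) {
		if (buf[i] == '\n') {
			buf[i] = '\0'; /* overwrite the newline in place; i < sz */
			cb(buf + start, user);
			start = i + 1;
		}
	}
	if (start < sz) {
		/* The last line has no trailing newline: terminating it in place
		 * would write buf[sz], one byte past the buffer. Copy it instead. */
		size_t len = sz - start;
		char *copy = malloc(len + 1);
		if (!copy) {
			return -1;
		}
		memcpy(copy, buf + start, len);
		copy[len] = '\0';
		cb(copy, user);
		free(copy);
	}
	return 0;
}

/* Usage helper: collects up to MAX_LINES lines into fixed-size slots. */
#define MAX_LINES 8
struct collected {
	char lines[MAX_LINES][32];
	int n;
};

static void collect(const char *line, void *user) {
	struct collected *c = user;
	assert(c->n < MAX_LINES);
	snprintf(c->lines[c->n], sizeof c->lines[0], "%s", line);
	c->n++;
}
```

With a 7-byte buffer holding `a=1\nb=2` (no trailing newline, no terminator), the first line is terminated in place and only `b=2` goes through the malloc+1 and memcpy path, so the extra cost is paid at most once per file rather than once per line as with the strdup alternative.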