Improve parsing (almost) empty files

jannis-baum commented 3 months ago

Close #155

jannis-baum commented 3 months ago

@tuurep: Nice, sounds like an easy problem 😄

So the thing that you observed is easy to fix, but while fixing this I found another issue that is not easy to fix😅

Files that have only 1 character, e.g. a line feed (0a) are recognized by file as arbitrary binary data (application/octet-stream). So if you create a file with Vim, save it, start using Vivify, write some stuff, delete it (up to here everything works), but then close Vivify until the server shuts down and then call viv again on the "empty" file, it will break and download the file because Vivify will think it's binary because of that one single character (the line break that Vim saved). Same thing probably as well if you just create the file, write some stuff, delete "everything", save it, and then call :Vivify.

Not sure what a good way to fix this is. Maybe check the file's content and fall back to plain text if the length is 1 byte and the character is printable...

jannis-baum commented 3 months ago

I implemented manual checking now:

If the file has 0 bytes, we always return inode/x-empty (@Tweekism this already fixes the discrepancy with file for Windows that you mentioned on #150)
If the file has 1 byte, we check if that byte is a printable character, and if so return text/plain, which will trigger the renderer that will then interpret the file as Markdown if it has a Markdown extension

tuurep commented 3 months ago

Tested and works beautifully now :+1:

Implementation looks quite painful and what an unexpected problem once again :smile: Do you think it's good & worth it?

jannis-baum commented 3 months ago

Implementation looks quite painful and what an unexpected problem once again 😄 Do you think it's good & worth it?

Yeah haha. I do think it's good & worth it. It was definitely a bug, no way to spin that around haha, so it was necessary to fix.

The implementation should be very robust. 0-byte files will always be empty and 1-byte files with a printable character can't be anything "useful" that serves some purpose that breaks when we interpret them as text files.

Performance-wise it's also fine: For the majority of cases (files bigger than 1 byte) it's the same as before (the child process and getting file stats are both outside processes executed in parallel) and for 1-byte files the complexity of reading 1 byte from a file is added which is negligible.

tuurep commented 3 months ago

Ok, sounds good!

jannis-baum / Vivify

Improve parsing (almost) empty files #156