Open PlasmaHH opened 6 months ago
I would also welcome such a feature and would be happy to contribute a PR for it.
But I think it's not a trivial change and would likely have performance implications that may not be desired.
I see two options:

1. Doing a first quick pass over the input to determine the total line count, then calculating the absolute values for `LineRange` from the negative relative line numbers accordingly. A second pass would then handle lines just as today. No buffering needed.
2. Buffering the input for multiple lines in `Controller::print_file_ranges` instead of the single-line `line_buffer`. The buffer size would need to be equivalent to how far from the end of the input the user wants the output to start (which could of course be very large, resulting in potentially unexpected memory usage). `LineRange` would also need adapting to represent the relative negative value somehow, and this would most likely be a breaking change.

Also, a range with a relative negative value such as `10:-3` is already valid today and is translated into the absolute line range `7:10`. With the feature discussed here, new users may expect it to mean "from line 10 to the 3rd-to-last line" instead. Alternatively, to not break the existing logic, relative negative numbers may only occur in an open-ended range, i.e. `-3:` or `:-3`.
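For concreteness, a minimal sketch of the resolution step in option 1, assuming the total line count from the first pass is already known; `RelativeLineRange`, `resolve_bound` and `resolve` are invented names and not bat's actual `LineRange` API:

```rust
/// Hypothetical type for illustration only; bat's real `LineRange` works
/// purely with absolute (positive) line numbers.
struct RelativeLineRange {
    lower: Option<i64>, // None = open lower bound, negative = counted from the end
    upper: Option<i64>, // None = open upper bound, negative = counted from the end
}

impl RelativeLineRange {
    /// Map a possibly-negative bound onto an absolute 1-based line number,
    /// using the total line count found by the first pass.
    fn resolve_bound(bound: i64, total_lines: u64) -> u64 {
        if bound < 0 {
            // -1 is the last line, -3 the third-to-last, clamped to line 1.
            (total_lines as i64 + bound + 1).max(1) as u64
        } else {
            bound as u64
        }
    }

    /// Resolve both bounds, e.g. `-3:` on a 100-line file becomes 98..=100.
    fn resolve(&self, total_lines: u64) -> (u64, u64) {
        let lower = self.lower.map_or(1, |b| Self::resolve_bound(b, total_lines));
        let upper = self
            .upper
            .map_or(total_lines, |b| Self::resolve_bound(b, total_lines));
        (lower, upper)
    }
}

fn main() {
    // `-3:` resolves to the last three lines of a 100-line file.
    let tail = RelativeLineRange { lower: Some(-3), upper: None };
    assert_eq!(tail.resolve(100), (98, 100));
    // `:-3` resolves to everything up to the third-to-last line.
    let head = RelativeLineRange { lower: None, upper: Some(-3) };
    assert_eq!(head.resolve(100), (1, 98));
    println!("resolved ranges ok");
}
```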
I made this wrapper script, which uses the relative negative value functionality mentioned above:

```bash
#!/usr/bin/env bash
# Print the last N lines of a file through bat (N defaults to 10).
file="$1"
show="${2:-10}"
# Count the lines so the range can be anchored at the last line of the file.
lines=$(wc -l < "$file")
# bat already treats "L:-K" as the absolute range (L-K):L, so
# "<line count>:-(show-1)" covers the final <show> lines.
range="$lines:-$((show - 1))"
exec bat "$file" --color=always --style=plain --paging=never --line-range "$range"
```

The second argument is optional and defaults to showing 10 lines. It even works if the file has fewer lines than the requested count.
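For example, if the script is saved as `battail` (the name here is arbitrary), `battail app.log 20` prints the last 20 lines of `app.log` through bat, while `battail app.log` falls back to the default of 10 lines.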
> But I think it's not a trivial change and would likely have performance implications that may not be desired.

Well, there is tail(1), so whatever it does should be applicable here :-)

> I see two options:

Note that these options are not equivalent in the presence of non-regular files. In other words, they are not alternatives; rather, both should be implemented, with the decision between them depending on whether the target file is seekable.
> Note that these options are not equivalent in the presence of non-regular files. In other words, they are not alternatives; rather, both should be implemented, with the decision between them depending on whether the target file is seekable.
I am not quite sure what being seekable or not has to do with anything; after all, to determine the number of lines in ordinary text files you have to read through them anyway.
Anyway, as one way to implement it: at work we have a program that reads a binary format and outputs one piece of text per "record". It buffers the last N of these pieces, and once it has read through the whole file it outputs only those.
The same thing could be done here, with one line per entry. The reason I think the above-mentioned "quickly determine the line count and then start from there" approach won't really work is that in most cases syntax highlighting will be broken if the highlighter doesn't have some preceding context.
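A minimal sketch of that sliding-window idea, independent of bat's internals; `last_n_lines` is a made-up helper, not an existing function:

```rust
use std::collections::VecDeque;
use std::io::{self, BufRead};

/// Read `reader` to the end, keeping only the last `n` lines in memory.
/// Memory use stays bounded by `n` lines no matter how large the input is.
fn last_n_lines<R: BufRead>(reader: R, n: usize) -> io::Result<Vec<String>> {
    if n == 0 {
        return Ok(Vec::new());
    }
    let mut window: VecDeque<String> = VecDeque::with_capacity(n);
    for line in reader.lines() {
        let line = line?;
        if window.len() == n {
            // Drop the oldest buffered line to make room for the newest one.
            window.pop_front();
        }
        window.push_back(line);
    }
    Ok(window.into_iter().collect())
}

fn main() -> io::Result<()> {
    // Works on non-seekable input too, e.g. `some-command | this-program`.
    let stdin = io::stdin();
    for line in last_n_lines(stdin.lock(), 10)? {
        println!("{line}");
    }
    Ok(())
}
```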
> I am not quite sure what being seekable or not has to do with anything; after all, to determine the number of lines in ordinary text files you have to read through them anyway.
It has to do with whether it is possible to rewind the file after reading it once. If a file is seekable, then it is possible to scan it once to count lines and then read it again, this time running the content through the pre-existing logic.
On the other hand, if a file is not seekable, you do not have that luxury: you need to buffer the lines into a "sliding window" as you read them, and after you hit an EOF, run the contents of the sliding window through the main logic.
An example of a non-seekable file would be a text stream piped from another command, which can only be read once for obvious reasons.
EDIT: ah, I did not read till the end, my bad.
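To make the seekability distinction concrete, a rough sketch of how the strategy choice might look; this is illustrative only and assumes the simplest case, namely that a path argument means a regular, seekable file while no argument means reading from a possibly non-seekable stdin:

```rust
use std::fs::File;
use std::io::{self, BufRead, BufReader, Seek, SeekFrom};

/// First pass for seekable input: count the lines, then rewind so the
/// existing single-pass printing logic could run unchanged afterwards.
fn count_lines_and_rewind(file: &mut File) -> io::Result<u64> {
    let count = BufReader::new(&mut *file).lines().count() as u64;
    file.seek(SeekFrom::Start(0))?;
    Ok(count)
}

fn main() -> io::Result<()> {
    match std::env::args().nth(1) {
        // A path argument: a regular file can be read twice, so the two-pass
        // strategy works and no extra buffering is needed.
        Some(path) => {
            let mut file = File::open(path)?;
            let total = count_lines_and_rewind(&mut file)?;
            // With the total known, a request like `-3:` becomes an absolute range.
            let start = total.saturating_sub(2).max(1);
            println!("{total} lines; -3: would map to {start}:{total}");
        }
        // No path: stdin may be a pipe that cannot be rewound, so only the
        // sliding-window buffering strategy from the previous sketch applies.
        None => eprintln!("non-seekable input: fall back to buffering the last N lines"),
    }
    Ok(())
}
```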
> The same thing could be done here, with one line per entry. The reason I think the above-mentioned "quickly determine the line count and then start from there" approach won't really work is that in most cases syntax highlighting will be broken if the highlighter doesn't have some preceding context.
Indeed, that will be a problem.
However, I wonder how much of a problem it will be in practice?
File formats that people usually want to `tail` are line-oriented: logs, for instance. And if you are tailing your logs, then you very much do not want to run the entire log through the syntax highlighter; it would be prohibitively slow. This is, in fact, how I discovered this issue: I tried to `bat` a log file and mindlessly hit End to scroll to the last screenful of output in the pager, only to discover that it took ~2 minutes for bat to render a 30 MiB file!
> However, I wonder how much of a problem it will be in practice? File formats that people usually want to `tail` are line-oriented: logs, for instance.
For such formats this is true. Maybe the most sensible approach would be a "syntax dissector" that keeps only as many past lines as are needed for context, and when highlighting is requested it starts that many lines back? For log files that context would be empty; for others it could be more. I don't know off-hand how bat highlights XML files, but for those I like a different colour per tag depth, and for C++ and the like I like different colours for different bracket depths, which would then require keeping lines back to the last top-level element (and for real semantic highlighting you would need everything).
Thanks for moving the discussion forward. 🙂
> 2. Buffering the input for multiple lines in `Controller::print_file_ranges` instead of the single-line `line_buffer`. The buffer size would need to be equivalent to how far from the end of the input the user wants the output to start (which could of course be very large, resulting in potentially unexpected memory usage).
I think the increased memory usage for buffering is justified and shouldn't be excessive for most use cases. Even buffering 10k lines at an average line length of 1 kB comes to roughly 10 MB, which isn't that bad.
> Alternatively, to not break the existing logic, relative negative numbers may only occur in an open-ended range, i.e. `-3:` or `:-3`.
Probably it makes sense to start with this. If we get demand for more changes later, we can consider opening it up further.
Buffering in all cases, regardless of file seekability, could help reduce the number of code paths and make maintenance easier. It would probably be my personal choice.
About highlighting context lines: currently there is no way for `bat` to know which syntaxes need context or how much. But it certainly makes sense to skip highlighting all prior lines of a line-oriented log file. Perhaps some per-syntax mapping (even just a line-oriented vs. non-line-oriented setting) could make sense.
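Purely as a strawman for such a per-syntax setting (nothing like this exists in bat today; the type, the function and the syntax names below are all invented):

```rust
/// Invented for illustration: a per-syntax hint for how much preceding
/// context the highlighter would need before the first requested line.
enum HighlightContext {
    /// Line-oriented formats (e.g. plain logs): every line stands on its own.
    None,
    /// Formats that need a bounded look-behind to find the enclosing construct.
    Lines(usize),
    /// Formats where highlighting is only reliable from the top of the file.
    WholeFile,
}

/// Illustrative mapping only; the syntax names and numbers are placeholders,
/// and a real version would presumably live next to the syntax definitions.
fn context_for(syntax_name: &str) -> HighlightContext {
    match syntax_name {
        "Log" | "CSV" => HighlightContext::None,
        "XML" | "C++" => HighlightContext::WholeFile,
        _ => HighlightContext::Lines(100),
    }
}

fn main() {
    let needs_full = matches!(context_for("XML"), HighlightContext::WholeFile);
    println!("XML needs whole-file context: {needs_full}");
}
```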
I opened a PR for this, looking forward to feedback.
For Python users it would probably be best if `-r` acted somewhat like Python's slice syntax...
So negative numbers would count from the end: `tail -n 5` would become `bat -r -5:`, and one could then display the first and last portion of a file with something like `bat -r :5 -r -5:`.
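If the Python-like semantics were adopted, the parsing side might look roughly like this; `parse_relative_range` is a hypothetical helper, not bat's actual range parser, and negative bounds are kept signed so they can be resolved against the line count later, as in the earlier sketch:

```rust
/// Parse strings like ":5", "-5:", "10:20" or "-5:-1" into signed bounds;
/// `None` means the bound was left open. Hypothetical helper for illustration.
fn parse_relative_range(spec: &str) -> Result<(Option<i64>, Option<i64>), String> {
    let (lo, hi) = spec
        .split_once(':')
        .ok_or_else(|| format!("missing ':' in range '{spec}'"))?;
    let parse_bound = |s: &str| -> Result<Option<i64>, String> {
        if s.is_empty() {
            Ok(None)
        } else {
            s.parse::<i64>()
                .map(Some)
                .map_err(|e| format!("invalid bound '{s}': {e}"))
        }
    };
    Ok((parse_bound(lo)?, parse_bound(hi)?))
}

fn main() {
    // "-5:" keeps a signed lower bound, to be resolved against the line count later.
    assert_eq!(parse_relative_range("-5:"), Ok((Some(-5), None)));
    // ":5" is the familiar "first five lines".
    assert_eq!(parse_relative_range(":5"), Ok((None, Some(5))));
    println!("parsed ok");
}
```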