Open devkev opened 10 years ago
You're right. Storing the line number is probably more than enough: when a point is clicked, just open the file at that line number. Not sure whether linecache would work here.
Thanks, I like the linecache idea, that sounds like the way to go.
Some notes:
Use namedtuples to store only the fields needed, which are:
- line_no (use linecache to get back the line for a click event)
- datetime (x-axis value)
- duration (possibly, for the --optime-start flag)
- self.field (y-axis value)
- group (see below)
Issue with grouping: the grouping is currently a function that takes the logevent and calculates the group dynamically. Instead, pre-calculate the group value in add_line() (it doesn't change during the lifetime of a single plot_instance), and add an additional field, group, to the tuple.
What about stdin? Need to additionally store line_str.
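
A rough sketch of the notes above, not the actual mtools code: the record name PlotPoint, the helpers make_point() and original_line(), and the use of a plain dict in place of a real logevent are all assumptions made for illustration. The y-axis field is stored as value, and the group is computed once when the point is created.

```python
import linecache
from collections import namedtuple

# Hypothetical compact record holding only what the plot needs.
PlotPoint = namedtuple(
    "PlotPoint", ["line_no", "datetime", "duration", "value", "group"]
)


def make_point(line_no, logevent, field, group_by):
    """Build a record from a parsed event (a dict here, as an assumption).

    group_by is called exactly once, so the group is pre-calculated
    instead of being recomputed on every redraw.
    """
    return PlotPoint(
        line_no=line_no,
        datetime=logevent["datetime"],
        duration=logevent.get("duration"),
        value=logevent[field],
        group=group_by(logevent),
    )


def original_line(filename, point):
    """Fetch the original log line for a clicked point (line_no is 1-based)."""
    return linecache.getline(filename, point.line_no).rstrip("\n")
```

One caveat: once a line is requested, linecache keeps all lines of that file in its own cache, so linecache.clearcache() may be worth calling after a click on very large files.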
stdin is just a special case of a file that can't be seeked. It's also possible to have such a file passed on the command line (eg. using bash's "<()" construct, or using mkfifo).
I would suggest the following approach. Change the rest of the code to store not line_str but the line number within the file, which serves as an indirect reference back into the file. Define an abstract "Logfile" class. This has 3 concrete implementations, each of which is tried in turn:
The other approach to dealing with non-seekable files is to cache them in a temporary disk file somewhere, somehow. I dislike this idea, because it becomes mtools's problem to find a writable location with sufficient disk space for the temporary file(s), and to clean them up later (which isn't always possible, eg. after kill -9). I much prefer the policy that if you have output from a pipe that you want to plot, and it's "large" (as defined by the maximum CachedLogfile cache size above), then it's your job to pipe it into a file and then feed that file to mplotqueries. This pushes the decision of finding a writable location with enough space, and cleaning up the file afterwards, onto the user, but I don't mind that because the user is far better informed than mtools in these regards.
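
To make the policy concrete, here is a minimal sketch of how a size-limited in-memory cache for non-seekable input might behave. Only the names Logfile and CachedLogfile come from this thread; the get_line() method, the max_size parameter, its default, and the CacheLimitExceeded exception are assumptions for illustration.

```python
from abc import ABC, abstractmethod


class CacheLimitExceeded(Exception):
    """Hypothetical error: a non-seekable input outgrew the in-memory cache."""


class Logfile(ABC):
    @abstractmethod
    def get_line(self, line_no):
        """Return the original text of the given 1-based line number."""


class CachedLogfile(Logfile):
    """Keeps every line in memory while reading, up to max_size.

    max_size is counted with len(), i.e. characters for text streams.
    """

    def __init__(self, stream, max_size=64 * 1024 * 1024):
        self._stream = stream
        self._max_size = max_size
        self._lines = []
        self._size = 0

    def __iter__(self):
        for line in self._stream:
            self._size += len(line)
            if self._size > self._max_size:
                # Policy from the comment above: ask the user to write the
                # stream to a file instead of creating a temp file for them.
                raise CacheLimitExceeded(
                    "input too large to cache; save it to a file first"
                )
            self._lines.append(line)
            yield len(self._lines), line

    def get_line(self, line_no):
        return self._lines[line_no - 1]
```

Iterating the object yields (line_no, line) pairs, so the plotting code can store just line_no and call get_line() when a point is clicked.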
mplotqueries stores the original log line along with the parsed info, so that it can output it when points are clicked. However, it would be a lot better to instead store a filename + byte offset (where possible, ie. when reading from a rewindable and/or seekable file), to avoid eating up impossible amounts of memory on very very large logfiles.
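
A minimal sketch of the filename + byte offset idea, assuming a regular seekable file; index_offsets() and read_line_at() are hypothetical helper names, not existing mtools functions. The file is opened in binary mode so that the recorded positions are true byte offsets.

```python
def index_offsets(path):
    """Return the byte offset at which each line of the file starts."""
    offsets = []
    pos = 0
    with open(path, "rb") as f:
        for line in f:
            offsets.append(pos)
            pos += len(line)  # len() of bytes is the exact byte count
    return offsets


def read_line_at(path, offset):
    """Seek to a stored offset and return that single line as text."""
    with open(path, "rb") as f:
        f.seek(offset)
        raw = f.readline()
    return raw.decode("utf-8", errors="replace").rstrip("\r\n")
```

Each plotted point then needs only one integer instead of the full log line.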
Alternatively, when the logfile is an actual file (ie. not a pipe), it could be mmapped, which would potentially allow for faster reading (especially when plotting and replotting the same file over and over), and fast/easy access back to the original log lines without having to use lots of ram.
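
The mmap variant could look roughly like this, again assuming a regular, non-empty file (mapping an empty file fails); line_at() and read_line_mapped() are hypothetical names used only for this sketch.

```python
import mmap


def line_at(mm, offset):
    """Return the line starting at a byte offset of a mapped file."""
    end = mm.find(b"\n", offset)
    if end == -1:  # last line without a trailing newline
        end = len(mm)
    return mm[offset:end].decode("utf-8", errors="replace").rstrip("\r")


def read_line_mapped(path, offset):
    """Map the file read-only and fetch one line by its byte offset."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return line_at(mm, offset)
```

In a long-running plot the mapping would more likely be opened once and kept for the lifetime of the plot, with line_at() called on each click.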