gianpaj opened this issue 11 years ago
It would probably have to be multiprocessing rather than multi-threading because of Python's GIL, if I remember correctly.
So the idea would be to spread parsing of log lines over all available cores with separate processes? They can't have shared memory so they'd need to communicate the results back to the main process somehow. We need to test how that would be done efficiently.
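As a rough illustration of that model, here's a minimal sketch (not mtools code; `parse_line` and the sample lines are made up) where a `multiprocessing.Pool` farms log lines out to worker processes, and the results are pickled back to the main process by `map()`:

```python
from multiprocessing import Pool

def parse_line(line):
    # Stand-in for LogLine parsing: grab the first token as a "timestamp".
    tokens = line.split()
    return tokens[0] if tokens else None

def parse_lines(lines):
    with Pool() as pool:
        # map() pickles each line, sends it to a worker process, and
        # returns the parsed results to the parent, in order.
        return pool.map(parse_line, lines)

if __name__ == "__main__":
    lines = ["2014-01-01T00:00:00 query ...", "2014-01-01T00:00:01 insert ..."]
    print(parse_lines(lines))
```

The open question is whether the pickling overhead of shipping every line (and every result) between processes eats the gain from using more cores.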
I remember doing some multithreading in Python, and after getting over some syntax problems I managed to make it work.
OK, but multithreading still wouldn't increase performance, right? Because of the GIL it would still only be able to use one core.
I basically tried this, but I think there are a lot of changes that need to be made.
I'm trying it first in mlogvis.py. Because it calls LogLine on every line of the log file, I split the file into chunks of 10000 lines and start a new process for each chunk. The function I'm using (multiprocessing.Pool.apply) has some limitations: for example, you can't pass it a bound method of a class, so I'm going to have to use module-level global variables rather than instance variables (self.variable).
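A minimal sketch of that workaround, assuming hypothetical names (`parse_chunk`, `CHUNK_SIZE`, `OPTIONS` are not the actual mtools code): the worker function lives at module level because bound methods can't be pickled by multiprocessing on Python 2, so what would have been instance state becomes module-level globals:

```python
from multiprocessing import Pool

CHUNK_SIZE = 10000            # lines per chunk, as described above
OPTIONS = {"verbose": False}  # stands in for what would have been self.variable

def parse_chunk(lines):
    # Stand-in for running LogLine over each line of the chunk; reads the
    # module-level OPTIONS instead of self, since workers get no instance.
    if OPTIONS["verbose"]:
        print("parsing %d lines" % len(lines))
    return [len(line) for line in lines]

def chunks(seq, size):
    # Yield consecutive slices of `size` items from seq.
    for i in range(0, len(seq), size):
        yield seq[i:i + size]

def parse_file(lines):
    with Pool() as pool:
        async_results = [pool.apply_async(parse_chunk, (chunk,))
                         for chunk in chunks(lines, CHUNK_SIZE)]
        parsed = []
        for r in async_results:
            parsed.extend(r.get())   # collect results in chunk order
    return parsed
```

Note that this globals pattern only works cleanly with the "fork" start method; with "spawn", each worker re-imports the module and sees the default globals, not any values set in the parent.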
I know this is horrible, but have a look if you can find anything good in this code: https://github.com/gianpaj/mtools/commit/bbd7b08d24c34c5679c55bd8aaf0c6d73a69a05b
For what it's worth, I've made mlogfilter faster when filtering on dates by using a binary search rather than a linear scan. By faster, I mean instant: a search with --from and --to on a 400MB log file took more than 10 minutes before; now it takes 0.15 seconds. I think that's a nice improvement. :-)
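The binary-search idea can be sketched like this (a simplified stand-in, not the actual mlogfilter implementation; it assumes ISO-8601 timestamps at the start of each line and a chronologically sorted file opened in binary mode). Bisecting over byte offsets means only O(log n) lines are ever parsed:

```python
import os
from datetime import datetime

def _ts(line):
    # Assumes an ISO-8601 timestamp as the first token of every line
    # (a simplification; real mongod log formats vary).
    return datetime.fromisoformat(line.split()[0].decode())

def seek_to(f, dt_from):
    """Position a chronologically sorted log file (opened with 'rb') so that
    the next readline() returns the first line with timestamp >= dt_from."""
    lo, hi = 0, os.fstat(f.fileno()).st_size
    while lo < hi:
        mid = (lo + hi) // 2
        f.seek(mid)
        if mid > 0:
            f.readline()            # discard the partial line at mid
        line = f.readline()
        if not line or _ts(line) >= dt_from:
            hi = mid                # target starts at or before mid
        else:
            lo = mid + 1            # target starts after mid
    f.seek(lo)
    if lo > 0:
        f.readline()                # re-align to the next full line
```

After `seek_to(f, dt_from)`, the caller can read lines linearly until a timestamp exceeds the --to bound.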
awesome!
See also #187, this made log file parsing about 8x faster than before for most tasks. I'll still keep this open and want to give multiprocessing another shot when I get to it.
For example, use multithreading:
http://docs.python.org/2/library/threading.html http://www.tutorialspoint.com/python/python_multithreading.htm