Open prajgujarathi opened 6 years ago
Can we define "too much time" and "simple zinc file"? Are we talking milliseconds, seconds, hours, days? Is the file 100 bytes, 100kB, 100MB? Also, which version of hszinc
are you using? What Python environment? Is this running on an under-clocked 386SX/16MHz or an over-clocked Core i7/Rysen 7?
I'll admit I've done no real performance analysis on hszinc
when dealing with ZINC grids, focussing on getting it working at all.
There was a re-write recently that replaced parsimonious
with pyparsing
as the underlying parsing workhorse: this was done because the grammar needed for the former was getting very complicated for maintenance purposes, and adding list support was made nearly impossible.
I tried numerous variations, but couldn't reliably get lists to parse correctly using parsimonious
due to the recursive nature of the grammar required. Hence, I moved it over to pyparsing
. I don't know if this made performance "better" or "worse": this was not a consideration at the time.
I'm open to reviewing patches that address the performance issues on the proviso they don't break existing functionality.
If performance is a consideration, I'd seriously consider using JSON for grid representation as that uses the standard json
module in Python, and thus should have fewer performance issues.
Hi Stuart, thanks a lot for replying. About the size of a file for 3000kb zinc file, it takes about 10mints on core i7 processor. As you suggested using JSON format is quite faster than Zinc parsing. So, for now, I am going to use JSON for grids.
Ahh okay, yeah, 3MB is huge. My testing has been with grids that are more like 3kB. :-)
ZINC is challenging as while it is CSV-based, it is different enough from CSV that using a standard CSV parser will not yield reliable parsing results. Ultimately I'd like to speed things up with a C extension, but that can be a problem on some platforms without a C compiler and libraries. I figure "get it working right", then "get it working fast".
Soon as I get issue #8 finished, I'll have a "complete" ZINC parser with unit tests: and I can start re-implementing parts for speed as I'll have the unit tests to tell me whether I've accidentally broken something or not.
Making some test comparing JSON to ZINC on Niagara4 and can tell it is slow. Too slow infact. For now on, I'll switch Niagara stuff to JSON by default.
hszinc.parse(file_) take too much time to parse a simple zinc file. Is there a way to improve the performance of that function.