We're seeing some parser inefficiencies while preparing for the xcsp3 competition with a CPMpy bridge.
E.g. there are some instances with odd use of Table constraints that have very few table constraints but with huge (constant) tables, e.g. the very first MiniCOP file: MiniCOP/AircraftLanding-table-airland01_mc22.xml
The parser takes 25 seconds, ortools' type checker takes another 12 and the solver only 9 seconds, so more time is spent in the parser than anything else...
Looking at the parser code, I see a number of generic optimisations that can probably be done throughout the parser:
regular expressions should be compiled once and reused everywhere; this makes a measurable difference. For now I only did it at the top of this file, for this one pattern.
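As a minimal sketch of what I mean (the pattern and function name here are hypothetical, not the parser's actual code): hoisting the compiled pattern to module level avoids re-parsing the pattern, or at least the cache lookup, on every call.

```python
import re

# Compiled once at import time, reused on every call.
TUPLE_SPLIT = re.compile(r"\s+")

def split_tuple(line):
    # re.split(r"\s+", line) would look up (or recompile) the pattern
    # on each call; the precompiled object skips that entirely.
    return TUPLE_SPLIT.split(line.strip())
```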
whether a tuple is 'starred' is much more cheaply checked with Python's built-in substring check than by looping over the characters
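Sketched as a hypothetical helper (assuming, as in XCSP3 short tables, that '*' marks the wildcard):

```python
def is_starred(tuple_str):
    # Python's substring search runs in optimized C code,
    # far faster than an explicit per-character loop.
    return "*" in tuple_str
```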
the choice between conversion functions was made inside a list comprehension in a loop, but the function reference can be determined once upfront and bound to a local variable; these are tiny gains per tuple that add up over huge tables
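A hypothetical illustration of that hoisting (the `symbolic` flag and function names are mine, not the parser's): the branch is decided once, and the inner loop only calls a local.

```python
def parse_tuples(lines, symbolic):
    # Decide the per-token converter once, outside the loop,
    # instead of re-evaluating the choice for every tuple.
    convert = str if symbolic else int
    return [[convert(tok) for tok in line.split()] for line in lines]
```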
with those changes (i.e. everything above the 'else') parsing now takes 19 seconds
to make it faster still, we have to use an external optimized implementation; e.g. if we convert the string to a CSV-formatted string, we can use numpy's loadtxt as well as its efficient tolist(). This parses that file in 10 seconds on my computer.
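A rough sketch of that approach, assuming a table string of the form "(1,2)(3,4)" (the exact input format in the parser may differ): rewrite it as CSV lines so numpy's C parser does the heavy lifting, then convert back to plain Python lists.

```python
import io
import numpy as np

def parse_table(raw):
    # "(1,2)(3,4)" -> "1,2\n3,4": tuple boundaries become newlines,
    # outer parentheses are stripped, yielding a CSV-formatted string.
    csv = raw.replace(")(", "\n").strip("()")
    # np.loadtxt parses the whole table in C; ndmin=2 keeps a
    # single-tuple table two-dimensional.
    arr = np.loadtxt(io.StringIO(csv), delimiter=",", dtype=int, ndmin=2)
    # tolist() converts the ndarray back to nested Python lists efficiently.
    return arr.tolist()
```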
Since we intend to use the parser in the competition, I hope you can agree it is worth optimising some of its bottlenecks?