Most of the regexps used for parsing HTML have been replaced with hand-coded Cython code. Only attribute parsing, which runs only when attributes are actually needed, still uses regexps.
Benchmarks show that the new code is about 2x faster: typical parse time dropped from 60 ms to 30 ms per page.
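
To illustrate the "parsed only when needed" idea for attributes, here is a minimal pure-Python sketch. It is not the project's actual code: the class name `LazyTag`, the `_ATTR_RE` pattern, and the choice to store the raw attribute text on the tag are all assumptions made for the example.

```python
import re

# Hypothetical attribute pattern (an assumption, not the project's regexp):
# a name, optionally followed by "=" and a double-quoted, single-quoted,
# or unquoted value.
_ATTR_RE = re.compile(
    r"""([^\s=/>]+)(?:\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]+)))?"""
)


class LazyTag:
    """Keeps the raw attribute text and parses it only on first access."""

    def __init__(self, name, raw_attributes):
        self.name = name
        self._raw_attributes = raw_attributes
        self._attributes = None  # nothing parsed yet

    @property
    def attributes(self):
        # The regexp runs at most once, and only if a caller asks.
        if self._attributes is None:
            parsed = {}
            for match in _ATTR_RE.finditer(self._raw_attributes):
                key, dq, sq, bare = match.groups()
                # Pick whichever value form matched; None for bare names.
                value = next((v for v in (dq, sq, bare) if v is not None), None)
                parsed[key.lower()] = value
            self._attributes = parsed
        return self._attributes


tag = LazyTag("a", "href=\"/x\" class='big' id=main disabled")
print(tag.attributes)
# {'href': '/x', 'class': 'big', 'id': 'main', 'disabled': None}
```

Pages whose attributes are never inspected pay no regexp cost under this scheme, which is why leaving this one step on regexps has little effect on typical parse time.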