Closed: toddlipcon closed this issue 4 years ago
@toddlipcon, thanks for another round of great feedback! I just want to make sure I understand you correctly. Are you referring to this line?
https://github.com/yhirose/cpp-peglib/blob/c0d2318b1582b0679618d53ede8bf9c98d7f06c8/peglib.h#L3348
I am thinking of caching previous results, so that the next call can start from the last processed position instead of the start of the text.
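Here is a minimal sketch of what I mean (illustrative only, not actual cpp-peglib code):

```cpp
// Sketch of the caching idea: remember the last position we computed line
// info for, and continue scanning from there when the next query is at or
// beyond it (the common case during a forward parse).
#include <cstddef>
#include <string_view>
#include <utility>

struct LineInfoCache {
  size_t last_pos = 0;   // last queried character offset
  size_t last_line = 1;  // 1-based line number at last_pos
  size_t last_col = 1;   // 1-based column at last_pos

  // Returns {line, column} for `pos`, reusing the previous result when
  // queries arrive in non-decreasing order.
  std::pair<size_t, size_t> line_col(std::string_view s, size_t pos) {
    if (pos < last_pos) {  // fall back to a full rescan on backtracking
      last_pos = 0; last_line = 1; last_col = 1;
    }
    for (size_t i = last_pos; i < pos && i < s.size(); i++) {
      if (s[i] == '\n') { last_line++; last_col = 1; }
      else              { last_col++; }
    }
    last_pos = pos;
    return {last_line, last_col};
  }
};
```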
Yep, that is the line showing up in my profile (in particular, the loop scanning for '\n' inside that function).
Caching previous results could work but may be complex from a code flow perspective.
@toddlipcon, I decided to simply drop the line and column members from AstBase. So it's now the user's responsibility to calculate line and column numbers.
@toddlipcon, I realized that some of my apps depend on the line information on the Ast node... So I reverted my change and reimplemented it with a line-end index (std::vector<size_t> source_line_index) and binary search (std::lower_bound). If you think it's still a bottleneck, I'll consider getting rid of them again.
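For reference, a minimal sketch of that approach (the names are illustrative, not the exact internals of peglib.h):

```cpp
// Build the index of '\n' positions once, then answer "which line is this
// offset on?" with a binary search instead of a linear scan per AST node.
#include <algorithm>
#include <cstddef>
#include <string_view>
#include <vector>

std::vector<size_t> build_line_index(std::string_view s) {
  std::vector<size_t> index;
  for (size_t i = 0; i < s.size(); i++) {
    if (s[i] == '\n') index.push_back(i);
  }
  return index;  // O(n) once for the whole input
}

// 1-based line number for a character offset; O(log n) per query.
size_t line_of(const std::vector<size_t>& index, size_t pos) {
  auto it = std::lower_bound(index.begin(), index.end(), pos);
  return static_cast<size_t>(it - index.begin()) + 1;
}
```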
In my workload, which uses enable_ast, I noticed that the constructor of line_info is a perf hotspot. Each of the AST-generating rules creates a line_info struct to pass into the AstNode, even though in most cases (successful parses) the line info will never be used. The actual construction of line_info scans from the beginning of the parse input, which means parsing is O(num_tokens * num_characters). Since the number of tokens is probably linear in the number of characters, this is O(n^2) overall.
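For illustration, a simplified sketch of that kind of per-node computation (hypothetical code, not copied from peglib.h), showing why each call costs O(pos) and the total becomes quadratic:

```cpp
// Naive line/column lookup: every call rescans from offset 0, so computing
// line info for k AST nodes over an n-character input is O(k * n).
#include <cstddef>
#include <string_view>
#include <utility>

std::pair<size_t, size_t> naive_line_col(std::string_view s, size_t pos) {
  size_t line = 1, col = 1;
  for (size_t i = 0; i < pos && i < s.size(); i++) {  // O(pos) every time
    if (s[i] == '\n') { line++; col = 1; }
    else              { col++; }
  }
  return {line, col};
}
```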