jlaurens / synctex

Synchronization for TeX
MIT License
57 stars 19 forks source link

Optimizing synctex edit by using an array instead of a linked list for tag lookup #69

Open user202729 opened 4 months ago

user202729 commented 4 months ago

This is a profiling result on a relatively large PDF file (1000 pages, with around 500 tags):

image

A synctex edit command takes 1.5 seconds on the file.

According to the benchmark result, most of the time is spent on searching for the input node corresponding to a tag, which is done by a linked list traversal.

https://github.com/jlaurens/synctex/blob/2020/synctex_parser.c#L6334-L6342

Because the output tags are numbered sequentially by the engines, this can be changed to use an array instead.

By a rough estimation, this improvement would speed up the relevant part by an order of at least 50, which results in approximately 25-30% overall reduction in runtime.

jlaurens commented 4 months ago

I agree. The question is how to implement that in POC? Initially no smart C library was available in TeXLive, but nowadays we have at least lua.

user202729 commented 4 months ago

I don't think it would be too difficult to implement it (but it would to require quite a bit of work), basically all we need is a resizable array, which can be implemented with just malloc.

user202729 commented 4 months ago

I implemented a proof of concept: https://github.com/user202729/luatex/commit/0831bb66f6591a1e609bbbac04c6fecb208ee824

Caveat: I don't really understand why the source code and ownership system need to be that complicated (there's a signaling system to free the nodes?), so instead of removing the linked list entirely, I just put the array in addition to the linked list.

In theory, it should be possible to remove the double-indirection entirely and make a contiguous array of synctex_node_s, which should improve memory locality and performance.

For me, this indeed shows a ≈ 33% improvement in performance. (from 1.5s to 1s)