serokell / xrefcheck

Check cross-references in repository documents
Mozilla Public License 2.0

Parallelize file parsing #247

Open YuriRomanowski opened 1 year ago

YuriRomanowski commented 1 year ago

Clarification and motivation

This topic is part of #221. After we read the file contents, we need to parse the files, which is (in theory) a pure action and can be parallelized. However, we use a C library under the hood, so parallelization may be tricky. Here we can try some approaches and discuss the results.
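For discussion, here is a minimal sketch of the general idea, not the actual xrefcheck code. It assumes the file contents are already in memory and uses a hypothetical pure `countLinks` function as a stand-in for the real parser (which goes through a C library and may behave differently under concurrency). It uses only `base`: each document is parsed in its own lightweight thread and the results are collected in the original order.

```haskell
module Main where

import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Exception (evaluate)

-- Hypothetical stand-in for the real parser: counts occurrences of "](",
-- a rough proxy for markdown links. It is pure and total.
countLinks :: String -> Int
countLinks (']' : '(' : rest) = 1 + countLinks rest
countLinks (_ : rest)         = countLinks rest
countLinks []                 = 0

-- Parse every document in its own thread; results keep the input order.
-- Assumption: the parser never throws. If it could, the exception would
-- have to be caught and passed through the MVar, otherwise takeMVar
-- would never receive a value.
parseAllConcurrently :: [(FilePath, String)] -> IO [(FilePath, Int)]
parseAllConcurrently docs = do
  vars <- mapM spawn docs
  mapM takeMVar vars
  where
    spawn (path, contents) = do
      var <- newEmptyMVar
      _ <- forkIO $ do
        -- evaluate forces the Int inside the worker thread, so the
        -- parsing work is not deferred to whoever reads the result.
        n <- evaluate (countLinks contents)
        putMVar var (path, n)
      pure var

main :: IO ()
main = do
  let docs =
        [ ("README.md", "[a](b) and [c](d)")
        , ("docs/x.md", "no links here")
        ]
  results <- parseAllConcurrently docs
  mapM_ print results
```

To actually run the threads on several cores, the program has to be built with `-threaded` and started with `+RTS -N`. Whether this helps when the parsing happens inside foreign calls is exactly what needs to be measured.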

Acceptance criteria

YuriRomanowski commented 1 year ago

I pushed some commits with different variations of xrefcheck that can be load-tested (in the branch YuriRomanowski/#247-parallelize-file-parsing-scaffolding):

Martoon-00 commented 1 year ago

Thanks for this investigation!

I tried, and from what I can see:

(The selected area corresponds to the repo scanning time.) [Screenshot from 2023-01-31 21-39-58]

Although I'm not exactly sure why the "Activity" graph at the top shows so little CPU usage.