useblocks / sphinx-needs

Adds needs/requirements to sphinx
https://sphinx-needs.readthedocs.io/en/latest/index.html
MIT License
213 stars, 66 forks

Performance improvement: Optimize queries for needflow, filters etc. #1219

Closed arwedus closed 1 month ago

arwedus commented 2 months ago

In our project with ~3,800 pages, ~20,000 internal needs, ~20,000 imported needs, ~10,000 external needs, and ~700 needflow diagrams, we saw a dramatic increase in sphinx build html time (with external needs resolved) when adding another 10,000 imported needs: from the former 20-30 minutes to anywhere between 2 and 5 hours. A performance analysis showed that build progress stalls while processing the needflow directives, even if the plantuml binary is configured to just echo "plantuml disabled".

Also, filter queries for needextract are very slow (each taking seconds) in this large project. Most of them look like this:

:filter_func: id = ["REQ_SW_12345"]

My assumption is that the internal needs data structure is a list, and thus has O(N) time complexity for ID search operations. Would it be possible to change sphinx-needs to use a dict instead, allowing O(1) average-case access to needs by their ID? See https://wiki.python.org/moin/TimeComplexity
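To make the complexity argument concrete, here is a minimal sketch in plain Python. It does not use sphinx-needs at all: the need records, the field names, and both helper functions are made up for illustration, and whether sphinx-needs really stores needs in a list is only the assumption stated above.

```python
# Hypothetical need records; the field names are illustrative only
# and are not taken from sphinx-needs.
needs_list = [{"id": f"REQ_SW_{i}", "title": f"Need {i}"} for i in range(50_000)]

# The same records, indexed by ID.
needs_by_id = {need["id"]: need for need in needs_list}


def find_in_list(needs, need_id):
    """Linear scan: inspects up to N records, so O(N) per lookup."""
    for need in needs:
        if need["id"] == need_id:
            return need
    return None


def find_in_dict(needs, need_id):
    """Hash lookup: O(1) on average, independent of the number of needs."""
    return needs.get(need_id)


target = "REQ_SW_12345"
# Both approaches return the very same record; only the cost differs.
found_by_scan = find_in_list(needs_list, target)
found_by_key = find_in_dict(needs_by_id, target)
```

With ~700 diagrams or extracts each doing such a lookup, the scan cost is multiplied by the number of needs every time, while the keyed lookup is not.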

chrisjsewell commented 2 months ago

> Would it be possible to change sphinx-needs to use a dict instead

Heya @arwedus, it is already a dictionary. The filter_func you show, though, is a very "special case": it selects a single need via the ID field, which is the only situation where one could think of just using the dictionary lookup.

It would make more sense to add an option to needextract, something like select, that takes a list of IDs to select initially (perhaps before applying filter_func, if that is also present).
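A rough sketch of what such a pre-selection step could look like, in plain Python. This is not the sphinx-needs implementation and not what the linked pull request does; `extract_needs`, its `select` and `filter_func` parameters, and the sample records are all hypothetical names chosen for illustration.

```python
from typing import Callable, Dict, Iterable, List, Optional

Need = Dict[str, str]  # hypothetical, simplified need record


def extract_needs(
    needs_by_id: Dict[str, Need],
    select: Optional[Iterable[str]] = None,
    filter_func: Optional[Callable[[Need], bool]] = None,
) -> List[Need]:
    """Pick needs by ID first (cheap), then apply the general filter."""
    if select is not None:
        # One dictionary lookup per requested ID; unknown IDs are skipped.
        candidates = [needs_by_id[i] for i in select if i in needs_by_id]
    else:
        # No pre-selection: every need is a candidate.
        candidates = list(needs_by_id.values())

    if filter_func is not None:
        # The general filter now only runs over the pre-selected needs.
        candidates = [need for need in candidates if filter_func(need)]
    return candidates


needs = {
    "REQ_SW_1": {"id": "REQ_SW_1", "status": "open"},
    "REQ_SW_2": {"id": "REQ_SW_2", "status": "closed"},
    "REQ_SW_3": {"id": "REQ_SW_3", "status": "open"},
}

selected = extract_needs(
    needs,
    select=["REQ_SW_1", "REQ_SW_2", "REQ_SW_404"],
    filter_func=lambda need: need["status"] == "open",
)
```

The point of the ordering is that the per-need filter, which has to be evaluated for every candidate, only sees the handful of needs named in `select` rather than the full set.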

chrisjsewell commented 1 month ago

Closed in https://github.com/useblocks/sphinx-needs/pull/1281