Open chrisjsewell opened 10 months ago
Thoughts @danwos?
I use always the templates. So i even check the templates for debugging or if the templates differ by a supplied needs.json.
I belive it is easier for a user to reduce the data fieldes of a json file in a post processing (reduce step in database Methodology) than we offer many configuration parameter to configure the output.
So i even check the templates for debugging
Heya, I can understand you may want to use this for debugging, but then I feel this can be provided by a config flag (to include such "debug" data)
The problem is, particularly for commercial use cases, there can be many 1000s of needs, and (currently) all this data is loaded in memory, making sphinx-needs have a very high memory overhead. This is magnified if you try to run sphinx in multi-processing mode, since this data is replicated across all processes.
I belive it is easier for a user to reduce the data fieldes of a json file in a post processing
Also to note, there is already a lot of data fields removed from the internal representation before it is output to the needs.json.
I know this use case, iām working in really big projects. As the templates are normal options, it is necassary to get them out of sohinx-needs. So if a user wants to remove options from needs.json, i would suggest to use a post processing script for this.
So if a user wants to remove options from needs.json
Just to note, I'm not a user, I'm a new developer at useblocks (although with a lot of experience in the space š). So expect to see me doing a lot more development and maintenance of the code base.
This is definitely helpful feedback thanks; that the template field is being used, and I would be happy to meet, to understand further how you use sphinx-needs.
It's not so easy to remove fields from needs-data, as you never know the use cases and if the data may be used for filtering needs.
But I also have an eye on memory consumption, so totally understand the motivation.
Maybe it would be good to know the benefit of such a removal in terms of memory consumption.
So we should get some states about how big our needs_all_needs
dict is, and maybe make a detailed analysis of a single need. Sounds like a task for Sphinx-Performance :)
Another thing I've just realised: links_back
is ommited from the needs.json
: https://github.com/useblocks/sphinx-needs/blob/202135d78f91f75aa03228323ccb03347639f075/sphinxcontrib/needs/needsfile.py#L46
However, for any user defined links (from needs_extra_links
), there back links are included in needs.json
.
This is definitely a discontinuity, and so I would suggest these should be omitted also
(this also relates to #997)
@danwos : I upvote the suggestion in comment https://github.com/useblocks/sphinx-needs/issues/996#issuecomment-1695719594, as every little bit that gets us a smaller needs.json is good.
@arwedus actually this was already done in #1053 š
There are fields that are are stored, for each need, in the
env.needs_all_needs
that are maybe not really necessary for further processing. These add to the size of theneeds.json
.For example,
jinja_content
,template
,pre_template
andpost_template
are used to generatecontent
,pre_content
, andpost_content
in https://github.com/useblocks/sphinx-needs/blob/a16fb72d44b2c9f65c4065e386e2ad37d0a25870/sphinx_needs/api/need.py#L410content
,pre_content
, andpost_content
are then used to generate AST.After this none of these fields are required, so I feel they should not be stored on env.needs_all_needs`