useblocks / sphinx-needs

Adds needs/requirements to sphinx
https://sphinx-needs.readthedocs.io/en/latest/index.html
MIT License
193 stars 58 forks source link

Remove unnecessary fields from needs info #996

Open chrisjsewell opened 10 months ago

chrisjsewell commented 10 months ago

There are fields that are are stored, for each need, in the env.needs_all_needs that are maybe not really necessary for further processing. These add to the size of the needs.json.

For example, jinja_content, template, pre_template and post_template are used to generate content, pre_content, and post_content in https://github.com/useblocks/sphinx-needs/blob/a16fb72d44b2c9f65c4065e386e2ad37d0a25870/sphinx_needs/api/need.py#L410

content, pre_content, and post_content are then used to generate AST.

After this none of these fields are required, so I feel they should not be stored on env.needs_all_needs`

chrisjsewell commented 10 months ago

Thoughts @danwos?

PhilipPartsch commented 10 months ago

I use always the templates. So i even check the templates for debugging or if the templates differ by a supplied needs.json.

I belive it is easier for a user to reduce the data fieldes of a json file in a post processing (reduce step in database Methodology) than we offer many configuration parameter to configure the output.

chrisjsewell commented 10 months ago

So i even check the templates for debugging

Heya, I can understand you may want to use this for debugging, but then I feel this can be provided by a config flag (to include such "debug" data)

The problem is, particularly for commercial use cases, there can be many 1000s of needs, and (currently) all this data is loaded in memory, making sphinx-needs have a very high memory overhead. This is magnified if you try to run sphinx in multi-processing mode, since this data is replicated across all processes.

I belive it is easier for a user to reduce the data fieldes of a json file in a post processing

Also to note, there is already a lot of data fields removed from the internal representation before it is output to the needs.json.

PhilipPartsch commented 10 months ago

I know this use case, iā€˜m working in really big projects. As the templates are normal options, it is necassary to get them out of sohinx-needs. So if a user wants to remove options from needs.json, i would suggest to use a post processing script for this.

chrisjsewell commented 10 months ago

So if a user wants to remove options from needs.json

Just to note, I'm not a user, I'm a new developer at useblocks (although with a lot of experience in the space šŸ˜„). So expect to see me doing a lot more development and maintenance of the code base.

This is definitely helpful feedback thanks; that the template field is being used, and I would be happy to meet, to understand further how you use sphinx-needs.

danwos commented 10 months ago

It's not so easy to remove fields from needs-data, as you never know the use cases and if the data may be used for filtering needs.

But I also have an eye on memory consumption, so totally understand the motivation. Maybe it would be good to know the benefit of such a removal in terms of memory consumption. So we should get some states about how big our needs_all_needs dict is, and maybe make a detailed analysis of a single need. Sounds like a task for Sphinx-Performance :)

chrisjsewell commented 10 months ago

Another thing I've just realised: links_back is ommited from the needs.json: https://github.com/useblocks/sphinx-needs/blob/202135d78f91f75aa03228323ccb03347639f075/sphinxcontrib/needs/needsfile.py#L46

However, for any user defined links (from needs_extra_links), there back links are included in needs.json. This is definitely a discontinuity, and so I would suggest these should be omitted also (this also relates to #997)

arwedus commented 7 months ago

@danwos : I upvote the suggestion in comment https://github.com/useblocks/sphinx-needs/issues/996#issuecomment-1695719594, as every little bit that gets us a smaller needs.json is good.

chrisjsewell commented 7 months ago

@arwedus actually this was already done in #1053 šŸ˜„