[Feature request] avoid regenerating pages that do not contain changes

rogervila commented 2 years ago

Hello,

I am using git to track changes on my documentation website, which is generated with pydoctor.

When I regenerate the HTML output, every file changes because each page embeds the current timestamp. For example:

  <div class="container">
    <a href="index.html">API Documentation</a> for equipment,
  generated by <a href="https://github.com/twisted/pydoctor/">pydoctor</a>
    22.3.0 at 2022-03-16 12:45:05.
  </div>

I would like an option to skip regenerating pages whose content has not changed, so that they keep their original timestamp.

This is the command I am using to generate the docs.

pydoctor --make-html --html-output=website/static/api my-package

Thank you.

adiroiban commented 2 years ago

Thanks for the report.

PR welcomed :)

rogervila commented 2 years ago

Good to know, I will work on it as soon as I have bandwidth!

tristanlatr commented 2 years ago

Hi @rogervila,

Thanks for the feature request.

Is this related to the speed of pydoctor? I agree that pydoctor could be optimized at several levels.

Detecting whether a generated page would be identical to the HTML already on disk, without first rendering the new object page to HTML, seems like a very hard task. The pages contain a lot of derived information, such as subclasses and overridden methods, and for now much of it is computed at HTML generation time. So I'd say you'll have to generate the HTML anyway. Then you'll have to compare the HTML files, most probably with regular expressions. I think this could be done by generating the documentation into a temp folder and then syncing each file that was changed, added, or removed. But this approach will not improve pydoctor's performance.
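
The temp-folder idea could be sketched roughly like this. It is only an illustration and not part of pydoctor: the `sync` and `same_content` helpers are hypothetical names, and the regular expression assumes the footer timestamp always looks like the `at 2022-03-16 12:45:05` example quoted earlier in this thread.

```python
import re
import shutil
from pathlib import Path

# Assumption: the footer timestamp has the form "at YYYY-MM-DD HH:MM:SS",
# optionally followed by fractional seconds.
TIMESTAMP = re.compile(r"at \d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}(?:\.\d+)?")


def normalized(path: Path) -> str:
    """Return the page text with the generation timestamp masked out."""
    text = path.read_text(encoding="utf-8", errors="replace")
    return TIMESTAMP.sub("at <timestamp>", text)


def same_content(new: Path, old: Path) -> bool:
    """Compare HTML pages ignoring the timestamp; other files byte for byte."""
    if new.suffix == ".html":
        return normalized(new) == normalized(old)
    return new.read_bytes() == old.read_bytes()


def sync(fresh: Path, target: Path) -> dict[str, list[str]]:
    """Copy added/changed files from fresh to target and delete stale ones.

    Files that differ only by timestamp are left untouched in target.
    """
    target.mkdir(parents=True, exist_ok=True)
    report: dict[str, list[str]] = {"added": [], "changed": [], "removed": []}
    fresh_files = {p.relative_to(fresh) for p in fresh.rglob("*") if p.is_file()}
    target_files = {p.relative_to(target) for p in target.rglob("*") if p.is_file()}

    for rel in sorted(fresh_files):
        src, dst = fresh / rel, target / rel
        if rel not in target_files:
            kind = "added"
        elif not same_content(src, dst):
            kind = "changed"
        else:
            continue  # only the timestamp differs: keep the old file
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)
        report[kind].append(rel.as_posix())

    for rel in sorted(target_files - fresh_files):
        (target / rel).unlink()
        report["removed"].append(rel.as_posix())
    return report
```

You would point `--html-output` at a temporary directory and then call something like `sync(Path(tmp_dir), Path("website/static/api"))`, so git only sees files whose content really changed. The full HTML is still generated on every run, so this does nothing for speed.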

Tell me what you think.

twisted / pydoctor

[Feature request] avoid regenerating pages that do not contain changes #521