quarto-dev / quarto-cli

Open-source scientific and technical publishing system built on Pandoc.
https://quarto.org

sitemap.xml generation incorrect date? #3251

Open thomashallam opened 2 years ago

thomashallam commented 2 years ago

Bug description

When I run 'quarto publish', every post/page is regenerated as HTML, regardless of whether the page content changed or not.

image

With this workflow, the dates in the sitemap do not reflect when the content was last changed.

Another scenario: you might change or remove a category on one page, and quarto publish would regenerate every single post page and the sitemap. The underlying content of those pages technically didn't change, only metadata displayed on one page.

It seems that the updated dates in the sitemap should relate to content, not navigation/metadata?

Thanks

Tom

VS Code + extension, Windows 11, GitHub Actions


dragonstyle commented 2 years ago

Note - it looks like we are using the last modified time of the output file when creating the sitemap. It probably makes more sense to use the last modified date from the input file since the output file may be recreated by a render anytime the site is published.
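To illustrate the idea, here is a minimal sketch (in Python, not Quarto's actual TypeScript implementation) of deriving a sitemap `lastmod` value from the input file's modification time rather than the output file's. The function name `lastmod_from_input` is hypothetical.

```python
import os
from datetime import datetime, timezone


def lastmod_from_input(input_path: str) -> str:
    """Return a W3C-datetime string based on the *input* file's mtime.

    Hypothetical helper: the rendered output file is deliberately not
    consulted, so re-rendering alone does not change the result.
    """
    mtime = os.path.getmtime(input_path)
    return datetime.fromtimestamp(mtime, tz=timezone.utc).isoformat(timespec="seconds")
```

One caveat with this approach: on a fresh CI checkout, git does not restore original modification times, so input file mtimes typically reflect the checkout time rather than the last edit.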

dragonstyle commented 1 year ago

This had pretty negative performance implications so I'm having to revert it. I will try to improve the performance and fix it once again.

(Reverted here 063c6a1422883a229f091bc949a9690c7b2318d1)

HenrikBengtsson commented 7 months ago

I'm coming here for similar reasons. I use https://github.com/quarto-dev/quarto-actions to re-render a Quarto website once an hour. When there's no change in content, the sitemap.xml is still updated, which triggers "noisy" hourly git commits, and possibly other downstream artifacts, e.g. RSS feeds. It also triggers GitHub Pages to rebuild and republish the site. So, noise and unnecessary processing follow from these timestamps.
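As a possible stopgap on the workflow side (not a Quarto feature), a script could check whether a freshly rendered sitemap lists the same URLs as the committed one and skip the commit if so. The sketch below is in Python and assumes the standard sitemaps.org namespace; the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text: str) -> list[str]:
    """Return the sorted <loc> values of a sitemap, ignoring <lastmod>."""
    root = ET.fromstring(xml_text)
    return sorted(
        loc.text.strip() for loc in root.findall("sm:url/sm:loc", NS) if loc.text
    )


def sitemap_urls_unchanged(old_xml: str, new_xml: str) -> bool:
    """True when both sitemaps list the same URLs (timestamps may differ)."""
    return sitemap_urls(old_xml) == sitemap_urls(new_xml)
```

This only detects that the set of pages is unchanged. It cannot tell whether page content changed, so it would need to be combined with a diff of the rendered pages themselves.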

> This had pretty negative performance implications so I'm having to revert it. I will try to improve the performance and fix it once again.

Maybe both approaches could be supported via a setting in _quarto.yml, e.g.

format:
  html:
    sitemap-timestamps: input

while the default would be (as now):

format:
  html:
    sitemap-timestamps: output

? Next level up would be to make it a per-page setting in the yaml front matter.
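To make the proposal concrete, here is a rough Python sketch of how such a setting might be resolved, with a page's front matter taking precedence over _quarto.yml and "output" as the fallback. Note that `sitemap-timestamps` is only the option suggested in this comment, not an existing Quarto setting, and `resolve_timestamp_source` is a hypothetical name.

```python
VALID_SOURCES = ("input", "output")


def _lookup(config):
    """Safely read format -> html -> sitemap-timestamps from parsed YAML."""
    node = config
    for key in ("format", "html", "sitemap-timestamps"):
        if not isinstance(node, dict):
            return None  # e.g. `format: html` given as a plain string
        node = node.get(key)
    return node


def resolve_timestamp_source(project_config: dict, front_matter: dict) -> str:
    """Per-page setting wins, then the project setting, then "output"."""
    source = _lookup(front_matter) or _lookup(project_config) or "output"
    if source not in VALID_SOURCES:
        raise ValueError(f"sitemap-timestamps must be one of {VALID_SOURCES}")
    return source
```

For example, with `sitemap-timestamps: input` in _quarto.yml and no override in a page's front matter, that page would resolve to "input".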

mcanouil commented 7 months ago

@cscheid Should this be reopened?

cscheid commented 7 months ago

Yes, I closed it incorrectly, one way or another.