Open jayaddison opened 3 weeks ago
Although I feel like I want to make adjustments here and may have ideas about how to do that, I'd appreciate input from other maintainers/developers because churn/incorrect fixes here could potentially be more disruptive than simply leaving the code as-is.
AFAIK we don't currently have a way for downstream projects to entirely disable the dynamic copyright substitution logic when `SOURCE_DATE_EPOCH` is enabled.
The matplotlib project is not affected; they do have a multi-statement copyright notice, but all the statements are configured in a single `str` (not a `list`) `copyright` config value, and, crucially, the first line of that text uses a Unicode en-dash instead of an ASCII dash (minus symbol), which Sphinx doesn't match on during substitution.
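To illustrate why the en-dash matters, here is a minimal sketch. It assumes the substitution only recognises a leading `YYYY` or `YYYY-YYYY` written with an ASCII hyphen; the pattern, the `replace_end_year` helper and the holder name are hypothetical stand-ins, not Sphinx's actual implementation.

```python
import re

# Hypothetical pattern (an assumption, not Sphinx's code): an optional "YYYY-"
# start year, then the end year, followed by a space, a comma, or end of line.
YEAR_PREFIX = re.compile(r"^(\d{4}-)?(\d{4})(?=[ ,]|$)")


def replace_end_year(line: str, year: str) -> str:
    """Replace the end year at the start of a copyright line when the pattern matches."""
    return YEAR_PREFIX.sub(lambda m: (m.group(1) or "") + year, line, count=1)


ascii_line = "2002-2012 Example Holder"          # ASCII hyphen-minus between the years
en_dash_line = "2002\u20132012 Example Holder"   # U+2013 EN DASH between the years

print(replace_end_year(ascii_line, "2024"))    # the end year is rewritten
print(replace_end_year(en_dash_line, "2024"))  # no match, so the line is left untouched
```

Under that assumption, a notice written with an en-dash passes through unchanged, which would explain why such projects do not see their years rewritten.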
Also: maintainers: please feel free to convert this into a GitHub discussion if that's a more appropriate venue for this thread.
I'm not convinced that my assertion that only the most-recent/last copyright line should be updated is always true.
Although updating every line does seem incorrect in some situations, it seems unclear/intractable how to determine which lines to update. This could be a reason why some projects have chosen to avoid the substitution logic by using en-dash.
In terms of ideas for routes towards improvements, though -- and bearing in mind that, as far as possible, we should avoid breaking existing projects:
Perhaps we could provide a `{{current_year}}` helper variable so that documentation projects that do want dynamic year values could opt in to a documented, somewhat-supported way to dynamically retrieve that build-time value (with support for `SOURCE_DATE_EPOCH`).
Projects that do want dynamic copyright notices could then use that variable in either their single-line or multi-line copyright notices, as appropriate for their use case(s).
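As a rough sketch of what such a helper might look like (nothing like this exists yet; `build_year`, `render_copyright` and the placeholder regex are hypothetical names chosen for illustration, and the holder names are made up):

```python
import os
import re
import time

# Hypothetical placeholder syntax: "{{current_year}}" with optional inner spaces.
PLACEHOLDER = re.compile(r"\{\{\s*current_year\s*\}\}")


def build_year(environ=None) -> int:
    """Return the build year, preferring SOURCE_DATE_EPOCH over the system clock."""
    environ = os.environ if environ is None else environ
    epoch = environ.get("SOURCE_DATE_EPOCH")
    timestamp = int(epoch) if epoch else time.time()
    return time.gmtime(timestamp).tm_year


def render_copyright(notice, environ=None):
    """Expand the placeholder in a single-line (str) or multi-line (list) notice."""
    year = str(build_year(environ))
    if isinstance(notice, str):
        return PLACEHOLDER.sub(year, notice)
    return [PLACEHOLDER.sub(year, line) for line in notice]


notice = ["2002-2012 First Holder", "2013-{{ current_year }} Second Holder"]
print(render_copyright(notice, {"SOURCE_DATE_EPOCH": "1700000000"}))
```

Only lines that contain the placeholder change, so a project decides per line whether a year is dynamic, and a reproducible build with a fixed `SOURCE_DATE_EPOCH` always produces the same text.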
As mentioned in a previous thread I'm not too keen on dynamic copyright notices personally, but we do have practical examples of Sphinx projects that use them, and a way to retrieve the current year could help us support those cases. If I were to implement that, I'd document the substitution variable with a note to indicate that it is available for use, but not necessarily recommended.
I'd propose a combination of two changes:

* `{{ current_year }}` pattern within the copyright notice, so that sites can opt in to current-year substitution with more control, compatible with both single-line and multi-line copyright notices.

I'll attempt to draft a pull request for this soon (within the next week or two).
Exploratory thoughts: in some ways I think that using the last-modified-time (aka `mtime`) of the most recent source document would be a better value to use during current-year copyright notice substitution. However, from practical experience, and for understandable convenience, many Sphinx projects build directly from a `git clone` (checkout of source control) of their documentation sources -- and that cloning process does not preserve the last-modification time of files (in other words: a file last edited in source control two years ago, when retrieved in a clone of the containing repository today, will have today's date in its timestamps (created and modified)).
(context: if we are to add `current_year` replacement -- something I'm weighing up the pros and cons and moral qualms about -- it would be good to make that value reasonably accurate in standard/typical use cases, and relatively difficult to unwittingly build with a nonsensical value. A nice side-effect (although not the design goal) of using latest-modified-file time as the value would be that the emitted current-year would reflect the last time the source work was indicated as modified.)
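For concreteness, the mtime idea could be sketched roughly as follows. The function name `latest_source_year` and the default suffix list are assumptions for illustration only; as noted above, on a fresh clone every file carries the checkout time, so this would simply return the current year there.

```python
import time
from pathlib import Path


def latest_source_year(srcdir, suffixes=(".rst", ".md")):
    """Return the UTC year of the most recently modified source file, or None if none exist."""
    mtimes = [
        path.stat().st_mtime
        for path in Path(srcdir).rglob("*")
        if path.is_file() and path.suffix in suffixes
    ]
    return time.gmtime(max(mtimes)).tm_year if mtimes else None
```

A caller would pass the documentation source directory, for example `latest_source_year("docs")`, and fall back to another value when the result is `None`.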
Although technically feasible, I don't think that it would make sense to attempt to add code to determine whether the source files are contained in a version control system (in order to detect modification times) -- doing that would mean that we end up maintaining code to handle an arbitrary and incomplete list of version control systems.
An incomplete, approximated code search for `path:conf.py /^copyright.*today/` using GitHub code search today (20240705) returns over four thousand results. No doubt some of those are duplicates, but even so I think this indicates that there is a relatively frequent desire/use-case for current-year in Sphinx copyright notice config.
One reason that I checked that is to determine whether a harsher/breaking-change approach of disabling dynamic copyright notices would be acceptable. Initially I feel that it is not -- and also it would simply have the effect of pushing the problem elsewhere rather than solving it.
Given that context, the current approach of applying `SOURCE_DATE_EPOCH`-based replacement selectively, only in the case of reproducible builds, seems pragmatic.
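Possible improvements remaining in my mind:

* Could/should we detect dynamic vs static `copyright` config, and only perform substitution in the dynamic case?
* In the case of multiline statements could/should we detect end-years _not_ to modify? (for example: perhaps we should only update the largest year values?)

There is also a risk of trying to be too clever and creating unnecessary complexity.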
* Could/should we detect dynamic vs static `copyright` config, and only perform substitution in the dynamic case?
Based on my understanding of Python: detecting statically-declared strings may be possible -- we could locate and parse the `conf.py` and then apply some heuristic/AST checks -- but figuring out whether an f-string, for example, actually varies or not in practice is not feasible. So: it would be somewhat complex, and would have some false-positives (string considered dynamic although it may not vary). On the plus side, this would allow many simple, statically-configured projects to build reproducibly and correctly without modifications.
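A minimal sketch of the kind of AST check described above, assuming only plain top-level assignments to `copyright` are inspected. The function names are hypothetical, and this is not the approach taken in any particular pull request.

```python
import ast


def _is_literal(value: ast.expr) -> bool:
    """True for a plain string literal, or a list/tuple made only of string literals."""
    if isinstance(value, ast.Constant):
        return isinstance(value.value, str)
    if isinstance(value, (ast.List, ast.Tuple)):
        return all(_is_literal(item) for item in value.elts)
    return False  # f-strings, calls, concatenation, names, ... are treated as dynamic


def classify_copyright(conf_source: str) -> str:
    """Return "static", "dynamic" or "missing" for the last top-level copyright assignment."""
    result = "missing"
    for node in ast.parse(conf_source).body:
        if not isinstance(node, ast.Assign):
            continue
        if any(isinstance(t, ast.Name) and t.id == "copyright" for t in node.targets):
            result = "static" if _is_literal(node.value) else "dynamic"
    return result


print(classify_copyright('copyright = "2002-2012 Example Holder"'))
print(classify_copyright('import datetime\ncopyright = f"2013-{datetime.date.today().year} Holder"'))
```

Any f-string is reported as dynamic here, even one with no interpolated fields, which is the false-positive case mentioned above.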
* In the case of multiline statements could/should we detect end-years _not_ to modify? (for example: perhaps we should only update the largest year values?)
In the general case this seems intractable, because there's very little that we can infer from copyright year ranges in multiline statements. Did contributions stop because of a change of ownership? Or simply because a contributor no longer provides edits to a project? We can't know those during isolated computerized builds from source.
However, it's possible that we could improve the substitution heuristics; for example: we could determine the system-clock year and only perform replacement of years in the notice that match that. The only edge case I can think of there is dynamic `copyright` values that insert a different year -- but I can't think of a valid reason for a project to do that.
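The clock-year heuristic could look something like the sketch below. The helper name, its parameters and the leading-year pattern are assumptions made for illustration, and the holder names are invented.

```python
import re
import time

# Assumed shape of a copyright line: optional "YYYY-" start year, then the end year.
YEAR_PREFIX = re.compile(r"^(\d{4}-)?(\d{4})(?=[ ,]|$)")


def substitute_if_current(line, build_year, clock_year=None):
    """Rewrite the end year only when it equals the system-clock year."""
    clock_year = time.localtime().tm_year if clock_year is None else clock_year
    match = YEAR_PREFIX.match(line)
    if match is None or int(match.group(2)) != clock_year:
        return line  # a historical year: leave it alone
    return (match.group(1) or "") + str(build_year) + line[match.end():]


notice = ["2002-2012 First Holder", "2013-2024 Second Holder"]
# Pretend the clock says 2024 while SOURCE_DATE_EPOCH corresponds to 2022.
print([substitute_if_current(line, 2022, clock_year=2024) for line in notice])
```

With this rule the first line keeps its historical end year, while the line that was evidently generated from today's date is pinned to the reproducible build year.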
From my mental notes, three ideas remain valid, and here's how I rate them currently:

* Detecting dynamic vs static `copyright` config. I think this would be challenging but I like the potential benefits of it. I will prototype this, although it may take me some time. (update: drafted in #12519)
* A `current_year` substitution pattern. I am now less keen on this, because I think that it would encourage more sites to adopt dynamic notices and create more possibility of unwittingly-inaccurate output. I will de-prioritize this and do not intend to work on it unless the drawbacks of it can be resolved by a better design proposal.

Edit: update checklist x2
An additional fourth idea that also remains valid:
Edit: update checklist
Describe the bug
Projects can configure multiline copyright notices (ref #4925), a useful feature for projects that have transitions between copyright holders over time.
However, the current config year-substitution logic, enabled when `SOURCE_DATE_EPOCH` is configured, will attempt to pattern-match and substitute the end-year with the build-year in all of the copyright lines.

I'm not sure that's intended; I think that we should only apply the substitution for the most recent (which for an iterable I suppose means the final) line in the notice.
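The difference between the reported behaviour and the alternative suggested here can be sketched as follows. This is a simplified model with hypothetical helper names and invented holder names, not the code from Sphinx itself.

```python
import re

# Simplified stand-in for the year matching: optional "YYYY-" then the end year.
YEAR_PREFIX = re.compile(r"^(\d{4}-)?(\d{4})(?=[ ,]|$)")


def replace_end_year(line, year):
    """Replace the end year at the start of one copyright line, if present."""
    return YEAR_PREFIX.sub(lambda m: (m.group(1) or "") + year, line, count=1)


def substitute_all(lines, year):
    """Reported behaviour: every line that matches is rewritten."""
    return [replace_end_year(line, year) for line in lines]


def substitute_last(lines, year):
    """Alternative raised in this issue: only the final line is rewritten."""
    return [*lines[:-1], replace_end_year(lines[-1], year)] if lines else []


notice = ["2002-2012 First Holder", "2013-2024 Second Holder"]
print(substitute_all(notice, "2022"))   # the first holder's end year changes too
print(substitute_last(notice, "2022"))  # only the most recent line changes
```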
How to Reproduce
conf.py
index.rst
Expected
Actual
Environment Information
Sphinx extensions
Additional context
Discovered while investigating whether multiline copyright config could be a way to reduce risk of rewriting unintended copyright lines re: any interaction between matplotlib/matplotlib#28418 and #12450.