Closed ichard26 closed 5 months ago
The quoting of default refers to the default.conf file have in the repo for people to use to get started.
I just hit this same issue — I wrote a config file from scratch based on the documentation, and was surprised to find that "the default value for this setting is [some value]" didn't mean that the setting had a default value. I never even saw default.conf
until I went to file an issue about the defaults not working as documented and found this issue.
PRs to link docs to default.conf and make configparser set default both welcome.
I've started looking at the documentation and code for mirror configuration. These are the options I've found so far:
Option Name | Type | Doc'd? | Default? (Doc/Actual) | Source File |
---|---|---|---|---|
json | bool |
yes | - / False |
configuration.py |
root_uri | str |
yes | "https://files.pythonhosted.org/" [^1] |
" |
diff-file | str |
yes | - / "" |
" |
diff-append-epoch | bool |
yes | - / False |
" |
storage-backend | str |
no[^2] | "filesystem" /"filesystem" |
" |
digest_name | str |
no | - / "sha256" |
" |
cleanup | bool |
no | - / False |
" |
release-files | bool |
yes | True / True |
" |
compare-method | str |
yes | "hash" / "hash" |
" |
download-mirror | str |
yes | - / "" |
" |
download-mirror-no-fallback | bool |
no | - / False |
" |
simple-format | str |
yes | "ALL" / "ALL" |
" |
master | str |
yes | "https://pypi/org" / !!! |
mirror.py |
timeout | int |
yes | 10 / !!! |
" |
global-timeout | int |
yes | 18000 / !!! |
" |
proxy | str |
yes | - / "" |
" |
stop-on-error | bool |
yes | - / False |
" |
workers | int |
yes | 3 / !!! |
" |
hash-index | bool |
yes | False / !!! |
" |
keep_index_versions | bool |
no | - / 0 |
" |
There are a handful that aren't documented (that is, at least not on the mirror configuration page), as well as the ones that are set in default.conf
but the don't have a default/fallback when a user's config file is loaded. I intend to update mirror_configuration.md
in the docs as well as look into adding defaults in code for the options that don't have one.
Presently there is a function in configuration.py
that validates most of the options while others are only read in mirror.py. I don't know if it would be a problem to consolidate all configuration loading into one place.
[^1]: root_uri
's default value interacts with release-files
. The description of the default in the docs matches the code, which is in pseudocode something like:
if not release-files and not root-uri then
"https://files.pythonhosted.org/"
else
""
[^2]: Option and default are documented w/ storage backends, but not in the list of mirror configuration options. Could be listed with a cross reference?
Thanks for digging in.
Presently there is a function in configuration.py that validates most of the options while others are only read in mirror.py. I don't know if it would be a problem to consolidate all configuration loading into one place.
Welcome to move all to configuration if you feel there is an advantage
root_uri's default value interacts with release-files. The description of the default in the docs matches the code, which is in pseudocode something like:
Is there a missing question here? Pseudo code seems right.
Option and default are documented w/ storage backends, but not in the list of mirror configuration options. Could be listed with a cross reference?
Sure - cross reference. But lets avoid as much duplication as possible
Hi @cooperlees , good to hear from you!
Presently there is a function in configuration.py that validates most of the options while others are only read in mirror.py. I don't know if it would be a problem to consolidate all configuration loading into one place.
Welcome to move all to configuration if you feel there is an advantage
I'll look in to this. I have one idea I'd like to hear your thoughts on. I was wondering if it would make sense to always load the module's default.conf
, whether or not there is a user config file, and have values from the user config overwrite/take precedence. That would make default.conf
a single source for what the default values are, but it could also mean making default.conf
more extensive and less suited for someone first getting started.
Otherwise I'm happy to just consolidate things into configuration.py
a bit, but seeing where default.conf
gets loaded when there's no file specified got me thinking about trying to expand its use.
root_uri's default value interacts with release-files. The description of the default in the docs matches the code, which is in pseudocode something like:
Is there a missing question here? Pseudo code seems right.
No missing question! I originally wrote the table as a list for my own reference, and when I posted it my footnotes came along too. Glad to hear my interpretation seems accurate. 🙂
Option and default are documented w/ storage backends, but not in the list of mirror configuration options. Could be listed with a cross reference?
Sure - cross reference. But lets avoid as much duplication as possible
Sounds good. 👍
Can't promise a timeline for a PR, but I'm happy I've found something I feel like I could contribute to, so I feel optimistic about finishing this.
I was wondering if it would make sense to always load the module's default.conf, whether or not there is a user config file, and have values from the user config overwrite/take precedence.
I like this idea a lot. Please lets unit test all scenarios tho please. Should be able to make configuration.py very clean as it seems you can just call read twice with configparser.
default.conf
in our packages too (I think we do)def merge_ini_files(file1: str, file2: str):
config = configparser.ConfigParser()
# Read the first INI file
config.read(file1)
# Read the second INI file
config.read(file2)
...
I've opened a draft pull request (#1669). I've made it a draft because:
Working on the mirror documentation definitely increased my understanding of Bandersnatch's behavior with different options, but of course let me know if I've introduced any inaccuracies.
I was wondering if it would make sense to always load the module's default.conf, whether or not there is a user config file, and have values from the user config overwrite/take precedence.
I like this idea a lot. Please lets unit test all scenarios tho please. Should be able to make configuration.py very clean as it seems you can just call read twice with configparser.
The only concern I have thus far is root_uri
- I think it needs to distinguish whether it is set in the user configuration file or not, so loading a default value for it can't be completely transparent. I'm going to play around with the idea anyway and see how far I get.
I do not know if it is preferable to review documentation updates separately from code changes, or to put them together in one PR.
Let's do the docs first but maybe
I maybe got a bit carried away rewriting/reorganizing and it may be too big of a change relative to the initial goal here.
All good to me - but let's just have it document reality and change the docs to state when we make the default.conf actually be the source of truth for all default options
I've also commented on the PR suggestions for some enhancements.
root_uri
- I think we leave it required in the config maybe as I think it's good to make people create their own config file ...
Thanks again.
All good to me - but let's just have it document reality and change the docs to state when we make the default.conf actually be the source of truth for all default options
This makes sense! That will leave the docs "correct" in the interim while I figure out the configuration implementation bits.
I will go ahead and change the docs PR so the options that are currently missing defaults are marked as required.
@cooperlees - I've been fiddling around with config validation for a bit and I'm struggling some with how diff-file
works.
diff-file = {{section_option}}/newstuff.txt
(unittest.conf, test_storage_plugins.py), but the implementation breaks if there is anything in the value other than the reference, e.g. diff-file = {{section_option}}
works but diff-file = {{section_option}}/mirrored-files
doesn't. It follows the 'Invalid section reference' path and defaults to '(directory)/mirrored-files'. (This makes it look like it is working if the value happens to be {{mirror_directory}}/mirrored-files
.)mirror.py:mirror
seems to expect diff_file
is optional and has "if diff_file" checks to not write the diff list if there's no path set. But validate_config_values
sets diff_file to the empty string, and in the mirror function has diff_file = storage_plugin.PATH_BACKEND(config_values.diff_file_path)
before any of the conditional checks, and Path("")
evaluates to the same as Path(".")
and is "truthy". So it seems like in practice a diff file is always written.For both of these it is straightforward enough to "fix" them, but doing so changes externally observable behavior in a potentially incompatible way:
ExtendedInterpolation
that satisfies the use case.diff-file
option set are getting one anyway, but the file can just be ignored. On the other hand changing it so diff-file is actually optional and isn't written if there's no path set would break things for anyone relying on the current behavior. I could also update the code to expect that there will always be a diff-file path, making the current behavior "official", so to speak.Thoughts on a direction to take?
(Also should I open a new issue for this config implementation stuff?)
Thoughts on a direction to take?
I prefer to just fix the magic file appearing and mark it loudly in change log. If anyone is relying on that, I feel it's better to break them and make them be explicit in their config file on their needs.
TIL that Path("") == Path(".") and thus truthy. "" is false, so yeah. Fun. I bet this was once a string before I pathlib
'ed everything.
Thanks for noting these found bugs and asking questions on fun cases like this.
Happy to! 👍
And yeah the Path(“”) thing… I didn’t know either, had to try it in a REPL a few different ways to convince myself. 😆
What's the issue:
Some of the configurable knobs don't actually have a fallback value defined even though the documentation explicitly states these values do have a default. Case in point, let's take a gander at the
[mirror].timeout
value -- it's documented as having a default of 10 seconds:buttt looking at the source code it actually doesn't have one: https://github.com/pypa/bandersnatch/blob/3b78e040a330e3308389e3d32d8c13e8b8c41af5/src/bandersnatch/mirror.py#L1012-L1014
Potential solutions:
fallback=$val
for all of the options that are documented having a defaultI personally prefer the first one since having defaults makes sense so fixing the code to actually implement them is a good idea ... although this is a good opportunity to do a configuration adiut and see if any options shouldn't exist / or have a default / should have their default changed.
Additional context:
I was notified of this issue from a quick exchange in the Python Discord server: https://discord.com/channels/267624335836053506/463035462760792066/877942046952947722