Add static configuration (``Sphinx.toml``)

choldgraf commented 3 years ago

Background

One of the challenges in getting started with Sphinx is the conf.py file, for a few reasons:

It is written in Python, and so it is Python-specific, even if the person writing the documentation is using a different language.
It is a fully-flexible Python script, which can be overwhelming for users not accustomed to it.

Over the years, many other configuration formats have arisen; probably the two most well-known are YAML and TOML. For example, Jupyter Book provides a layer of YAML configuration on top of Sphinx. Users have responded that this is a really friendly pattern for beginners and experts alike. I wonder if Sphinx would be interested in allowing for YAML or TOML configuration as well.

Describe the solution you'd like

In addition to the current config option of conf.py, add another option:

Allow config with YAML. I think it would be useful if Sphinx allowed for:

conf.yml. This would be read in with PyYAML.

This file would be read in and converted to Python variables directly, as if it were written in Python (conf.py). So for example:

# In conf.yml
key: value
mylist:
  - item1
  - item2
mydict:
  dk1: one
  dk2: two

would map onto

# In conf.py
key = "value"
mylist = ["item1", "item2"]
mydict = {"dk1": "one", "dk2": "two"}

Allow conf.py to be provided simultaneously. Some Sphinx builds will still need to run custom Python code (e.g., to set up some extensions etc). In this case, authors may wish to keep their "simple config" in the YAML file, and the complex config in pure Python.

If conf.py is supplied as well as conf.yml, then the environment defined in conf.py will override anything in conf.yml.

So the order of operations would be:

1. (if it exists) Read in variables from conf.yml
2. Update with variables from conf.py if it exists, overwriting variables created in 1
3. Everything else is the same...
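A minimal sketch of this cascade, under two assumptions: conf.yml has already been parsed into a dict (for instance with PyYAML's yaml.safe_load), and conf.py is available as source text. load_cascaded_config is a hypothetical name, not part of Sphinx. Because conf.py runs in a namespace pre-filled with the YAML values, it can both read and overwrite them:

```python
from typing import Any, Optional


def load_cascaded_config(static: dict[str, Any], conf_py_source: Optional[str]) -> dict[str, Any]:
    """Hypothetical loader: values from conf.yml first, conf.py applied on top."""
    # Step 1: seed the namespace with the already-parsed conf.yml values.
    namespace: dict[str, Any] = dict(static)
    # Step 2: if conf.py exists, run it in that namespace; any name it assigns
    # replaces the value that came from conf.yml.
    if conf_py_source is not None:
        exec(compile(conf_py_source, "conf.py", "exec"), namespace)
    # exec() adds __builtins__; drop it and any other dunder names.
    return {k: v for k, v in namespace.items() if not k.startswith("__")}


static = {"key": "value", "mylist": ["item1", "item2"]}  # as parsed from conf.yml
conf_py = 'key = "overridden"\nversion = "1.0"\n'
print(load_cascaded_config(static, conf_py))
# {'key': 'overridden', 'mylist': ['item1', 'item2'], 'version': '1.0'}
```

Sharing one namespace is just one possible design; executing conf.py separately and merging afterwards would give the same override order but hide the YAML values from conf.py.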

Describe alternatives you've considered

I've tried creating a lightweight extension that allows this but didn't have success because of the way that extensions are activated.

I have also considered other documentation engines like mkdocs, which use YAML, but I'd like for this to be in the Sphinx ecosystem!

cc some others who have discussed this in the executablebooks/ repo: @pradyunsg @ericholscher @chrisjsewell

EDIT: I've updated the above description to remove mention of TOML, as I don't want that to derail conversation here!

ericholscher commented 3 years ago

Just wanted to chime in here and say this would be a great addition. I think it would improve the onboarding experience, and allow simple configurations to be machine-parsable. The dynamic nature of the Python configs definitely leads to a lot of customizations that are harder to support in varied development environments, which is a very common mistake for first-time Read the Docs users.

I'm in favor of adding it, and I also wanted to note that between the Executable Books & RTD teams, we'd probably be willing to implement and document this work, so we're mostly looking for a 👍 or 👎 from the team before starting a PR.

pradyunsg commented 3 years ago

A couple of thoughts from me:

This, with the cascading described, would be amazing!
Let’s only have a single file format though, rather than allowing either YAML or TOML.

More unsolicited thoughts

IMO the choice for the file format comes down to “how much do you like nesting”. If you wanna have JSON-like arbitrary nesting, then YAML is likely a better fit. I’m likely biased, but I do think conf.py’s generally flat structure translates very nicely toward TOML’s design. Most existing conf.py files are probably also almost-valid TOML files already! Neither choice is wrong, both have gotchas, and I’d like it if we went with TOML here (it also helps the case for if/when I push to get a parser for that into the standard library).

chrisjsewell commented 3 years ago

Indeed the main thing is to get a 👍 from the sphinx team, and I am definitely +1 😄

In terms of YAML vs TOML, I would note that both jupyter-book (_config.yml and _toc.yml) and RTD (.readthedocs.yml) currently use YAML for their configuration files, so at least for those use cases, I feel TOML would be additional overhead for the user to understand.

jakobandersen commented 3 years ago

I don't have a fully formed opinion yet, but some thoughts:

conf.py cannot go away, both for backwards compatibility and because its dynamic nature can be quite useful sometimes, e.g., loading the version from somewhere else, auto-generating content, etc. So as noted in the OP this would be another layer of configuration loaded before conf.py.
Therefore, the machine-readability of a configuration would at best be a convention that one could use at a single project, not for arbitrary third-party projects. Theoretically, one could put such conventions in one's own conf.py and read static config data up to a custom line comment, but this is indeed icky.
I think it is too much to add the special key for Python code. Putting static data into such a config file is fine, and achieves easy machine-readability of that part (by convention). If you need arbitrary Python code, I would say to just stick it in conf.py.
The config file should not be misleading with respect to how the configuration really works. I'm not familiar with TOML, but it looks like it has "sections", and I'm not sure how those map to the configuration system. YAML seems to map more naturally.

hukkin commented 3 years ago

Therefore, the machine-readability of a configuration would at best be a convention that one could use at a single project, not for arbitrary third-party projects.

Maybe a terrible idea, but what if the conf.py location were made configurable and nullable in conf.toml/yaml (perhaps still defaulting to conf.py?). Then in the conf.py == null case, machine-readability would actually be a thing.

Maybe the null case could even be the default, given that the toml/yaml is a new feature, so it shouldn't break existing projects.
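A sketch of how that resolution could work, assuming the static file is already parsed into a dict. The key name conf_py and the helper resolve_conf_py are hypothetical. Since TOML has no null, the sketch also treats false or an empty string as "no Python file", which is an assumption rather than part of the proposal. The default parameter covers both variants: defaulting to conf.py, or defaulting to null:

```python
from pathlib import Path
from typing import Any, Optional

_MISSING = object()  # sentinel: distinguishes "key absent" from "key set to null"


def resolve_conf_py(static: dict[str, Any], confdir: Path,
                    default: Optional[str] = "conf.py") -> Optional[Path]:
    """Return the Python config file to execute, or None for a purely static config."""
    value = static.get("conf_py", _MISSING)  # hypothetical key name
    if value is _MISSING:
        value = default  # key absent: fall back to the project-wide default
    if not value:
        # null in YAML (None), or false / "" since TOML has no null:
        # nothing is executed, so the static file is the whole configuration.
        return None
    return confdir / value
```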

chrisjsewell commented 3 years ago

I guess the classic example of co-existence of such files is setuptools' setup.py and setup.cfg. I would certainly check their implementation.

pradyunsg commented 3 years ago

What setuptools does is basically pretend there's a minimal setup.py file, if it doesn't exist. Notably, it's possible for tooling to detect whether setup.py exists and if it doesn't, it means that everything is declared statically.

For Sphinx, the minimal conf.py file would be empty, I guess?

hukkin commented 3 years ago

I'm not familiar with TOML, but it looks like it has "sections" which I'm not sure how maps to the configuration system. YAML seems to map more naturally

TOML allows top level keys though with no section defined. The following is valid toml:

project = "sphinx"
version = "0.0.1"

Sections are only required if there are dictionaries in conf.py, in which case they feel very natural to me.

Why I'm not a huge YAML fan is that YAML types are difficult to parse for humans and machines alike. Consider something like

- yes    # bool
- "no"   # string
- false  # bool
- .6432  # float
- "0.1"  # string
- null   # null
- none   # string
- ~      # null
- 0xabba # int
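The types in that list come from YAML 1.1's implicit typing of unquoted scalars, which PyYAML follows (the YAML 1.2 core schema is stricter, e.g. yes stays a string there). Below is a reduced sketch of those resolution rules using stdlib regexes. It is not a YAML parser, and the patterns are simplified versions covering only bool, null, int and float:

```python
import re

# Reduced YAML 1.1 implicit-type patterns, checked in order (simplified).
_RULES = [
    ("bool", re.compile(r"^(?:yes|Yes|YES|no|No|NO|true|True|TRUE|false|False|FALSE|on|On|ON|off|Off|OFF)$")),
    ("null", re.compile(r"^(?:~|null|Null|NULL|)$")),
    ("int", re.compile(r"^[-+]?(?:0x[0-9a-fA-F_]+|0b[01_]+|0[0-7_]+|0|[1-9][0-9_]*)$")),
    ("float", re.compile(r"^[-+]?(?:[0-9][0-9_]*\.[0-9_]*|\.[0-9][0-9_]*)(?:[eE][-+][0-9]+)?$")),
]


def implicit_type(scalar: str) -> str:
    """Return the type a YAML 1.1 loader would assign to a scalar as written."""
    if len(scalar) >= 2 and scalar[0] == scalar[-1] and scalar[0] in "'\"":
        return "str"  # quoted scalars are never re-typed
    for type_name, pattern in _RULES:
        if pattern.match(scalar):
            return type_name
    return "str"  # anything unmatched, e.g. "none", stays a string


# "NO" is the country code behind the well-known "Norway problem".
for item in ["yes", '"no"', "false", ".6432", '"0.1"', "null", "none", "~", "0xabba", "NO"]:
    print(f"{item:8} -> {implicit_type(item)}")
```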

pradyunsg commented 3 years ago

I'm not familiar with TOML, but it looks like it has "sections" which I'm not sure how maps to the configuration system. YAML seems to map more naturally.

Well, TOML is literally designed to be a configuration file format. https://toml.io uses the tag line: "A config file format for humans". Think of it as unambiguous INI files. The clarity and unambitious nature of the format is why pyproject.toml is TOML based, same for Cargo.toml and more. :)

To address the specific question raised: All key-value pairs in a table [section] end up in a dictionary named section. In other words, it's how you do nesting.

Anyway, the reason I kept my thoughts on file format choices in a hidden-unless-you're-curious section is that I didn't want to push this issue in that direction. I should've just omitted that whole thing.

Let's first wait for opinions on the general idea of static metadata in Sphinx, before discussing the exact file format further. :)

chrisjsewell commented 3 years ago

Yeh, at the end of the day, JSON/YAML/TOML all basically map to each other 1-to-1, so it won't really affect the underlying code/logic to be written

tk0miya commented 3 years ago

+0 for supporting a static config file. I don't think a Python script is a bad format for the config file, but it's reasonable to support a commonly used file format for Sphinx. -1 for supporting the combination of .yaml and .py. It's too complicated and I don't see the value of it.

And I don't have an opinion on YAML vs TOML, because I've never written a .toml file.

choldgraf commented 3 years ago

@tk0miya thanks for your thoughts! Could you clarify why you don't want a combination of YAML and Python? I think the combination of YAML + Python fits the use-case that somebody wants 99% of their configuration in a well-structured config file, but also needs to run some custom Python code if a particular extension needs it or something. I think this is actually a pretty common use-case.

I think we should just scope this conversation to YAML since it is super common for config, and readthedocs uses it, and leave TOML to a later conversation

chrisjsewell commented 3 years ago

sphinx. -1 for supporting the combination of .yaml and .py.

It also feels like it would be very difficult to migrate everyone from py to yaml?

jakobandersen commented 3 years ago

sphinx. -1 for supporting the combination of .yaml and .py.

It also feels like it would be very difficult to migrate everyone from py to yaml?

What would be the benefit of doing that anyway? I can to some degree see the point that each author may want to shift as much as possible to something more easily parsable, but maybe I don't fully get the problem with the current setup:

It's in Python, maybe the user doesn't know Python: well, the same argument can be made about YAML or whatever other format.
It can contain arbitrary code: sure, but as a new user you don't need to put arbitrary code there, and if you read another project, then that arbitrary code would just move somewhere else where you would also need to understand it.

Can you elaborate on the reasoning behind this?

choldgraf commented 3 years ago

Here's a few thoughts:

benefits of YAML

Structured and easier to parse (so you can machine-read/write it much easier)
Language agnostic (so you don't give one language special status for most use-cases)
Extremely common (mkdocs, Hugo, readthedocs + almost any other SSG configure things with YAML. Many people already have a mental model of configuration with YAML). You are correct that perhaps a new user will need to "learn YAML", but because YAML is not a computer language, it is already very commonly used across many other computer languages.

downsides of YAML

A decent number of "gotchas" (e.g. true, false, etc)
Inflexible (because it is just a data structure, it has no notion of execution etc)

benefits of Python

Flexible and extensible
Well-known language

downsides of Python

Not structured, so hard to parse
Less-commonly used as a configuration step in similar tools (though nikola does use conf.py as well)
Complex, and can be intimidating to new users who must now learn a computer language
Implies that Sphinx is "just for Python documentation", which I don't think is true

So to me this sounds like a reasonable basis for the following: support YAML for simple configuration use-cases, which are probably most use-cases. For anything advanced, let people provide a conf.py for more complex configuration. YAML maps pretty cleanly onto variable creation in Python and there are a ton of YAML readers out there, so this would be both low-maintenance, and a good entry-point into the Sphinx ecosystem for people who are used to configuring things with YAML. It would also make it easier for services to build on top of Sphinx - for example, Jupyter Book or ReadTheDocs.

As an aside, one of the most common things people like about Jupyter Book (which is built on Sphinx), is that it supports YAML configuration. One reason I opened this issue is because so many people have told me they prefer YAML, that I think it is worth considering for core Sphinx, as I think it would be a benefit to many.

jakobandersen commented 3 years ago

@choldgraf, right, basically I agree with all those points. Where the disagreement/confusion comes from is how this will work, and the comment from @chrisjsewell:

It also feels like it would be very difficult to migrate everyone from py to yaml?

Maybe I misunderstand, but that seems to imply that only one of conf.py and conf.yaml should exist? One of the things I find really appealing about Sphinx, and Python in general, is the hackability. Sphinx is already quite extensible via documented means, but beyond that, Python makes it easy to hack whatever is needed at run time until a proper solution can be found. Therefore I suggest that conf.py and conf.yaml must be able to co-exist, in the sense that variables in conf.py override those in conf.yaml. This makes the implementation backwards compatible and still allows arbitrary code for initialisation.

chrisjsewell commented 3 years ago

Maybe I misunderstand, but that seems to imply that only one of conf.py and conf.yaml should exist?

Oh no, I'm arguing for exactly the opposite lol

choldgraf commented 3 years ago

@jakobandersen ah in that case I totally agree with you, I think @chrisjsewell was suggesting they need to co-exist as well. I'll try to clarify this in the title + top-comment as well

jakobandersen commented 3 years ago

Ah, all good then :-) As an add-on suggestion: sphinx-quickstart should be updated to generate both files, with static data and associated comments in the YAML file, plus additional comments explaining the relationship between the files and the rationale for having both (i.e., the essence of this thread). Its final output, where it explains how to proceed, could also explain how to configure the project with the YAML and Python files.

shimizukawa commented 3 years ago

+1 for supporting static config file. I had thought about introducing conf.ini too, before YAML became as popular as it is now. This is because I felt that writing configuration in Python script is a subtle stumbling block for beginners.
-1 for supporting the combination of .yaml and .py. About the hackability of conf.py, I think it would be a good idea to be able to write a new extension mechanism for configuration, because I feel that allowing conf.yaml to override values in conf.py would introduce new stumbling blocks.

pradyunsg commented 3 years ago

allowing conf.yaml to override values in conf.py would introduce new stumbling blocks.

Hmm... I would've imagined setting a value in conf.yml and conf.py would result in an error OR cause the Python value to be used.

pradyunsg commented 3 years ago

Awesome! So, everyone is on board for (or ambivalent to) allowing static metadata! 🎉

I think there's a few things to decide on AFAICT:

file semantics
file name
file format

Remembering the law of triviality, I'm gonna focus on semantics first. :)

Semantics

At least 2 folks have stated a -1 for allowing both the static metadata file, and Python file to exist at the same time, because it would get confusing when folks define keys in both. I agree! Specifying values in both is weird and confusing.

However, I disagree that we shouldn't allow the files to complement each other when both exist and don't overlap. Allowing both to co-exist, and erroring out if the same value is defined in both (which isn't much code complexity), allows for a significantly better experience with the static metadata:

I write a nice Sphinx site, with only static metadata. After some time, I realize I do need some amount of dynamic behaviour (idk, need to add to sys.path for autodoc to work). If we don't allow both files to co-exist, this means that now I'll have to translate the YAML configuration into Python values, and start all over again. Compare that to just being able to add that sys.path.append in a newly created conf.py and moving on. After the first experience, I don't think I'd bother with the YAML files again. The second one is much nicer!
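The "co-exist, but error on overlap" semantics can be sketched in a few lines, assuming the static file is already parsed into a dict and conf.py is given as source text. DuplicateSettingError and merge_without_overlap are hypothetical names, not Sphinx API. Unlike a plain override, conf.py here runs in its own namespace, so every name it defines can be compared against the static keys:

```python
import types
from typing import Any


class DuplicateSettingError(ValueError):
    """Raised when conf.py assigns a name the static file already sets (hypothetical)."""


def merge_without_overlap(static: dict[str, Any], conf_py_source: str) -> dict[str, Any]:
    namespace: dict[str, Any] = {}
    # Run conf.py in a fresh namespace so we can see exactly what it defines.
    exec(compile(conf_py_source, "conf.py", "exec"), namespace)
    # Keep only settings: drop __builtins__ and other dunder names, plus
    # modules bound by imports such as `import sys`.
    dynamic = {
        k: v for k, v in namespace.items()
        if not k.startswith("__") and not isinstance(v, types.ModuleType)
    }
    overlap = sorted(static.keys() & dynamic.keys())
    if overlap:
        raise DuplicateSettingError(
            "set in both the static file and conf.py: " + ", ".join(overlap)
        )
    return {**static, **dynamic}
```

In the scenario above, a conf.py containing only import sys plus a sys.path.append(...) call defines no settings, so it merges cleanly with the static file.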

File name

In conf.yml

Let's use sphinx instead of conf in the filename?

That way, it's much clearer what tool is being used. Searching for the filename on search engines will actually yield useful results; which likely won't happen for conf.yml.

File format

Full disclosure: I am the primary maintainer of TOML now. And, unsurprisingly, I'd like to advocate for adopting TOML over YAML here.

Excuse me for being lazy, and quoting some pieces of writing:

In the toml-lang/toml README -- this is the only quote where I've contributed wording.

TOML shares traits with other file formats used for application configuration and data serialization, such as YAML and JSON. TOML and JSON both are simple and use ubiquitous data types, making them easy to code for or parse with machines. TOML and YAML both emphasize human readability features, like comments that make it easier to understand the purpose of a given line. TOML differs in combining these, allowing comments (unlike JSON) but preserving simplicity (unlike YAML).

Because TOML is explicitly intended as a configuration file format, parsing it is easy, but it is not intended for serializing arbitrary data structures. TOML always has a hash table at the top level of the file, which can easily have data nested inside its keys, but it doesn't permit top-level arrays or floats, so it cannot directly serialize some data. There is also no standard identifying the start or end of a TOML file, which can complicate sending it through a stream. These details must be negotiated on the application layer.

INI files are frequently compared to TOML for their similarities in syntax and use as configuration files. However, there is no standardized format for INI and they do not gracefully handle more than one or two levels of nesting.

Comparison of configuration file languages, done during the PEP 518 discussion

Personally, I would sum up the above as:

|                             | YAML | JSON | ConfigParser | TOML |
|-----------------------------+------+------+--------------+------|
| Well-defined                | yes  | yes  |              | yes  |
| Real data types             | yes  | yes  |              | yes  |
| Sensible commenting support | yes  |      |              | yes  |
| Consistent unicode support  | yes  | yes  |              | yes  |
| Makes humans happy          |      |      | yes          | yes  |

[snip] Given all of the above, I tend to think the trade-offs fall in favor of TOML.

PEP 518's discussion of "why not YAML"

One is that the specification is large: 86 pages if printed on letter-sized paper. That leaves the possibility that someone may use a feature of YAML that works with one parser but not another. It has been suggested to standardize on a subset, but that basically means creating a new standard specific to this file which is not tractable long-term.

Two is that YAML itself is not safe by default. The specification allows for the arbitrary execution of code which is best avoided when dealing with configuration data. It is of course possible to avoid this behavior -- for example, PyYAML provides a safe_load operation -- but if any tool carelessly uses load instead then they open themselves up to arbitrary code execution. While this PEP is focused on the building of projects which inherently involves code execution, other configuration data such as project name and version number may end up in the same file someday where arbitrary code execution is not desired.

Example demonstrating how YAML can be ambiguous in weird ways, from earlier in this thread

Why I'm not a huge YAML fan is that YAML types are difficult to parse for humans and machines alike. Consider something like
- yes    # bool
- "no"   # string
- false  # bool
- .6432  # float
- "0.1"  # string
- null   # null
- none   # string
- ~      # null
- 0xabba # int

One fun example of this is the Norway-YAML law.

Finally, thanks to pyproject.toml, a lot of Python tooling is going to be configured through TOML going forward. It'd be nice for Sphinx to hop on board as well! :)

chrisjsewell commented 3 years ago

At least 2 folks have stated a -1 for allowing both the static metadata file, and Python file to exist at the same time

Maybe I misunderstood, but I got the impression that @tk0miya wanted to completely remove the python file, rather than just restrict its use?

chrisjsewell commented 3 years ago

realize I do need some amount of dynamic behaviour

One dynamic thing I actually do a lot in projects is add a builder-inited event, to run sphinx-apidoc and automate the build of the api documentation pages (which I gitignore from the repo). But maybe I am missing a better way to do this?

pradyunsg commented 3 years ago

But maybe I am missing a better way to do this?

Well, if you're missing something, then it's you and I both. :)

One of the nice things about conf.py is that it also basically serves as an extension, once you add the setup function.

bjones1 commented 3 years ago

If Sphinx does allow for configuration from static metadata, I would suggest using Python literals as the file format; see an example (sphinx_static_config.zip) of conf.py below. Since this format is a subset of the Python language, everyone familiar with conf.py will already know how to encode configuration data, rather than having to learn to express values in YAML/TOML/JSON/etc.

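import ast
from pathlib import Path

def setup(app):
    # sphinx-conf.pylit holds dict entries only, e.g. `"project": "demo",`;
    # wrapping them in braces turns the whole file into one dict literal.
    path = Path(app.confdir) / "sphinx-conf.pylit"  # next to conf.py, not the CWD
    # literal_eval accepts only literals, so no code in the file can run.
    cfg = ast.literal_eval("{\n" + path.read_text(encoding="utf-8") + "\n}")
    for key, value in cfg.items():
        app.config[key] = value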

hukkin commented 3 years ago

I would suggest using Python literals as the file format

There are probably two groups of people:

a) non-Python people using MkDocs (or similar) instead of Sphinx, because it uses a widely used static configuration format
b) Python people unfamiliar with YAML/TOML/JSON

I'd imagine a) is the group we're targeting here and also probably the larger group. Also, group b) already has the conf.py... So I'd stick to either TOML or YAML.

Also YAML, TOML, JSON etc. already have existing tools (parsers, formatters etc) in a variety of programming languages, something that Python literals don't.

astrojuanlu commented 3 years ago

Notice that "The recommended extension for files containing YAML documents is .yaml ( http://yaml.org/faq.html ) and this has been the case since at least Sep 2006 ( https://web.archive.org/web/20060924190202/http://yaml.org/faq.html )." (instead of .yml) (copied from https://github.com/readthedocs/readthedocs.org/issues/7460#issue-694055600)

astrojuanlu commented 3 years ago

(In any case, I'd support TOML over YAML as well - but I have no horse in this race :) )

tk0miya commented 3 years ago

I think the combination of YAML + Python fits the use-case that somebody wants 99% of their configuration in a well-structured config file, but also needs to run some custom Python code if a particular extension needs it or something.

Please let me know an example. I think "custom Python code" is not a configuration. So it would be better to use "extension" instead. I can understand you'd like to use the combination of config.yaml and ext.py. But I can't imagine the case both config.yaml and config.py are needed.

Compare that to just being able to add that sys.path.append in a newly created conf.py and moving on. After the first experience, I don't think I'd bother with the YAML files again. The second one is much nicer!

I think it's better to add a new configuration to append sys.path to the YAML file.

File name Let's use sphinx instead of conf in the filename?

+1

File format

IMO, YAML is more widely used than TOML. One of the goals of this issue is supporting a commonly used file format for the configuration of Sphinx. So I'd like to vote for YAML. But I also think pyproject.toml is the future of Python. So it would be fine if we support both sphinx.yaml and pyproject.toml.

Maybe I misunderstood, but I got the impression that @tk0miya wanted to completely remove the python file, rather than just restrict its use?

No. I don't intend to drop conf.py support. We have tons of conf.py files in the world; dropping them would be terrible. What I objected to is using YAML and conf.py at the same time.

astrojuanlu commented 3 years ago

No. I don't think dropping conf.py support. We have tons of conf.py in the world. It's terrible. What I objected is using YAML and conf.py at the same time.

If we agreed on adding support for sphinx.yaml, how could that be done in a backwards-compatible way without, at least temporarily, supporting both formats at the same time?

tk0miya commented 3 years ago

I'd like to support a) projects having only conf.py and b) projects having only sphinx.yaml, but not c) projects having both conf.py and sphinx.yaml. I don't think that's a breaking change.
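A sketch of that rule, with a hypothetical helper find_config_file; the file names come from the thread. Existing conf.py-only projects resolve exactly as before, which is why the rule need not break anything:

```python
from pathlib import Path

CONFIG_NAMES = ("conf.py", "sphinx.yaml")


def find_config_file(confdir: Path) -> Path:
    """Return the single configuration file in confdir (hypothetical helper)."""
    present = [confdir / name for name in CONFIG_NAMES if (confdir / name).is_file()]
    if len(present) > 1:
        # Case c): both files present is rejected outright.
        raise RuntimeError(f"{confdir} contains both conf.py and sphinx.yaml; keep only one")
    if not present:
        raise FileNotFoundError(f"no conf.py or sphinx.yaml found in {confdir}")
    # Case a) or b): exactly one file, so conf.py-only projects are unaffected.
    return present[0]
```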

jakobandersen commented 3 years ago

Please let me know an example. I think "custom Python code" is not a configuration. So it would be better to use "extension" instead. I can understand you'd like to use the combination of config.yaml and ext.py. But I can't imagine the case both config.yaml and config.py are needed.

For direct configuration: the version variable is very nice to set dynamically, e.g., by reading it from a file. The version needs to go into the documentation, the build system, and the source, so manually updating it is error prone. Of course, one could autogenerate a sphinx.yaml file, but it seems icky having to do that in every/many projects. The rest of the dynamic stuff I do in conf.py could as well be handled by an external script for auto-generation. As an example of a large project with a lot of dynamic behaviour in conf.py there is the Linux kernel: https://github.com/torvalds/linux/blob/master/Documentation/conf.py

I still think it would be best to allow a sphinx.yaml/conf.yaml and conf.py at the same time, loaded in that order, to support a hybrid. This also enables an iterative migration for those projects that wish that. There is already a precedent for hybrid configuration in the sense that sphinx-build supports -D for overriding conf.py, so I imagine that the underlying mechanism is already there.

humitos commented 3 years ago

c) project having both conf.py and sphinx.yaml

This case to me should be sphinx.yaml + small custom extension.py installed where the user can put all the dynamic code.

I'm :+1: on not supporting both config files at the same time if that's possible (eg. this case is covered with a small extension as mentioned)

chrisjsewell commented 3 years ago

Yep +1 on sphinx.yaml + extension.py (that can contain a local extension), and it would also be nice to have a "canonical" way to specify at least the version dynamically

ericholscher commented 3 years ago

Yea, strong 👍 on having yml + well-defined ways of doing "magical" things. So something like:

version_object: package.__version__ which is imported during runtime
path_additions: ['..', 'package_name']

So this would standardize and formalize the most common existing uses of dynamic config in a more user-friendly way.
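One way a loader could honour those two keys, sketched under assumptions: version_object and path_additions are the key names floated above, while treating path_additions as relative to the configuration directory and prepending them to sys.path (as the usual conf.py idiom sys.path.insert(0, ...) does) are choices made for this sketch. apply_dynamic_keys is a hypothetical name:

```python
import importlib
import sys
from pathlib import Path
from typing import Any

# e.g. parsed from sphinx.yaml:
#   version_object: mypackage.__version__
#   path_additions: [".."]


def apply_dynamic_keys(static: dict[str, Any], confdir: Path) -> dict[str, Any]:
    config = dict(static)
    # Prepend the extra paths first so version_object can import local packages;
    # reversed() keeps the listed order at the front of sys.path.
    for entry in reversed(config.pop("path_additions", [])):
        sys.path.insert(0, str((confdir / entry).resolve()))
    dotted = config.pop("version_object", None)
    if dotted is not None:
        # "pkg.module.attr" -> import "pkg.module", then read "attr".
        module_name, _, attr = dotted.rpartition(".")
        config["version"] = getattr(importlib.import_module(module_name), attr)
    return config
```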

humitos commented 3 years ago

I don't think we need to add magic for this. We can use YAML tags and Python types like !!python/ for these. Take a look at this example:

# sphinx.yaml
version: !!python/name:mypackage.__version__
path_additions: !!python/object/apply:myextension.get_paths []

# mypackage.py
# This is my library/module/package source code
__version__ = 'This is the version value'

# myextension.py
# This is my Sphinx extension
def get_paths():
    return [
        '/tmp',
        '/home/humitos',
    ]

▶ python -c "import yaml; print(yaml.load(open('sphinx.yaml'), Loader=yaml.UnsafeLoader))"
{'version': 'This is the version value', 'path_additions': ['/tmp', '/home/humitos']}

There is a lot we can do dynamically with YAML. As you may have noted already, I used yaml.UnsafeLoader because executing Python code from inside the YAML is not considered safe due to arbitrary code execution, but for our use case I think it's fine --we are already executing a whole conf.py anyway. However, depending on the situation we could safe-load it and only get the static values if needed:

>>> import yaml
>>> class SafeLoader(yaml.SafeLoader):
...   def ignore_unknown(self, node):
...     return None
...
>>> SafeLoader.add_constructor(None, SafeLoader.ignore_unknown)
>>> yaml.load(open('sphinx.yaml'), Loader=SafeLoader)
{'version': None, 'path_additions': None}

I followed the example that @ericholscher used with path_additions. However, I understand that implies Sphinx needs to know about that setting and then call sys.path.append with each of those items. Instead of doing that, I think we can promote/document doing this directly in the myextension.py file --as users are already doing with the conf.py file.

# sphinx.yaml
version: !!python/name:mypackage.__version__
path_additions: !!python/object/apply:myextension.add_paths ['/tmp', '/home/humitos']

# myextension.py                                                              
import sys

def add_paths(*paths):
    for path in paths:
        sys.path.append(path)

    return paths

▶ python -c "import sys, yaml; print(yaml.load(open('sphinx.yaml'), Loader=yaml.UnsafeLoader)); print(sys.path)"
{'version': 'This is the version value', 'path_additions': ('/tmp', '/home/humitos')}
['', '/usr/lib/python39.zip', '/usr/lib/python3.9', '/usr/lib/python3.9/lib-dynload', '/home/humitos/.local/lib/python3.9/site-packages', '/usr/lib/python3.9/site-packages', '/tmp', '/home/humitos']

Summarizing, I'm :+1: on adding YAML and defining all the static and dynamic metadata in a sphinx.yaml file, putting the code for the dynamic data into a separate Python file for more complex work, so Sphinx doesn't need extra knowledge to customize non-common behavior.

Edit: note that this technique is used by PyMdown Extensions and it's supported on MkDocs. See https://facelessuser.github.io/pymdown-extensions/faq/#function-references-in-yaml

ericholscher commented 3 years ago

I'd really like to keep Python out of the YAML -- I'd argue that's going backwards, and makes the YAML files hard to parse again. If you want additional customization like this, I think it's probably best to point people to using conf.py for this -- arguably a cleaner solution that MkDocs doesn't have as an option. They were forced to stuff Python into their config, we already have a better option for this use case.

I'd argue for keeping things as simple as possible to start, instead of adding features we regret later. If we come to see some obvious need for python-level data types in our YAML, we can certainly add them. However, I don't see why it's required, and I can see lots of potential security and user experience downsides to such a syntax.

pradyunsg commented 3 years ago

FWIW, if we're decided on YAML here (I'm making a grumpy face here, because I feel like that's the inferior choice here), I think it's definitely a good idea to:

avoid arbitrary code execution, which ties the file to the Python implementation. (eg: tags)
disallow features that, when given crafted inputs, can be DOS-like attacks, consuming lots of memory/CPU (eg: anchors).

ericholscher commented 3 years ago

@pradyunsg Strong 👍 . I read more about the anchor stuff, and definitely sounds bad.

Edit: removed more TOML discussion. Don't want to get into it right now :D

pradyunsg commented 3 years ago

So it would be fine if we support both sphinx.yaml and pyproject.toml

I'm not sure what you're thinking Sphinx would use pyproject.toml for. I can't think of any good reason to use that, over having a sphinx.toml file in the docs directory instead.

I don't think we should have Sphinx documentation configuration in pyproject.toml. There's only one pyproject.toml in a project, but a project can have multiple sphinx documentation sets if they want.

IMO, YAML is more widely used than TOML.

It is indeed, since it's been around longer and is intended as a general data representation format.

TOML is not as general as YAML, and it is explicitly designed for configuration files. There are use cases that are explicitly not well suited to TOML, because it doesn't try to cover everything. It's meant for one thing: being a good configuration file format.

One of the goals of this issue is supporting a commonly used file format for Sphinx configuration.

Indeed. In my opinion, the ideal format to use would be something that is:

designed explicitly for configuration files.
baked into the Python ecosystem through PEPs.
being adopted by modern Python development tooling.

That format is TOML. :)

choldgraf commented 3 years ago

A few thoughts from me:

I like @chrisjsewell's suggestion here: https://github.com/sphinx-doc/sphinx/issues/9040#issuecomment-815059146
From Jupyter Book's perspective, I'd strongly prefer YAML because we already use it, and because so many other projects use it. I don't think Jupyter Book will switch to using TOML any time soon (unless it was to allow it in addition to YAML, but I'm worried that you can write valid YAML with a sensible-looking structure that is not valid TOML; see below).
I also tried converting a Jupyter Book configuration from YAML into TOML, and found that it is not possible because TOML cannot have a dictionary inside a list (at least with the Python toml package, dumps(["one", {"two": "three"}]) fails for me).
Whether it's YAML or TOML, I agree there should be minimal "magic" in there. If people want magic, ask them to write Python.

chrisjsewell commented 3 years ago

Yeah, to re-emphasise what others have said, in response to comments like these:

We can use YAML tags and Python types

baked into the Python ecosystem through PEPs.
being adopted by modern Python development tooling.

A key point to me is that the basic configuration is essentially in no way associated with Python. Many "beginner" users, of the type we see e.g. with jupyter-book, don't (and shouldn't have to) care that Sphinx is written in Python; that's just an implementation detail. They just want the simplest way to turn their documentation into a website or PDF.

chrisjsewell commented 3 years ago

OK, let's get this back on track.

So firstly, if Sphinx can be configured by either conf.py or sphinx.yaml/sphinx.toml (+ optional extension.py), how do we detect which to use?

Sphinx searches for these files automatically and errors if multiple are detected

or

You specify via sphinx-build what to look for, and Sphinx possibly infers the format from the file extension, e.g.: sphinx-build --conf sphinx.yaml
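The first option can be sketched in a few lines; the candidate file names come from this thread, while find_config_file is a hypothetical helper, not an existing Sphinx function.

```python
from __future__ import annotations

from pathlib import Path

# Candidate configuration files discussed in this thread.
CANDIDATES = ("conf.py", "sphinx.yaml", "sphinx.toml")


def find_config_file(confdir: str | Path) -> Path | None:
    """Return the single config file in confdir, or None if there is none."""
    confdir = Path(confdir)
    found = [confdir / name for name in CANDIDATES if (confdir / name).is_file()]
    if len(found) > 1:
        names = ", ".join(p.name for p in found)
        raise RuntimeError(f"multiple configuration files found in {confdir}: {names}")
    return found[0] if found else None
```

The format then follows from the returned file name, and the existing -c option already selects confdir. A layered scheme that deliberately allows conf.py next to sphinx.yaml would have to relax the "multiple files" error.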

tk0miya commented 3 years ago

For direct configuration: the version variable is very nice to set dynamically, e.g., by reading it from a file. The version needs to go into the documentation, the build system, and the source, so updating it manually is error prone.

In the version case, what users want is to read the version info from code (or from a file). Reading it via Python code is only a means, not a goal. So it can be resolved if we provide a way to do that. I feel it would be better to provide proper ways for well-known techniques in conf.py instead of hacks; version_object and path_additions are good examples of them.
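For reference, the usual conf.py workaround for the version case reads the value from a file without importing the package. A minimal sketch, assuming a hypothetical plain-text VERSION file at the project root; read_version is an illustrative helper, while version and release are the standard Sphinx config values.

```python
from pathlib import Path


def read_version(root: Path) -> tuple[str, str]:
    """Return (version, release) from a hypothetical plain-text VERSION file."""
    release = (root / "VERSION").read_text(encoding="utf-8").strip()  # e.g. "1.4.2"
    version = ".".join(release.split(".")[:2])  # short X.Y form, e.g. "1.4"
    return version, release


# In conf.py (which lives in docs/, one level below the project root):
# version, release = read_version(Path(__file__).resolve().parent.parent)
```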

tk0miya commented 3 years ago

I'm not sure what you're thinking Sphinx would use pyproject.toml for. I can't think of any good reason to use that, over having a sphinx.toml file in the docs directory instead.

We can configure other tools via it: pylint, black, and so on. So it would be good to store the Sphinx configuration in pyproject.toml as well. I guess most users' configuration files contain only a few settings, so it's not too bad. But it's just my thought. Please forget it if you feel it's not worthwhile.

I don't think we should have Sphinx documentation configuration in pyproject.toml. There's only one pyproject.toml in a project, but a project can have multiple sphinx documentation sets if they want.

Indeed. But using pyproject.toml does not prevent the use of multiple configurations. If users want that, they can use multiple configuration files like sphinx.yaml (or sphinx.toml) instead; using pyproject.toml is not required. Additionally, I guess most users use only one configuration (I haven't actually researched this).

tk0miya commented 3 years ago

Sphinx searches for these files automatically and errors if multiple are detected

You specify via sphinx-build what to look for, and Sphinx possibly infers the format from the file extension, e.g.: sphinx-build --conf sphinx.yaml

+1 for the former one, because I can't imagine a Sphinx project that alternates between sphinx.yaml and conf.py. Specifying the confdir via the -c option is enough. It's simple and no new option is needed. (Additionally, it's easy to implement!)

jakobandersen commented 3 years ago

Note that the version variable is probably the most common and simple example of the need for dynamic configuration. I think it is better to provide flexibility such that common things are easy, but there is always a not-too-difficult way to do uncommon things. The current conf.py is at the extreme end where you can do everything, but simple things are a bit more complicated than needed (you need to know a bit of Python, and know that assigning variables 'magically' sets the config var). I agree that putting any kind of Python code in the YAML/TOML file seems to defeat the original purpose. Do I understand correctly that the current conf.py is basically an extension + config var extraction? So the suggestion of sphinx.yaml + ext.py means that dynamic configuration should be done from within an extension. Is that easy to do? Don't you get into all kinds of extension ordering problems?

Just to reiterate my suggestion:

Load sphinx.yaml first, if it exists. A clean YAML file, with only static configuration.
Load conf.py next, if it exists. The same type of file as now. Configuration variables here overwrite those in sphinx.yaml (or there could be a "strict" mode of sphinx-build where the two files must specify disjoint config variables).
Then apply command line overwrites (-D), as is done now.

With this:

There is a clear incremental migration path.
Simple projects would only need sphinx.yaml.
Less simple projects could do a mix of static and dynamic configuration, as they see fit.
Individual projects can ban conf.py if they want to.
I expect the implementation to be relatively simple: "just" start the config dictionary from the loaded sphinx.yaml.
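The layering could be sketched like this, assuming sphinx.yaml has already been parsed into a dict. layered_config and its strict flag are hypothetical names; executing conf.py with exec into a fresh namespace is similar in spirit to how Sphinx evaluates conf.py today, but simplified.

```python
import types


def layered_config(static, conf_py_source=None, overrides=None, strict=False):
    """Hypothetical merge order: sphinx.yaml, then conf.py, then -D overrides."""
    config = dict(static)  # 1. start from the parsed sphinx.yaml values
    if conf_py_source is not None:
        namespace = {}
        exec(conf_py_source, namespace)  # 2. run conf.py in its own namespace
        # Keep public names only; drop __builtins__ and imported modules.
        dynamic = {
            name: value
            for name, value in namespace.items()
            if not name.startswith("_") and not isinstance(value, types.ModuleType)
        }
        if strict:
            clash = sorted(config.keys() & dynamic.keys())
            if clash:
                raise ValueError(f"set in both sphinx.yaml and conf.py: {', '.join(clash)}")
        config.update(dynamic)  # conf.py wins over sphinx.yaml
    # 3. command-line overrides; real -D values arrive as strings, here pre-typed.
    config.update(overrides or {})
    return config
```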

tk0miya commented 3 years ago

Ah, I forgot to mention code execution. Even if we choose YAML as the configuration format, I'd rather not support code execution. IMO, it's too magical and hacky.

I think the new static configuration file should contain only static configuration. Dynamic configuration must be separated out into the extension.

chrisjsewell commented 3 years ago

Sphinx searches for these files automatically and errors if multiple are detected

You specify via sphinx-build what to look for, and Sphinx possibly infers the format from the file extension, e.g.: sphinx-build --conf sphinx.yaml

+1 for the former one, because I can't imagine a Sphinx project that alternates between sphinx.yaml and conf.py. Specifying the confdir via the -c option is enough. It's simple and no new option is needed. (Additionally, it's easy to implement!)

OK cool, so the second step is: if sphinx.yaml is used, we also search for extension.py and, if found, load it as the first extension.

Both of them seem quite trivial to implement.
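A sketch of the second step, assuming the local module keeps the name extension and that confdir must be put on sys.path, since Sphinx imports extensions by module name. extensions_with_local is a hypothetical helper.

```python
from __future__ import annotations

import sys
from pathlib import Path


def extensions_with_local(confdir: Path, extensions: list[str]) -> list[str]:
    """Hypothetical: put confdir/extension.py first in the extensions list."""
    if (confdir / "extension.py").is_file():
        # Make the local module importable by name.
        if str(confdir) not in sys.path:
            sys.path.insert(0, str(confdir))
        return ["extension", *[e for e in extensions if e != "extension"]]
    return list(extensions)
```

A generic module name like extension can shadow or be shadowed by other importable modules, so a more specific name (or loading it from its file path) may be safer in practice.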

Next step: how do we handle dynamic values (such as version)? Two potential options:

Only allow these to be specified via extension.py

I guess you could do this via the (earliest) "config-inited" event function, but (a) that makes it quite complex to achieve, (b) it might already be too late in the process.

Allow some form of dynamism in sphinx.yaml

As mentioned previously, an example of this is setuptools' special directives (note that these are limited to particular keywords). file: is quite general, whereas attr: is quite Python-specific. Off the top of my head, you could do something like regex:package/__init__.py:__version__ = "(.+)" to find a regex group in a file, which would be very generic (although still a little complex).
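A sketch of what resolving such directive strings could look like, assuming the ad-hoc file: and regex:<path>:<pattern> syntax floated above; resolve_directive is hypothetical, and attr: is deliberately left out because it would require importing Python code.

```python
from __future__ import annotations

import re
from pathlib import Path


def resolve_directive(value: str, root: Path) -> str:
    """Hypothetical resolver for directive strings in a static config file."""
    if value.startswith("file:"):
        # file:VERSION -> contents of the file, stripped
        return (root / value[len("file:"):].strip()).read_text(encoding="utf-8").strip()
    if value.startswith("regex:"):
        # regex:<path>:<pattern> -> first capture group of the first match.
        # Splits on the first ":", so the path itself must not contain one.
        path, _, pattern = value[len("regex:"):].partition(":")
        text = (root / path).read_text(encoding="utf-8")
        match = re.search(pattern, text, flags=re.MULTILINE)
        if match is None:
            raise ValueError(f"pattern {pattern!r} not found in {path}")
        return match.group(1)
    return value  # plain values pass through unchanged
```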
