popsim-consortium / demes-python

Tools for describing and manipulating demographic models.
https://popsim-consortium.github.io/demes-docs/
ISC License

Implement (or document how to use) other input formats #575

Open molpopgen opened 2 months ago

molpopgen commented 2 months ago

The Graph.fromdict method lets us accept data from several formats in addition to the current YAML.

I have had some motivation recently to explore TOML input for demes, and code like the following "just works":

# requires python >= 3.11 (part of std lib)
# tomllib does not WRITE to toml.
import tomllib

# to read/write toml
import toml

# A toml writer mentioned in tomllib docs
import tomli_w

import demes

# Read the whole file as one string; the name avoids shadowing the built-in input().
with open("example.toml") as f:
    toml_text = f.read()

print(toml_text, type(toml_text))
parsed_with_toml = toml.loads(toml_text)
parsed_with_tomllib = tomllib.loads(toml_text)

print(f"parsed with toml = {parsed_with_toml}")
print(f"parsed with tomllib = {parsed_with_tomllib}")

graph_from_toml = demes.Graph.fromdict(parsed_with_toml)
graph_from_tomllib = demes.Graph.fromdict(parsed_with_tomllib)

print(f"demes.Graph from toml: \n{graph_from_toml}")
print(f"demes.Graph from tomllib: \n{graph_from_tomllib}")

assert graph_from_toml.isclose(graph_from_tomllib)

# NOTE: tomllib does NOT support WRITING toml...
graph_as_toml = toml.dumps(graph_from_toml.asdict())

print(f"graph back into toml = \n{graph_as_toml}")

# ... but tomli_w does

graph_as_toml_2 = tomli_w.dumps(graph_from_toml.asdict())

print(f"graph back into toml via tomli_w = \n{graph_as_toml_2}")

The contents of the example are:

time_units = "years"
description = "a description"
generation_time = 25

[defaults]
[defaults.migration]
rate = 0.25
demes = ["A", "B"]

[[demes]]
name = "A"
[[demes.epochs]]
start_size = 100

[[demes]]
name = "B"
[[demes.epochs]]
start_size = 42

I also suspect that we get JSON support for free in a similar way.

We could add functions like demes.loads_toml (or toml_loads??), etc., which would be very thin wrappers around code like the above. Or we can document somewhere that the asdict method is more useful than we've let on.

Thoughts?

grahamgower commented 2 months ago

Hey @molpopgen.

The Graph.fromdict(), Graph.asdict() and Graph.asdict_simplified() methods are 100% intended for serialisation-format interoperability like this. TOML was mentioned as a candidate format in our initial discussions of Demes file formats, but I think the way of writing lists and doing nesting in TOML was seen to be more confusing than in YAML. From memory, RON (Rusty Object Notation) looked really approachable, but unfortunately lacked parsers outside of Rust. We could certainly include some example docs showing how to do serialisation to TOML or other things, as you've outlined above. We do already have JSON support built in (it's a proper subset of YAML after all, and the dump() functions accept a format="json" option too), but that is intended for applications where including a YAML parser would be a pain (e.g. reading a Demes file into SLiM).

In regard to including support for more formats natively in the library, I'd want to see the serialisation format used elsewhere for Demes files before considering it, to help justify the use case. I mean, we could add support for 20 additional formats with little code, but how useful would this be to users?

molpopgen commented 2 months ago

Thanks @grahamgower.

After a bit more exploring, I found that several of the demes-spec valid examples are not compatible with JSON or TOML as inputs. For JSON, the infinite start time is the issue. For TOML, the metadata example has foo: null, which is out of spec. (TOML has no null type, so null values must be omitted.) All of that can be handled but it takes some work.
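A rough sketch of what that work could look like on the plain dict, before handing it to a JSON or TOML writer. The helper names drop_none and encode_infinity are hypothetical, the sample data is made up, and encoding infinity as the string "Infinity" is an assumption about what a consumer would accept, not something stated in this thread.

```python
import json
import math


def drop_none(obj):
    """Recursively drop dict entries whose value is None (TOML has no null).

    None values that sit directly inside a list are left untouched, since
    removing them would shift the positions of the other items.
    """
    if isinstance(obj, dict):
        return {k: drop_none(v) for k, v in obj.items() if v is not None}
    if isinstance(obj, list):
        return [drop_none(v) for v in obj]
    return obj


def encode_infinity(obj):
    """Recursively replace float +inf with the string "Infinity" (assumed convention)."""
    if isinstance(obj, dict):
        return {k: encode_infinity(v) for k, v in obj.items()}
    if isinstance(obj, list):
        return [encode_infinity(v) for v in obj]
    if isinstance(obj, float) and math.isinf(obj) and obj > 0:
        return "Infinity"
    return obj


# Made-up data with the two problem values: an infinite time and a null.
data = {
    "demes": [{"name": "A", "start_time": math.inf}],
    "metadata": {"foo": None, "bar": 1},
}

toml_ready = drop_none(data)  # "foo" is gone, "bar" survives

# allow_nan=False makes json.dumps reject inf instead of emitting the
# non-standard token Infinity, so the encoded form must be used here.
strict_json = json.dumps(encode_infinity(data), allow_nan=False)
print(toml_ready["metadata"])
print(strict_json)
```

Note that Python's json module will, by default, write and read the bare token Infinity, so a Python-only round trip may appear to work even though the output is not strict JSON.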

I was actually finding the TOML way of doing lists easier of late.

My use case is input for a program that I'm working on. To support the current YAML input, the model would have to be a string literal buried inside a file that is otherwise TOML. Writing it "inline" as TOML is easier.

But it sounds like the thing to do is to just add some text to the fromdict docstring.