TheCakeIsAPi commented 5 years ago

ARTIQ Feature Request

Problem this request addresses

The group delimiter is '.' for datasets and allows multiple levels. It is '/' for hdf5 and allows multiple levels. For arguments it must be specified with the 'group=' parameter and only allows one level deep. In the namespace no delimiters are allowed because both '.' and '/' break the python variable naming scheme, and getattr is not allowed in the kernel so you can't call it from the safety of a string, e.g. getattr('globals.DDS.Ba_493.frequency').

We are mapping our experimental parameters between all 4 of these. My stopgap solution has been to use '.' in datasets, to automatically convert '.' to '__' (double-underscore) when pulling this into the namespace, to automatically use the 2nd group ('DDS' in the example above) for argument grouping, and to allow datasets in the hdf5 file to keep the '.' and thereby remain (sadly) ungrouped.

Describe the solution you'd like

Assuming double-underscore is a reasonable python-safe delimiter: Change the group delimiter to double-underscore for datasets. Use double-underscore as an automatic group delimiter for multi-level grouping in arguments. Automatically convert the delimiter to '/' to preserve grouping in the HDF5 hierarchy.

Additional context

I've created a get_dataset_db method, patterned after get_device_db, which allows me to scrape the dataset list and automatically import anything that starts with 'globals.' which is how I'm storing experiment parameters. Without this I would be specifying every parameter name in code and this issue would be less relevant, although it would still be nice to have consistency and to preserve grouping in the hdf5 files.

jordens commented 5 years ago

I don't like that and I don't see a clear benefit from unifying argument hierarchy, dataset hierarchy, python object/attribute, and HDF5 hierarchy. They are all different things.

We are not tied to HDF5 as the storage format. It could be something else and we hardly make significant use of it. I don't see HDF5 path separators as an indicator for what's good or required. We also don't map dataset hierarchy or argument hierarchy to HDF5 hierarchy at all.
Breaking Python variable naming scheme with the dataset hierarchy separator is intentional. Otherwise you end up with monstrous variable names like the one you propose: self.globals__DDS__Ba_493__frequency). Instead retrieve the hierarchical datasets parametrically in host code and store them as idiomatic variable names, maybe something along the lines of (from memory, untested):


class Servo(HasEnvironment):
    def build(self, name):
        self.gain = self.get_dataset("servo.{}.gain".format(name))
        self.y = 0.

    @kernel
    def update(self, x):
        self.y += self.gain*x

class Exp(EnvExperiment):
    def build(self):
        self.a = Servo(self, name="a")
        self.b = Servo(self, name="b")

    @kernel
    def run(self):
        self.a.update(1.3)
        self.b.update(3.)

You could even automate that prefix scheme to save some typing.

Snake case is the dominant style for attribute and variables in Python, HDF5, file names, influxdb names, etc. Using double underscore for something else inside variable names makes that unreadable and also hard to parse both mentally and computationally.
Visual grouping of arguments is separate again as there is limited use in nesting deeply. Not allowing deep nesting was a conscious decision. Also argument groups don't map to attribute or variable names at all (but you can make them parametric in the same style as the example above).
Retrieving the entire dataset db and shoe-horning into attributes is not recommended. I haven't felt the need to do that. I don't know why you think that's a good idea.

TheCakeIsAPi commented 5 years ago

Retrieving the entire dataset db and shoe-horning into attributes is not recommended. I haven't felt the need to do that. I don't know why you think that's a good idea.

Datasets seem like the ARTIQ resource most suitable as the as the primary source of parameters shared across experiments, but the arguments GUI is more full-featured than the datasets GUI, so I use arguments to create/modify the parameters, but use the datasets to store default values that need to persist across argument rebuilds. I have a base_experiment.py which has arguments for every one of these global parameters, wherein I scrape the datasets for the default values, populate the arguments, and then save back to datasets if changes were made either from the arguments GUI or in the experiment code. Individual experiments have access to all the globals but only have arguments for things specific to that experiment. It seems to be a pretty good system. If the dataset GUI had all the features that the arguments GUI has then I probably would not be using the arguments at all for my globals.

dhslichter commented 5 years ago

the arguments GUI is more full-featured than the datasets GUI

I'm curious, what in particular from the arguments GUI do you wish were available in the datasets GUI? Spinboxes? Ability to display with units, or scaled? It seems to me that really what you would like would be to add functionality to the datasets GUI to allow manipulation more like what is available for arguments, rather than jumping through all these hoops with changing separators and the like. Am I understanding correctly?

TheCakeIsAPi commented 5 years ago

@dhslichter correct. Mostly limits and units/scale, and not having to open up a separate little window to change each value.

But it also still seems to me that since all of these things are saved to hdf5, then all of them should have matching structure. I'm not sure I agree @jordens that these are different things which ought to be treated differently, since the built-in ARTIQ functionality of archiving already forces them into the same space.

dnadlinger commented 5 years ago

But it also still seems to me that since all of these things are saved to hdf5, then all of them should have matching structure.

They aren't actually, though – arguments are just serialized as JSON as part of the expid.

In general, I agree that the argument/dataset mechanism on its own is not very handy for many more complex experiments, but that is mostly because of the limits to composability and exploratory flexibility the use of globals like this entails, not because of the different hierarchical naming conventions. In other words, I am not sure any proposed change to the naming scheme would improve the situation at all – surely, as long as you are accessing an opaque key-value store, whether you are typing globals.foo.bar or get_dataset("foo.bar") is pretty much the same? (Define an alias and make it d("foo.bar") if the extra characters is what bugs you about this.)

In particular,

Mostly limits and units/scale

wouldn't be addressed by this as long as the dataset DB is a plain key-value store.

In ndscan, parameters are both rich (have unit/scaling/limits/… metadata attached to them) and scoped to a particular code fragment, and can default to a dataset. At the same time, parameters are discovered throughout the entire fragment tree, and you can any number of them to the experiment window on demand to override/scan them. That design seems to be working out pretty well for us. It doesn't add any nicer controls for editing the dataset DB, but with the option to set a dataset from the cursor/context menu in the plot applet (plus automated fits), manually having to type in a number is pretty rare.

TheCakeIsAPi commented 5 years ago

@dnadlinger I'm not looking for shorter variable names. In fact self.globalsDDSBa_493__frequency which @jordens called a monster is exactly what I am using. I am looking to avoid having multiple names for the same thing.

It mostly seems like you guys are getting along fine without having these spaces map, but it doesn't seem like it would get in the way of anything you do either. If you wanted to use dataset names with '.' in them and then different aliases without periods in them in the code, you'd still be able to do that. But I want to use the same names and same hierarchy everywhere, and currently I cannot do both of those things.

jordens commented 5 years ago

Dataset hierarchy separator

The initial proposal is to use __ instead of . as hierarchy separator. Let's restrict this issue to that. I am still unsure what issues that solves apart from slightly simplifying your very special way of retrieving datasets and generating attributes for them. Feel free to file new issues about the rest.

Atribute names

I find it extremely useful to strip the full path of the dataset to a locally meaningful attribute name. Otherwise schemes like the one above and most structured code organization schemes that I can think of would not work at all. It's a common pattern everywhere I look (in file names, in variables, in attributes) to not use the full path at each site but to shorten it and have context hold the rest. Having multiple paths for the same name is completely normal. Your approach also breaks lots of other things:

you loose the meaning of a dataset appearing in the HDF5 file (it being used)
the tracking of how many times a dataset was retrieved is broken
you don't get to choose when a dataset is retrieved
you can't set a dataset anymore without running into warnings and consistency problems
you have no way of reusing code for different sets of datasets
you can't easily retrieve datasets in classes that are owned by your top level experiment, or/and you loose the consistency between them

I am surprised you didn't run into any of that. But in any case it's obviously your choice how you organize your code and your datasets.

Metadata on datasets

Adding the metadata that we have for arguments (limits, units, scales, resolutions, uncertainties, steps, scan parametrizations, timestamps) to datasets is a different proposal. Open another issue with a detailed proposal if you are interested.

HDF5 mapping

The fact that multiple different sources of information end up in the HDF5 file doesn't imply that they all need to formatted and structured the same way. What would be the reason/the advantage/the cost? Is it even possible? How do you serialize all of the following in the same efficient and generic structure while keeping it editable and introspectable?

a generic opaque python object somewhere down the attribute hierarchy
a many-MB large structured numpy array (maybe with structured metadata)
a randomized multidimensional scan argument
a None value

m-labs / artiq

Consistency of hierarchy between datasets, arguments, namespace, and hdf5. #1347

ARTIQ Feature Request

Problem this request addresses

Describe the solution you'd like

Additional context

Dataset hierarchy separator

Atribute names

Metadata on datasets

HDF5 mapping