lmaurits / BEASTling

A linguistics-focussed command line tool for generating BEAST XML files.
BSD 2-Clause "Simplified" License
20 stars 6 forks source link

It is far too easy to mistype properties. #200

Open Anaphory opened 6 years ago

Anaphory commented 6 years ago

It happens very often to me that I mistype the name of a property. shared_params instead of share_params, exclude instead of exclusions, that sort of thing.

Can we easily change some data structures such that the config file entries are popped when they are first accessed, and warnings are issued if there are properties that were not accessed?

lmaurits commented 6 years ago

This happens to me a lot, too (especially typing monophyly instead of monophyletic or vice versa). Which probably means it will happen to everybody else, too.

I always imagined the solution to be not warning if properties were unused (since they often might not be if defaults are okay), but rather changing the current behaviour of silently ignoring unknown properties in configs.

Anaphory commented 6 years ago

That's what I meant: I meant properties manually specified in the config file that are not used by the XML construction should lead to a warning.

lmaurits commented 6 years ago

Argh, right, sorry, I read you backward.

I definitely approve of this in theory, I don't know how easy it will be to implement in practice.

lmaurits commented 5 years ago

I've partially fixed this - for [admin], [MCMC] and [languages] unknown options will trigger an error during Configuration.process(). That was surprisingly easy. Doing it inside [model] will require a little more thought, as there are common options that Configuration knows about and model-specific ones it doesn't. I'm sure it's very doable, it's just not something I could bang out with minimal effort like the other sections was.

xrotwang commented 5 years ago

Sounds a bit as if models should implement some sort of ConfigProvider interface, which specifies which config options are available.

Luke Maurits notifications@github.com schrieb am Mo., 29. Juli 2019, 16:15:

I've partially fixed this - for [admin], [MCMC] and [languages] unknown options will trigger an error during Configuration.process(). That was surprisingly easy. Doing it inside [model] will require a little more thought, as there are common options that Configuration knows about and model-specific ones it doesn't. I'm sure it's very doable, it's just not something I could bang out with minimal effort like the other sections was.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/lmaurits/BEASTling/issues/200?email_source=notifications&email_token=AAGUOKHYXPSADZH4YIRHCJ3QB33RXA5CNFSM4EQH2RI2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD3A3AZA#issuecomment-516010084, or mute the thread https://github.com/notifications/unsubscribe-auth/AAGUOKFGFTYDIGMMHMJSW4LQB33RXANCNFSM4EQH2RIQ .

xrotwang commented 5 years ago

Maybe we should use @attr.s as intermediate objects for config data, i.e. read config sections into @attr.s classes, using attr.ibs for each option. This will give us

xrotwang commented 5 years ago

E.g. the admin section could be parsed into

@attr.s
class Admin(object):
    basename = attr.ib(default='beastling_test')
    log_all = attr.ib(converter=bool, default=True)
    log_every = attr.ib(converter=int, default=100)
    ...

Note that this also provides a standard place for

xrotwang commented 5 years ago

I see we are already going (a bit) into this direction here:

class Distribution(collections.namedtuple(
        "Distribution", ["offset", "dist", "param"])):
    @classmethod
    def from_string(cls, string, context=None, is_point=False, **kwargs):
        """Create a Distribution object from a prior description string.

        """
        offset, dist, param = parse_prior_string(
            cs=string, prior_name=context, is_point=is_point)
        return cls(offset=offset, dist=dist, param=param, **kwargs)

    def generate_xml_element(self, parent):
        add_prior_density_description(compound_distribution=parent,
                                      distribution=self)

    def mean(self):
        if self.dist in ("normal", "point"):
            return self.offset + self.param[0]
        elif self.dist == "lognormal":
            return self.offset + math.exp(self.param[0])
        elif self.dist == "uniform":
            return self.offset + sum(self.param) / 2.0
        else:
            raise NotImplementedError
lmaurits commented 5 years ago

Have't had time yet to read up on this library you're proposing using or following exactly what you want to do, but since we are talking about things like reading config sections and default values, I'll share this here and now and you can ignore it if it's not relevant.

One bit of feedback I got at the workshop from the more experienced BEAST users was that there really needs to be a way to specify either fixed parameter values or parameter priors for a lot of components. This came up specifically in the context of tree priors - we support the BirthDeath prior now, which is a very good thing, but people can't do things like specify their sampling proportion if they know it, or put a prior on their death rate.

If every tree prior, clock model, subst model, etc. knew somehow what its parameters are, then we could do a lot of good things, not only automatic ID generation (preventing recent problems and also increasing consistency/readability of IDs), but I have been daydreaming of some kind of high-powered config file syntax like:

[clock default]
type = relaxed
mean = 4.2    # Specify a fixed value for a param, which would disable estimation 
mean.initial = 4.2    # Specify an initial value, but still estimate
mean.prior = normal(4.2,1.0)   # Specify a prior for a parameter, reusing the distribution-parsing machinery from calibrations
stddev.prior = exponential(1)
xrotwang commented 5 years ago

I think "knowing somehow what its parameters are" is the beauti template way, right? The added value of BEASTling - as far as I can tell - is grouping these into relevant, understable config options. So I'd argue that not simply passing full control over the XML to the BEASTling user is a good thing. That said, reading a config snippet as above into an object like below would seem pretty transparent, I guess:

@attr.s
class RelaxedClock(object):
    mean = attr.ib(converter=lambda d: Mean(**d))

    @classmethod
    def from_config(cls, cfg):
        # split options names on ".", pass groups of values with the same prefix to initialise matching attributes
        cls(**{k: dict(opts) for k, opts in itertools.groupby(sorted(cfg.items()), lambda i: i[0].split('.')[0])})
xrotwang commented 5 years ago

I'd also propose to move the sections admin, MCMC and languages into such objects, and give Configuration corresponding attributes. While this may affect many places in the code, I think reducing the number of attributes on Configuration would be helpful. And again, since the admin object would have attributes and IDE cn know of (i.e. not created via setattr), typos would be reduced, and access when typing code would even be quicker, despite longer names a la config.admin.basename.

lmaurits commented 5 years ago

Heck, even without an IDE I think I'd rather type config.admin.basename repeatedly than config["basename"].

xrotwang commented 5 years ago

So maybe these core, well understood config sections could be a good place to start experimenting - and then maybe expand to more complex clocks and models and whatnot?