lmaurits / BEASTling

A linguistics-focussed command line tool for generating BEAST XML files.
BSD 2-Clause "Simplified" License
20 stars 6 forks source link

Set AncestralStateLogger value input for better output #178

Closed Anaphory closed 6 years ago

Anaphory commented 6 years ago

When we create an ASL, we should set its value input in order to make the column assignments more transparent.

This would require, after https://github.com/lmaurits/BEASTling/blob/55f5ef6d65520c6731e247e4f9a5454bc7465d11/beastling/models/basemodel.py#L439 something like

                distribution.attrib["value"] = " ".join(["{:}_{:d}".format(f, i) for i in range(featurepatterns)])

but

Anaphory commented 6 years ago

Do we have featurepatterns = self.valuecounts[f], or do I need to do corrections to that number for ascertainment etc.?

lmaurits commented 6 years ago

Sorry, what is featurepatterns supposed to be?

Is what we are after here user-friendly name for each feature, or one for each possible value of each feature? My recollection of this issue necessitated the former, which is easy (there's a convention to get a unique name by concatenating the model and feature name with a separating colon - probably this should actually be formalised into a method).

lmaurits commented 6 years ago

Oh, right, you're thinking about the case of binarised multistate data.

lmaurits commented 6 years ago

Ugh, you need to do a correction to self.valucounts[f] for ascertainment correction, and the number you need to add is thrown away after it is generated, which is annoying...

lmaurits commented 6 years ago

Now the question is whether the number of extra columns is currently computed before or after you are wanting to access it...

lmaurits commented 6 years ago

Thankfully, well before. I'll commit a change shortly to store that number so you have easy access to it.

lmaurits commented 6 years ago

The value you want is now featurepatterns = self.valuecounts[f] + self.extracolumns[f].

lmaurits commented 6 years ago

Btw, self.unique_values[f] should contain all the possible values of the feature, in the same order as they are turned into columns when binarised. So instead of calling the individual binary features 1, 2, 3, whatever, you could, in the case of the non-dummy columns, give them their "actual" value.

Anaphory commented 6 years ago

Actually, for BaseModel, I should set value to the feature name, but for binarized models I should create the right number of columns, shouldn't I? Which means I should add some test cases for ASR with multi-state models and ascertainment on and off etc.

lmaurits commented 6 years ago

It's possible to do ascertainment correction for non-binarised models, too. But, yes, when there is a column in the reconstruction log which corresponds directly to a feature, we should definitely use that feature name.

Anaphory commented 6 years ago

I have now added a

def pattern_names(self, feature):
    if self.ascertained:
        return ["{:}_dummy{:d}".format(f, i) for i in range(self.extracolumns[f])] + [feature]
    else:
        return [feature]

which is overloaded by BinaryModel, in order to do this. Now off to add some ASR tests.

lmaurits commented 6 years ago

That looks about right. Make sure you pull once more before running the tests, my hasty addition of self.extracolumns wasn't done properly and required a fix.

Anaphory commented 6 years ago

Yes. I noticed and decided to create a defaultdict, as well.