Closed Anaphory closed 6 years ago
Do we have featurepatterns = self.valuecounts[f]
, or do I need to do corrections to that number for ascertainment etc.?
Sorry, what is featurepatterns
supposed to be?
Is what we are after here user-friendly name for each feature, or one for each possible value of each feature? My recollection of this issue necessitated the former, which is easy (there's a convention to get a unique name by concatenating the model and feature name with a separating colon - probably this should actually be formalised into a method).
Oh, right, you're thinking about the case of binarised multistate data.
Ugh, you need to do a correction to self.valucounts[f]
for ascertainment correction, and the number you need to add is thrown away after it is generated, which is annoying...
Now the question is whether the number of extra columns is currently computed before or after you are wanting to access it...
Thankfully, well before. I'll commit a change shortly to store that number so you have easy access to it.
The value you want is now featurepatterns = self.valuecounts[f] + self.extracolumns[f]
.
Btw, self.unique_values[f]
should contain all the possible values of the feature, in the same order as they are turned into columns when binarised. So instead of calling the individual binary features 1, 2, 3, whatever, you could, in the case of the non-dummy columns, give them their "actual" value.
Actually, for BaseModel
, I should set value
to the feature name, but for binarized models I should create the right number of columns, shouldn't I? Which means I should add some test cases for ASR with multi-state models and ascertainment on and off etc.
It's possible to do ascertainment correction for non-binarised models, too. But, yes, when there is a column in the reconstruction log which corresponds directly to a feature, we should definitely use that feature name.
I have now added a
def pattern_names(self, feature):
if self.ascertained:
return ["{:}_dummy{:d}".format(f, i) for i in range(self.extracolumns[f])] + [feature]
else:
return [feature]
which is overloaded by BinaryModel
, in order to do this. Now off to add some ASR tests.
That looks about right. Make sure you pull once more before running the tests, my hasty addition of self.extracolumns
wasn't done properly and required a fix.
Yes. I noticed and decided to create a defaultdict
, as well.
When we create an ASL, we should set its
value
input in order to make the column assignments more transparent.This would require, after https://github.com/lmaurits/BEASTling/blob/55f5ef6d65520c6731e247e4f9a5454bc7465d11/beastling/models/basemodel.py#L439 something like
but
f
probably needs some cleaning to be more identifier-like, a function I would also be happy to use in some other placesfeaturepatterns
, which should include the number of excluded patterns, because ASL is dumb?