It looks like the core of build_freq_string can be simplified to setting features to self.features or [feature] and then
    if self.binarised:
        for f in features:
            for lang in self.data:
                if self.data[lang][f] == "?":
                    continue
                dpoint, index = self.data[lang][f], self.unique_values[f].index(self.data[lang][f])
                all_data.append(index)
    else:
        for f in features:
            for lang in self.data:
                if self.data[lang].get(f,"?") == "?":
                    valuestring = "".join(["?" for i in range(0,len(self.unique_values[f])+1)])
                else:
                    valuestring = ["0" for i in range(0,len(self.unique_values[f])+1)]
                    valuestring[self.unique_values[f].index(self.data[lang][f])+1] = "1"
                all_data.extend(valuestring)
but I don't understand the "?" handling yet, so I won't do that right now. (Virtues! My concentration is bad right now. I should probably not program fragile things until it gets better.)
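For reference, here is a minimal standalone sketch of what the two branches above appear to produce, written as a plain function so it can be run outside the class. The function name encode and the sample data are hypothetical and not part of the codebase; the logic is only my reading of the snippet above, including the extra leading slot in the non-binarised branch, whose purpose I have not verified.

```python
def encode(data, unique_values, features, binarised):
    """Hypothetical standalone mirror of the loop in build_freq_string."""
    all_data = []
    for f in features:
        for lang in data:
            if binarised:
                # Unknown values are skipped entirely in this branch.
                if data[lang][f] == "?":
                    continue
                all_data.append(unique_values[f].index(data[lang][f]))
            else:
                # One slot per known value, plus one extra leading slot.
                width = len(unique_values[f]) + 1
                if data[lang].get(f, "?") == "?":
                    # Unknown values become a run of "?" of the same width.
                    valuestring = ["?"] * width
                else:
                    valuestring = ["0"] * width
                    valuestring[unique_values[f].index(data[lang][f]) + 1] = "1"
                all_data.extend(valuestring)
    return all_data


# Made-up sample data: two languages, one feature, one unknown value.
sample_data = {"lang_a": {"order": "SVO"}, "lang_b": {"order": "?"}}
sample_values = {"order": ["SOV", "SVO"]}

indices = encode(sample_data, sample_values, ["order"], binarised=True)
columns = encode(sample_data, sample_values, ["order"], binarised=False)
```

The asymmetry this makes visible is that the first branch drops "?" entries, so the output no longer lines up with the languages, while the second branch keeps a fixed-width placeholder for each of them. That is the part I would want to understand before merging the two loops.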