Separate data from code and add more feature systems

LinguList commented 4 years ago

There are more feature systems out there:

Phoible
Mielke 2008 (P-base)

etc.

At the moment it is not clear what this library is. Is it a library for one feature system covering a couple of sounds? Why were these particular features chosen? And so on. It might be useful to define this properly. One could also subsume the feature matrix functions under pyclts, given that the library is based on pyclts anyway, and given that pyclts OFFERS access to feature systems THROUGH the transcription data (!).

tresoldi commented 4 years ago

It is also intended for the manipulation of segmental/distinctive features (in an algebraic sense, from feature geometry), but its real purpose is just to provide a way of accessing data without too much boilerplate code. The goal is to interface with the matrix I am developing, while still allowing alternative ones if they are placed in resources/ (something that, technically, already works). But I agree, this is not clear as it is: it is not mentioned in the README file, no other matrix is distributed and, for this obvious reason, loading alternative matrices is not tested. These are all issues I need to take care of in future versions.
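To make the "drop an alternative matrix into resources/" idea concrete, here is a minimal sketch of loading a feature system by name from a directory. Everything in it is an assumption for illustration: the function name `load_feature_matrix`, the one-TSV-file-per-system layout, and the `sound` column are hypothetical and do not describe distfeat's actual files or API.

```python
import csv
import tempfile
from pathlib import Path


def load_feature_matrix(name: str, resources: Path) -> dict[str, dict[str, str]]:
    """Read <resources>/<name>.tsv into {sound: {feature: value}}.

    Hypothetical format: a tab-separated file whose first column is
    "sound" and whose remaining columns are feature names.
    """
    path = resources / f"{name}.tsv"
    if not path.exists():
        # List what is available so a typo in the system name is easy to spot.
        available = sorted(p.stem for p in resources.glob("*.tsv"))
        raise FileNotFoundError(f"no feature system {name!r}; available: {available}")
    matrix: dict[str, dict[str, str]] = {}
    with path.open(encoding="utf-8", newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            sound = row.pop("sound")  # the remaining keys are feature names
            matrix[sound] = row
    return matrix


# Usage: put a toy matrix into a temporary resources/ directory and load it by name.
with tempfile.TemporaryDirectory() as tmp:
    resources = Path(tmp)
    (resources / "toy.tsv").write_text(
        "sound\tvoice\tlabial\np\t-\t+\nb\t+\t+\n", encoding="utf-8"
    )
    toy = load_feature_matrix("toy", resources)

print(toy["b"])  # {'voice': '+', 'labial': '+'}
```

Keeping the data in plain files and the lookup in code is one way to get the data/code separation the issue title asks for; a test suite could then ship a small toy matrix like this one to cover the alternative-matrix path.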

I plan to include at least a couple of other models, if only as a demonstration, such as the ones from SPE and from Mielke. Phoible could be included, but it is a model in a different sense and would not fit too well (it is more of a binarization of descriptors, allowing bivalence, than a model with an implied geometry).
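To illustrate what "a binarization of descriptors" (as opposed to a geometry) means in this comment, here is a minimal sketch that turns a set of descriptors into a flat +/- vector. The descriptor inventory and the function name `binarize` are hypothetical. The sketch is not PHOIBLE's real feature list, and it ignores the extra values a real system may allow.

```python
# Hypothetical inventory of descriptors; not PHOIBLE's actual feature set.
DESCRIPTORS = ("voiced", "labial", "coronal", "nasal", "continuant")


def binarize(descriptors: set[str]) -> dict[str, str]:
    """Turn a sound's descriptors into a flat +/- vector over DESCRIPTORS.

    Each feature answers "is this sound X or not?" independently; nothing
    in the result encodes that one feature depends on another.
    """
    unknown = descriptors - set(DESCRIPTORS)
    if unknown:
        raise ValueError(f"unknown descriptors: {sorted(unknown)}")
    return {d: "+" if d in descriptors else "-" for d in DESCRIPTORS}


m = binarize({"voiced", "labial", "nasal"})  # descriptors for [m]
print(m)
# {'voiced': '+', 'labial': '+', 'coronal': '-', 'nasal': '+', 'continuant': '-'}
```

All features in the vector are peers. There is no node whose value implies another, which is the "implied geometry" a hierarchical model adds.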

A detailed description of features and ideas is important, but, besides this being a work in progress, it would fit a paper better than a blog post, especially considering that some of my decisions are controversial/exploratory and I have not necessarily made up my mind on them yet (like allowing geometries with multiple parents, e.g. for anterior). Better integration with pyclts is a plan/wish, and I will try to keep call signatures similar, but it is something that can only be considered in the future, when and if this model gets serious (which I suppose would involve actually publishing it, or at least publishing something that uses it). As it is, it is really just a single-file helper library for personal experiments, published on PyPI because it can save people with similar goals a couple of hours.
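A geometry "with multiple parents" is a directed acyclic graph rather than a tree. The following minimal sketch shows what that changes when you ask which nodes a feature implies. The simplified root/place/coronal chain is a common textbook shape. The second parent of `anterior`, called `other_parent` here, is a deliberate placeholder and not the choice made in this library. The names `PARENTS` and `ancestors` are hypothetical.

```python
# Hypothetical, simplified feature geometry as a DAG: node -> list of parents.
# "anterior" gets two parents to mimic the multiple-parent idea; the second
# one ("other_parent") is a placeholder, not the library's actual choice.
PARENTS: dict[str, list[str]] = {
    "root": [],
    "place": ["root"],
    "coronal": ["place"],
    "other_parent": ["root"],
    "anterior": ["coronal", "other_parent"],
}


def ancestors(node: str) -> set[str]:
    """Return every node implied by specifying `node` (its transitive parents)."""
    seen: set[str] = set()
    stack = list(PARENTS[node])
    while stack:
        parent = stack.pop()
        if parent not in seen:  # in a DAG the same node can be reached twice
            seen.add(parent)
            stack.extend(PARENTS[parent])
    return seen


print(sorted(ancestors("anterior")))  # ['coronal', 'other_parent', 'place', 'root']
```

In a strict tree every node has exactly one path to the root. The `seen` set is what keeps the traversal correct once a node such as `root` can be reached through more than one parent.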

LinguList commented 4 years ago

Well, if it is for the integration of feature systems, I don't understand why something like Phoible would not work. But to be honest I don't need to know, as it is not my major interest. I just think it would be good to make things transparent and to start from what's there, so that this work also feeds back into the maintenance of pyclts and of CLTS, which has been largely abandoned.

But you still plan to write the blog post, right?

tresoldi commented 4 years ago

Phoible would work, but it would not add much, because its model is more like "is this sound X or not?", without an explicit hierarchy of features. It is more "serious", or "less adventurous", which is why an "equation" like "labial+dental=labiodental", the kind of thing this model wants to experiment with, does not make much sense there, especially as we already have pyclts.
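One possible reading of an "equation" like labial+dental=labiodental is to treat feature specifications as sets and `+` as set union. The class below is a hypothetical illustration of that reading only. `Features`, the constants, and the manner/voicing labels are made up for the example and do not represent how the library implements its algebra.

```python
class Features(frozenset):
    """Hypothetical feature bundle where '+' combines two bundles (set union)."""

    def __add__(self, other: frozenset) -> "Features":
        # Re-wrap the union so the result is still a Features instance.
        return Features(self | other)


LABIAL = Features({"labial"})
DENTAL = Features({"dental"})
# Under this reading, a compound place is just the bundle of its components.
LABIODENTAL = Features({"labial", "dental"})

assert LABIAL + DENTAL == LABIODENTAL

# A sound carries its place plus other (hypothetical) features, so asking
# "is [f] labiodental?" becomes a subset test.
F = LABIODENTAL + Features({"fricative", "voiceless"})
print(LABIODENTAL <= F)  # True
```

A flat yes/no model can store the same sets, but it has no notion of one place being *composed* of others. The operator only becomes meaningful once the model commits to such a composition.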

Models like the one(s) of Mielke, on the other hand, would make a lot of sense.

CLTS needs more attention, and I am at fault there. At least this small project helped identify more issues (one case from yesterday: this, which uses a Unicode Private Use Area character from SIL when it should be kʟ̝̊). Hopefully this gets me back on track with it.
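You can catch problems like this automatically by scanning transcriptions for code points in Unicode's Private Use Area, whose general category is `Co`. The following minimal sketch uses only Python's standard `unicodedata` module. The helper names are hypothetical. U+E000 is only a stand-in Private Use Area code point, because the thread does not name the actual SIL character.

```python
import unicodedata


def private_use_chars(text: str) -> list[tuple[int, str]]:
    """Return (position, "U+XXXX") for each Private Use Area character in text."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if unicodedata.category(ch) == "Co"  # "Co" = Other, private use
    ]


def describe(text: str) -> list[str]:
    """List each code point with its Unicode name (PUA characters have none)."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<no name>')}" for ch in text]


# k + small capital L + up tack below + ring above, in canonical order.
expected = "k\u029f\u031d\u030a"
suspect = "k\ue000"  # stand-in PUA code point, not the actual SIL character

print(private_use_chars(expected))  # []
print(private_use_chars(suspect))   # [(1, 'U+E000')]
for line in describe(expected):
    print(line)
```

Spelling the expected form as a base letter plus combining diacritics keeps it decomposable and portable across fonts, whereas a Private Use Area code point only renders correctly with fonts that agree on its private meaning.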

I finished drafting the post today, including a short note on this library, and will review it tomorrow, when today's writing is no longer so fresh in my mind. We can publish it on any day you prefer from Wednesday/Thursday onward (or much later, of course; there is no urgency on my side).

tresoldi/distfeat, issue #2