Closed by LinguList 4 years ago
I would say: please let us work in this direction, to make the code cleaner and library-based. We can even add a command-line interface that handles parameters. But the important part would be: let us determine what the core tests are that we want to do (e.g., compare word distributions) and see that we do this with minimal reliance on heavy third-party libraries.
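As a sketch of the kind of core test mentioned above (comparing word distributions), here is a minimal pure-Python example using only the standard library. The function names and the toy word lists are my own illustration, not part of the package:

```python
# Sketch: comparing two word (unigram) distributions with only the
# standard library; helper names here are hypothetical.
from collections import Counter
from math import log2

def unigram_distribution(words):
    """Relative frequency of each symbol over a list of words."""
    counts = Counter(ch for word in words for ch in word)
    total = sum(counts.values())
    return {sym: n / total for sym, n in counts.items()}

def entropy(dist):
    """Shannon entropy (in bits) of a probability distribution."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

# toy data, purely for illustration
native = ["moto", "maji", "mtu"]
borrowed = ["shule", "daktari"]

print(entropy(unigram_distribution(native)))
print(entropy(unigram_distribution(borrowed)))
```

This kind of helper needs no third-party dependency at all, which is the point of keeping the core tests lightweight.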
Note that the `setup.py` will allow for full replicability, as we can add all third-party libs there, but we should reduce them (ideally not using nltk; @tresoldi and @fractaldragonflies, you need to see if the NgramModel by @tresoldi is enough to account for the Markov experiments).
Update: I just added command-line functionality:

```shell
mobor plot_entropy --language=Swahili --file=swahili-entropy.pdf --sequence=sca
```

will plot the entropies for sca, and the like.
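A subcommand-style CLI like the invocation above can be wired with `argparse` from the standard library. This is only a sketch; the real mobor CLI may be implemented differently, and the argument names here simply mirror the example call:

```python
# Sketch of a subcommand-style CLI, mirroring the example invocation.
# The actual mobor implementation may differ.
import argparse

def build_parser():
    parser = argparse.ArgumentParser(prog="mobor")
    subparsers = parser.add_subparsers(dest="command", required=True)
    plot = subparsers.add_parser("plot_entropy", help="plot entropies for a language")
    plot.add_argument("--language", required=True)
    plot.add_argument("--file", required=True)
    plot.add_argument("--sequence", default="sca")
    return parser

# parse the example invocation from above
args = build_parser().parse_args(
    ["plot_entropy", "--language=Swahili",
     "--file=swahili-entropy.pdf", "--sequence=sca"]
)
print(args.command, args.language)  # → plot_entropy Swahili
```

Keeping the dispatch in one place makes it easy to add further subcommands later without touching the library code.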
In this spirit, I think we should carry on with the rest of the code.
@tresoldi, if you look at the Markov code by @fractaldragonflies: are there things that nltk offers which we can't handle with your ngrams? If not: how difficult would it be to reimplement them, or how important would they be? I think I'd suggest adding them to the new wrapper (so we don't add them to lingpy, which would require more testing), so we can make our experiments now with this new library.
I will leave this open until @fractaldragonflies has had a look; you can in fact just merge then and see in which way more code could be integrated.
The ngram functions in lingpy were written as an extension of the ones in nltk: everything in there should be compatible (perhaps with minor differences in the calling paradigm and things like that). So much so that, at the time, I was confirming the output against the one provided by nltk.
Granted, nltk might have changed in these two years, but it seems everything should work out of the box. @fractaldragonflies, could you confirm? The lingpy functions are pretty well documented.
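To illustrate how little machinery the basic building blocks require, here is a pure-Python sketch of nltk-style ngram extraction with padding. The padding symbol and signature are my own choices, not nltk's or lingpy's exact API:

```python
# Sketch: nltk-style ngram extraction with left/right padding,
# implemented with no dependencies.
def ngrams(sequence, n, pad_symbol="$"):
    """Yield n-grams over a padded sequence, similar in spirit to
    nltk.util.ngrams with padding enabled."""
    padded = [pad_symbol] * (n - 1) + list(sequence) + [pad_symbol] * (n - 1)
    for i in range(len(padded) - n + 1):
        yield tuple(padded[i:i + n])

print(list(ngrams("abc", 2)))
# → [('$', 'a'), ('a', 'b'), ('b', 'c'), ('c', '$')]
```

Whatever nltk adds beyond this (smoothing, model classes) is the part worth auditing against the existing NgramModel.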
A bit overwhelming at first, and a bit at second glance as well. Also, I will need a bit of schooling with respect to GitHub interaction; I think you are awaiting some action/approval on my part.
Comments on Mattis's proposal results:
Comments on how it should work:
With respect to supporting other models, this becomes a more general theme if we consider the neural network model as well. For discussion.
I will merge this then for now. So you should run

```shell
$ git pull
```

in the folder, and you will have the updates.
The plotting function can be modularized further; it is only a first example to get things running. The advantage is that it will now check for dependencies, so it is much more replicable, and we really separate data and code, will be able to add other lists in the future, etc.
@tresoldi, please, can you look into the question of normalization with Laplace smoothing, etc.? I didn't find the one that @fractaldragonflies used.
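For reference in that discussion, here is a minimal sketch of Laplace (additive) smoothing for bigram probabilities. This is the textbook formulation, not a claim about what either implementation currently does; all names are my own:

```python
# Sketch: bigram probabilities with Laplace (add-alpha) smoothing:
#   P(w2 | w1) = (count(w1, w2) + alpha) / (count(w1) + alpha * V)
from collections import Counter

def laplace_bigram_probs(corpus, alpha=1.0):
    """Return a probability function over bigrams with additive smoothing."""
    bigrams = Counter()
    unigrams = Counter()
    vocab = set()
    for seq in corpus:
        vocab.update(seq)
        for w1, w2 in zip(seq, seq[1:]):
            bigrams[w1, w2] += 1
            unigrams[w1] += 1
    V = len(vocab)

    def prob(w1, w2):
        return (bigrams[w1, w2] + alpha) / (unigrams[w1] + alpha * V)

    return prob

p = laplace_bigram_probs([["a", "b"], ["a", "c"]])
```

With `alpha=1` this is classic add-one smoothing; the probabilities over the vocabulary sum to one for each context, which is the normalization property worth checking against the existing code.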
@fractaldragonflies and @tresoldi, please check this example illustrating how I imagine that this code package should work.
The result is:

1. convenient loading of a wordlist (see `mobor.data.Wordlist.from_lexibank`)
2. simple extraction of a table
3. a new class `Markov` that can retrieve data from a wordlist
4. a specific plotting module that only does this: plotting data
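The separation of concerns sketched above can be illustrated with stand-in classes. The real mobor API (`Wordlist.from_lexibank`, `Markov`) will differ in signatures and behavior; this only shows the division of responsibilities:

```python
# Sketch of the workflow's separation of concerns, with toy stand-ins
# for the real mobor classes.
from collections import Counter

class Wordlist:
    """Stand-in for step 1: holds word-form rows after loading."""
    def __init__(self, rows):
        self.rows = rows  # list of dicts, one per word form

    def get_table(self, language):
        """Step 2: simple extraction of a table for one language."""
        return [r["form"] for r in self.rows if r["language"] == language]

class Markov:
    """Step 3: the model retrieves its data from a wordlist."""
    def __init__(self, wordlist, language):
        self.forms = wordlist.get_table(language)

    def bigram_counts(self):
        counts = Counter()
        for form in self.forms:
            counts.update(zip(form, form[1:]))
        return counts

wl = Wordlist([
    {"language": "Swahili", "form": "moto"},
    {"language": "Swahili", "form": "maji"},
])
model = Markov(wl, "Swahili")
```

Plotting (step 4) would then take the model's output and do nothing else, keeping data access, modeling, and visualization in separate modules.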