lingpy / pybor

A Python library for borrowing detection based on lexical language models
Apache License 2.0

Update mattis #19

Closed LinguList closed 4 years ago

LinguList commented 4 years ago

Okay, a major refactoring done now with the neural code.

Basic features:

config.py provides configuration classes; you can instantiate them and pass them as settings to the Neural class. We need to figure out whether this is useful. I have to admit I am worried when there are so many parameters that one simply takes for granted, so we should be careful not to end up tuning things randomly.
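A minimal sketch of the settings idea, not the actual contents of config.py: the field names, the defaults, and the `NeuralSketch` stand-in class are all assumptions made for illustration. It shows one way to keep defaults in a single place and to list exactly which values a run overrides, which may help against tuning things blindly.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class SettingsSketch:
    """Hypothetical settings container; names and defaults are illustrative only."""
    embedding_len: int = 32
    rnn_output_len: int = 32
    epochs: int = 50
    learning_rate: float = 0.01


class NeuralSketch:
    """Stand-in for the real Neural class; it only stores the settings it is given."""

    def __init__(self, settings=None):
        self.settings = settings or SettingsSketch()


def overridden(settings):
    """Return only the values that differ from the defaults."""
    defaults = asdict(SettingsSketch())
    return {k: v for k, v in asdict(settings).items() if defaults[k] != v}


default_model = NeuralSketch()
short_run = NeuralSketch(SettingsSketch(epochs=10))
changed = overridden(short_run.settings)
```

Logging `changed` next to each result would make it obvious which parameters were actually varied in an experiment.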

entropies.py is the main TensorFlow script, mostly untouched.

neural.py now has the neural data and a new Vocab class that can also translate:

>>> voc = Vocab(data)
>>> voc.translate(word)

Since we work with both integers and strings, translation goes both ways: if you pass a sequence of integers, it is translated back into the original alphabet.
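To illustrate the two-way translation, here is a rough sketch under stated assumptions. It is not the Vocab class from neural.py: the class name `VocabSketch`, the `<unk>` placeholder, and the way the direction is inferred from the input type are all hypothetical.

```python
class VocabSketch:
    """Hypothetical two-way mapping between segments and integer ids."""

    def __init__(self, data, unknown="<unk>"):
        # data: iterable of words, each word a sequence of string segments.
        self.unknown = unknown
        self.symbol_to_id = {unknown: 0}  # id 0 is reserved for unseen segments
        for symbol in sorted({s for word in data for s in word}):
            self.symbol_to_id.setdefault(symbol, len(self.symbol_to_id))
        self.id_to_symbol = {i: s for s, i in self.symbol_to_id.items()}

    def translate(self, word):
        """Map segments to ids, or ids back to segments, depending on the input."""
        if all(isinstance(item, int) for item in word):
            return [self.id_to_symbol[item] for item in word]
        return [self.symbol_to_id.get(item, 0) for item in word]


data = [["h", "a", "n", "d"], ["m", "a", "n", "o"]]
voc = VocabSketch(data)
ids = voc.translate(["m", "a", "n", "o"])
segments = voc.translate(ids)
```

Translating a word to ids and back returns the original segments, as long as every segment was seen when the vocabulary was built.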

I think this makes many parts of the code clearer and also removes unneeded code. E.g., the functions for calling things were so complex that it took me a long time to figure out that they were in fact all about translating to the internal numerical representation. All gone now.

I also played a bit with the SVMs: they perform better if you pass them trigrams. When tested on the training data, they reach 100% in prediction with trigrams, so positional information is still useful, I assume.
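For reference, a small sketch of what trigram features could look like. This covers only the feature extraction, not the SVM setup in the repository; the function names and the `^` / `$` boundary symbols are assumptions.

```python
from collections import Counter


def trigrams(segments, start="^", end="$"):
    """Return overlapping trigrams, padded so word edges are marked."""
    padded = [start, start] + list(segments) + [end, end]
    return [tuple(padded[i:i + 3]) for i in range(len(padded) - 2)]


def trigram_counts(segments):
    """Bag-of-trigrams representation that a classifier could consume."""
    return Counter(trigrams(segments))


grams = trigrams(["h", "a", "n", "d"])
counts = trigram_counts(["h", "a", "n", "d"])
```

The padding marks word-initial and word-final contexts, and each trigram keeps the local order of segments, which is one way that positional information can reach the classifier.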

That is all for now, please review, @tresoldi and @fractaldragonflies. We can fine-tune things later, but at least I now understand the code much better than before.

LinguList commented 4 years ago

Yes. I was confused by the two models. I was thinking of proposing two classes to make the code easier to follow. Would inheritance capture the major common features?
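A rough sketch of how such a split could look, purely as a basis for discussion: the class names, the `build_layers` hook, and the layer labels are placeholders and do not describe the two models in the code.

```python
from abc import ABC, abstractmethod


class BaseNeuralSketch(ABC):
    """Hypothetical common class: holds settings and shared behaviour."""

    def __init__(self, settings=None):
        self.settings = dict(settings or {})
        # Each subclass only decides how its architecture is assembled.
        self.layers = self.build_layers()

    @abstractmethod
    def build_layers(self):
        """Return the layer labels for this variant (placeholder for real layers)."""

    def summary(self):
        # Shared code lives once in the base class.
        return f"{type(self).__name__}: " + " -> ".join(self.layers)


class VariantA(BaseNeuralSketch):
    def build_layers(self):
        return ["embedding", "recurrent", "output"]


class VariantB(BaseNeuralSketch):
    def build_layers(self):
        return ["embedding", "recurrent", "attention", "output"]


summary_a = VariantA().summary()
summary_b = VariantB({"epochs": 10}).summary()
```

Training, evaluation, and settings handling would sit in the base class, so the two subclasses differ only in the part that is genuinely model-specific.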

LinguList commented 4 years ago

In fact, John, do you think you could take over with the rest? Then we could merge this one for now, and you could correct my mistakes. I suggest we reconvene via email tomorrow and discuss the tests we want to run for the paper.

fractaldragonflies commented 4 years ago

Yes. Separate classes, as for Markov, with a common class taking up much of the load, makes sense.

J.E.M.

On May 28, 2020, at 2:30 PM, Johann-Mattis List wrote:

 Yes. I was confused by the two models. I was thinking to propose two classes to make it easier to watch. Inheritance would capture major common features?


fractaldragonflies commented 4 years ago

Sure. I can do that. Some discussion is in order regarding tuning and tests. Talk tomorrow!!

Take care

J.E.M.

On May 28, 2020, at 2:43 PM, Johann-Mattis List wrote:

 In fact, John, do you think you could take over with the rest? Then we could merge this one for now, and you could correct my wrongs? Tomorrow, I suggest we reconvene via email and discuss the tests we want to do for the paper?


LinguList commented 4 years ago

Super, then I'll merge this now. @tresoldi, we're keen on your input during our discussion tomorrow; we can talk about this during our Skype call at 2pm.