lingpy / pybor

A Python library for borrowing detection based on lexical language models
Apache License 2.0

nltk john adapt - native/loan detection from both dual and native only models #12

Closed fractaldragonflies closed 4 years ago

fractaldragonflies commented 4 years ago

This adds detect functionality that exposes the predict functions. I am opening the pull request now as a sanity check on the direction I am taking here.

I've attached the test output.

test_detect.output.txt

fractaldragonflies commented 4 years ago

I accept that we don't want to require numpy arrays in the arguments exposed by the API. The same goes internally, wherever we have not agreed to specify a numpy array as a function argument.

Inside a function or class, however, I disagree with that restriction. If an operation is much more direct with a numpy array than with vanilla Python, then using numpy makes for more productive coding.

As for the weight of the neural net code: it will surpass anything else we might consider adding.

fractaldragonflies commented 4 years ago

I read your point on time performance with numpy arrays. It surprises me, but I suppose different environments and uses lead to different performance. OK, I can often substitute a list comprehension for a numpy array operation, so I'll try that route.
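A minimal sketch of that substitution, for illustration only: the names `entropies` and `threshold` and the values are hypothetical and not taken from the pybor code. An elementwise comparison that would normally be written against a numpy array can be expressed with a list comprehension:

```python
# Hypothetical per-word entropies and decision threshold (illustrative values).
entropies = [2.1, 3.7, 2.9, 4.2]
threshold = 3.0

# With numpy this would be: predictions = np.array(entropies) < threshold
# The list comprehension gives the same elementwise result as a plain list.
predictions = [e < threshold for e in entropies]

# With numpy this would be: np.mean(entropies)
mean_entropy = sum(entropies) / len(entropies)
```

The result is an ordinary list of booleans, so nothing numpy-specific leaks into function arguments or return values.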

fractaldragonflies commented 4 years ago

Converted entirely to the 3-column file format. BTW, there seems to be some difference between the dev data files (training and testing) and the new lexibank format, at least in how they appear when I print(table[:10]) versus print(testing[:10]).

Moved MarkovWord, which is based on nltk, to its own file. It is strictly an adapter for nltk access that calculates the entropies used by the DualMarkov and NativeMarkov classes in markov. [I may still combine these into one class before the merge.]
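To make the dual versus native-only distinction concrete, here is a rough sketch under stated assumptions. It does not use nltk or the actual MarkovWord, DualMarkov, or NativeMarkov code: `BigramModel`, `predict_dual`, and `predict_native_only` are hypothetical names, and a simple add-one smoothed bigram model stands in for the nltk-based entropy calculation.

```python
import math
from collections import Counter


class BigramModel:
    """Add-one smoothed bigram model over word symbols (stand-in, not nltk)."""

    def __init__(self, words):
        self.bigrams = Counter()
        self.contexts = Counter()
        self.vocab = {"</s>"}
        for word in words:
            symbols = ["<s>"] + list(word) + ["</s>"]
            self.vocab.update(symbols[1:])
            for prev, cur in zip(symbols, symbols[1:]):
                self.bigrams[(prev, cur)] += 1
                self.contexts[prev] += 1

    def entropy(self, word):
        """Average negative log2 probability per transition in the word."""
        symbols = ["<s>"] + list(word) + ["</s>"]
        pairs = list(zip(symbols, symbols[1:]))
        size = len(self.vocab) + 1  # one extra slot for unseen symbols
        total = 0.0
        for prev, cur in pairs:
            prob = (self.bigrams[(prev, cur)] + 1) / (self.contexts[prev] + size)
            total -= math.log2(prob)
        return total / len(pairs)


def predict_dual(native_model, loan_model, word):
    """Dual approach: native (True) if the native model's entropy is not higher."""
    return native_model.entropy(word) <= loan_model.entropy(word)


def predict_native_only(native_model, word, threshold):
    """Native-only approach: native (True) if entropy is below a threshold."""
    return native_model.entropy(word) < threshold


# Toy training data, invented for illustration.
native = BigramModel(["mama", "nana", "mana"])
loan = BigramModel(["strik", "strok", "trist"])
```

In the dual case the two models compete on each word; in the native-only case a threshold has to be chosen, for example from the distribution of entropies over the native training words.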

Several test files were added or changed. @LinguList, in evaluate.py I swapped the positions of fn and fp; please check. Coverage of the markov_nltk file was only 70% because I only tested the 'kni' method.
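For checking the fn/fp order, a small reference tally may help. This is a sketch and not the code in evaluate.py: `confusion_counts` is a hypothetical helper, and it assumes that the positive class is "loan", coded as 1.

```python
def confusion_counts(gold, predicted, positive=1):
    """Return (tp, fp, fn, tn), treating `positive` as the loan label (assumption)."""
    tp = fp = fn = tn = 0
    for g, p in zip(gold, predicted):
        if p == positive and g == positive:
            tp += 1  # predicted loan, actually loan
        elif p == positive:
            fp += 1  # predicted loan, actually native
        elif g == positive:
            fn += 1  # predicted native, actually loan
        else:
            tn += 1  # predicted native, actually native
    return tp, fp, fn, tn


# Asymmetric toy example, so that fp and fn cannot be confused with each other.
gold = [1, 1, 0, 0, 1, 1]
predicted = [1, 0, 1, 0, 1, 0]
counts = confusion_counts(gold, predicted)
```

Using an example where fp and fn differ makes a swapped order show up immediately in a test.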