lingpy / pybor

A Python library for borrowing detection based on lexical language models
Apache License 2.0

Tweaks to better conform with our standards, also as a result of unit and regression testing. #21

Closed fractaldragonflies closed 4 years ago

fractaldragonflies commented 4 years ago

This was supposed to be a quick (weekend through Monday) update to the neural code, to bring it into line with some of our naming and coding conventions and to do more testing, with concomitant changes. It became a longer-term effort to resolve the discrepancy between previous notebook performance and current performance. The discrepancy is now explained by the data feeding the models: the data for training was not independent of the data for testing. Some tweaks were also added during this process, giving the code the same options as in the notebook case, but via a much more disciplined settings mechanism.
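To illustrate the kind of independence meant here, the following is a minimal sketch of a split that keeps training and test words disjoint. It assumes the overlap arises from repeated word forms, which is only one possible cause and not necessarily the one found in this PR. The function name `split_train_test` and the sample words are hypothetical and are not part of pybor.

```python
import random


def split_train_test(words, test_fraction=0.2, seed=42):
    """Split a word list into disjoint train and test lists (hypothetical helper)."""
    # Deduplicate first so the same word form cannot land in both sets.
    unique = sorted(set(words))
    rng = random.Random(seed)
    rng.shuffle(unique)
    n_test = int(len(unique) * test_fraction)
    return unique[n_test:], unique[:n_test]


# Illustrative data only; duplicates are intentional.
words = ["mano", "casa", "perro", "mano", "garaje",
         "fútbol", "casa", "sol", "luna", "club"]
train, test = split_train_test(words)

# No word form is shared between the two sets.
assert not set(train) & set(test)
```

Deduplicating before shuffling is the design choice that matters: splitting the raw list first would let a repeated form appear on both sides.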

While time-consuming, it illustrated the gain in code standards we've made by migrating from notebook classes, functions, and global variables to Python scripts in separate modules. Now every aspect of the analysis is controllable from the code.

Notebook examples are not included yet. They are valuable in showing how to do analysis via the various modules, especially in showing graphs of the entropy distributions for native versus borrowed words. Such graphs and the corresponding statistics are important for understanding why monolingual borrowing detection by entropy works, and why it works better for some languages than for others.
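As a rough illustration of per-word entropy, here is a minimal sketch that scores words with an add-one-smoothed character bigram model trained on native words. This is a deliberately simple stand-in, not the neural or other models in pybor. The names `train_bigram` and `word_entropy` and the toy word list are assumptions made for the example.

```python
import math
from collections import Counter


def train_bigram(words):
    """Count character bigrams over words padded with start/end markers."""
    pair_counts = Counter()
    context_counts = Counter()
    alphabet = set()
    for word in words:
        chars = ["<s>"] + list(word) + ["</s>"]
        alphabet.update(chars)
        for a, b in zip(chars, chars[1:]):
            pair_counts[(a, b)] += 1
            context_counts[a] += 1
    return pair_counts, context_counts, len(alphabet)


def word_entropy(word, model):
    """Average negative log2 probability per transition (bits per symbol)."""
    pair_counts, context_counts, vocab_size = model
    chars = ["<s>"] + list(word) + ["</s>"]
    total = 0.0
    for a, b in zip(chars, chars[1:]):
        # Add-one smoothing; the extra +1 reserves mass for unseen symbols.
        p = (pair_counts[(a, b)] + 1) / (context_counts[a] + vocab_size + 1)
        total -= math.log2(p)
    return total / (len(chars) - 1)


# Toy "native" lexicon, for illustration only.
native = ["mano", "casa", "sol", "luna", "mesa", "cosa", "sala"]
model = train_bigram(native)

native_like = word_entropy("masa", model)
foreign_like = word_entropy("whisky", model)
```

In this toy setup the native-like form scores lower than the foreign-like one. Plotting such scores for known native and borrowed words would give the kind of entropy distributions described above.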

LinguList commented 4 years ago

@fractaldragonflies, I'll merge now; if I find time, I can then add the code for the fake borrowings.

fractaldragonflies commented 4 years ago

Thanks Mattis. I wasn't sure about the team's protocol: whether I could or should have done the merge myself (after approval), or whether that is the prerogative of the team leader. Although I guess it makes sense for the author of the pull request to do it in cases where there are merge conflicts to be resolved.