lingpy / pybor

A Python library for borrowing detection based on lexical language models
Apache License 2.0

Neural modules with refactoring and use of Attrs... #20

Closed fractaldragonflies closed 4 years ago

fractaldragonflies commented 4 years ago

Following Mattis's initial refactoring of Markov and introduction of Attrs, and other changes for both Markov and Neural, I've refactored the Neural modules, corrected a few glitches, and updated testing. The most significant refactoring splits Neural into Native and Dual components, and splits Entropies (called neural_tf in a previous incarnation) into Recurrent and Attention components.
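
For concreteness, the Native versus Dual split comes down to two different decision rules over word entropies. A minimal sketch of those rules (the function names here are just illustrative, not the actual pybor API):

```python
# Illustrative decision rules for the Native/Dual split;
# not pybor's actual functions.

def native_decision(word_entropy: float, threshold: float) -> bool:
    """Native: a single model trained on native words only; a word
    whose entropy under that model exceeds a calibrated threshold is
    flagged as a likely borrowing."""
    return word_entropy > threshold


def dual_decision(native_entropy: float, loan_entropy: float) -> bool:
    """Dual: two models, trained on native and loan words respectively;
    a word is flagged as borrowed if the loan-trained model assigns it
    the lower entropy."""
    return loan_entropy < native_entropy
```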

Configuration, following Mattis's initial work, is in data classes, with RecurrentSettings and AttentionSettings the most distant descendants of BaseSettings. A settings object is a parameter to all high-level modules, and the same settings reference (whether Attention or Recurrent) can be passed from the top-level Neural modules down to the lower-level data or neural entropy modules.
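
A minimal sketch of that settings hierarchy, assuming attrs-style data classes (the field names below are illustrative placeholders, not the actual pybor fields):

```python
import attr


@attr.s
class BaseSettings:
    # Illustrative shared field; the real fields differ.
    val_split = attr.ib(default=0.15)


@attr.s
class RecurrentSettings(BaseSettings):
    embedding_len = attr.ib(default=32)
    rnn_cell_type = attr.ib(default="LSTM")


@attr.s
class AttentionSettings(RecurrentSettings):
    # One possible nesting that leaves AttentionSettings most
    # distant from BaseSettings.
    attention_heads = attr.ib(default=1)


# One settings reference travels from a top-level Neural module down
# to the data and neural entropy modules it constructs.
settings = AttentionSettings(embedding_len=16)
```

Passing a single reference this way keeps configuration consistent across levels, since every module reads from the same object.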

Still lacking are reworked examples; I would also like to include a notebook example.

Separately, perhaps, I intend to verify performance comparable to my original draft paper (using the original WOLD representation) and to my subsequent collection of tables using FormChars, Token segments, and SCA encodings.

fractaldragonflies commented 4 years ago

Will make the suggested changes today. Since a simple function will use the namespace of the module, it's not a problem! [A remnant from my Java days.] There are some related issues on which I would value advice/conversation.
Highlights:

  1. The signature for neural and entropies includes testing data. In neural network studies we often use the [fit/val]/test division, where train=[fit/val] is part of development and fitting, but test is more or less sacrosanct and used only after all work is done [in an ideal world, at least]. Passing the test set is useful for building the entire vocabulary of symbols, but otherwise this could be left to a separate function of the class (see the sketch after this list).
  2. The attention method is still 'exploratory'. I haven't shown it to be superior to the recurrent method without attention, only that it yields substantially reduced entropies. [But reduced entropies don't necessarily result in improved borrowing discrimination.]
  3. Before we do our extensive objective study of our 'routine' methods on all of WOLD, I want to repeat the previous trials I did with my small selection of languages, as reported previously. I ought to be able to duplicate prior performance. [To duplicate results for FormChars I would need to do some cleaning to get rid of parentheses, upper case, etc.; maybe just for Tokens, then!]
  4. My other ideas for improvement, i.e., 'exploratory' ones, are probably set aside for now so as to just finish this version.
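
Regarding point 1, a minimal sketch of the [fit/val]/test division and of building the symbol vocabulary over all splits; the helper names are placeholders, not the pybor API:

```python
import random


def train_val_test_split(words, val=0.15, test=0.15, seed=42):
    """Shuffle and divide into fit/val/test; the test portion is set
    aside and touched only once development is finished."""
    rng = random.Random(seed)
    data = list(words)
    rng.shuffle(data)
    n_test = int(len(data) * test)
    n_val = int(len(data) * val)
    return (data[n_test + n_val:],        # fit
            data[n_test:n_test + n_val],  # val
            data[:n_test])                # test


def build_vocab(*splits):
    """Build the entire vocabulary of symbols over all splits, so the
    model never meets an unknown symbol at test time."""
    return {seg for split in splits for word in split for seg in word}


fit, val, test = train_val_test_split([["t", "a"], ["k", "o"], ["s", "u"]])
vocab = build_vocab(fit, val, test)
```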
fractaldragonflies commented 4 years ago

Just added an example, with a few changes in code otherwise. Could I revert the merge and then remerge?

LinguList commented 4 years ago

No, but you can just push it without making a PR. Or you just make another PR and add it without review ;)