fractaldragonflies opened 4 years ago
@tresoldi, I trust you to look into this, but let me know if you want me to have another look.
I resolved the conflicts with cross_validate_models, but when I exited without completing the merge, the resolutions were discarded. I suppose resolving and merging are supposed to be done together. So I'll wait for your review, and if it's OK to merge, I'll resolve the conflicts at that time.
The entropies.py changes and the supporting changes in neural.py (the NeuralData and Generator classes) may be more aggressive than we want to submit with the final paper. They include the fix for the memory leak and the progress-update problem, the really important things to fix, but also work I was doing to 'improve' the model by moving to sparse crossentropy, with some related model and fitting changes, while leaving config.py largely unchanged. An alternative to merging this branch would be for me to make a different branch, 'finish-PLOS-paper' or similar, without the sparse-crossentropy and related changes.
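For clarity, the sparse-crossentropy part amounts to the following pattern (a minimal sketch with made-up layer sizes and vocabulary, not the actual NeuralData/Generator code): keep the segment targets as integer indices and compile with sparse categorical crossentropy, instead of one-hot encoding the targets for ordinary categorical crossentropy.

```python
import numpy as np
from tensorflow import keras

# Illustrative values only; the real vocabulary size and sequence length come from the data.
vocab_size = 100   # number of distinct segments
maxlen = 20        # padded length of a segmented word

model = keras.Sequential([
    keras.layers.Embedding(vocab_size, 32),
    keras.layers.LSTM(64, return_sequences=True),
    keras.layers.Dense(vocab_size, activation="softmax"),
])

# With one-hot targets this would be loss="categorical_crossentropy";
# keeping the targets as integer segment indices and using the sparse
# variant avoids materialising the one-hot tensors, without changing results.
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

x = np.random.randint(0, vocab_size, size=(8, maxlen))
y = np.random.randint(0, vocab_size, size=(8, maxlen))  # integer-coded targets, not one-hot
model.fit(x, y, epochs=1, verbose=0)
```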
Not sure of the protocol here. Tiago, we can discuss and decide whether to abandon this merge request.
@fractaldragonflies I agree, the changes might be too aggressive, even though they are good and desirable.
There are many alternatives here, in terms of branches and releases. The two best are, in my opinion:
I would prefer the first alternative, but the second is easier, quicker, and less likely to raise problems. Thus, my vote is to keep these changes "reserved" for after the submission.
BTW, my suggestion implies making sure the john-post-plos branch is in sync with master. I have just fixed the three simple conflicts that were there.
I would explicitly ask not to add any new code with new features before we have resubmitted the publication. Let's just stick to this for now, so that we can release 1.0 ASAP and reference 1.0 in the resubmission.
However, please note that the reviewer asked us to officially publish the German list which I created, so this list should also be lexibankified, and it should be mentioned as the source for the repo.
So I suggest we do a thorough code review next week to see how all of this looks, and maybe wait with the release until I find time to either lexibankify my list or submit it to IDS.
New code could then stay in this other branch.
Do you have the original list at hand? A proper lexibank dataset would need concept mapping and values, which are missing from the raw Python table: https://github.com/lingpy/pybor/blob/master/src/pybor/dev/data.py
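To make the missing pieces concrete, each entry would roughly need the extra information sketched below (a hypothetical illustration; the field names and the example row are invented, not taken from data.py):

```python
# Hypothetical illustration: the raw table has forms and borrowing status,
# but a lexibank/CLDF dataset additionally needs a concept mapping
# (gloss -> Concepticon concept) and an explicit Value column.
raw_row = {"gloss": "hand", "form": "hant", "borrowed": False}   # invented example entry

concept_map = {"hand": "CONCEPTICON_ID_FOR_HAND"}  # placeholder ID, to be looked up in Concepticon

cldf_row = {
    "Language_ID": "German",
    "Parameter_ID": concept_map[raw_row["gloss"]],  # the concept mapping
    "Value": raw_row["form"],                        # the value as given in the source
    "Form": raw_row["form"],
    "Borrowed": raw_row["borrowed"],
}
print(cldf_row)
```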
Yes, they are missing. I have the list; I mean, I typed it. I already indicated that I'll make a lexibank dataset out of it, as part of our response letter (where I assigned this task to myself). For the library, we can just say that inside pybor we use the list from this lexibank dataset (or IDS contribution).
Changes I made after the initial submission:
- Resolved the memory leak by clearing the TensorFlow workspace after each neural run. This needs to be called from each example that runs multiple languages, i.e. cross-validation and simulated borrowings.
- Kept some changes to file output that I initially thought were related to the memory problem, even after the workspace-clearing fix was adopted.
- Added progress updates for the cross-validation and fake-borrowing examples, since the neural network takes substantial time when run over the entire WOLD corpus.
- Dropped the 'oversampling' function, as it was not useful and never documented.
- Changed to a sparse representation of segments and the corresponding crossentropy, with no impact on results.

No new functionality, other than the progress updates.
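For reference, the memory-leak fix and the progress updates boil down to a loop of the following shape (an illustrative sketch; the function names are placeholders, not pybor's actual API):

```python
import tensorflow as tf

def run_many_languages(languages, build_and_fit):
    """Run one neural model per language, as in cross-validation or the
    simulated-borrowing examples, clearing the TensorFlow workspace after
    each run so graphs and optimizer state do not accumulate in memory."""
    scores = {}
    for i, language in enumerate(languages, start=1):
        scores[language] = build_and_fit(language)  # placeholder for the real fit/evaluate step
        tf.keras.backend.clear_session()            # the memory-leak fix: clear the workspace
        print(f"[{i}/{len(languages)}] finished {language}")  # progress update for long WOLD runs
    return scores
```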