chirila closed this issue 6 years ago
Note to Will - this is for testing two things:
Running these now on grace.hpc.yale.edu.
I've also written a script, extract_xml.py, that should extract plain text from Perseus and Bible Corpus-formatted XML files.
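For anyone following along, here is a rough sketch of what this kind of extraction could look like. It is not the actual extract_xml.py; the function name `extract_text`, the default `seg` tag, and the inline sample (including the `hi` child element) are assumptions for illustration. Perseus TEI files may use different tags or XML namespaces, in which case the tag name would need adjusting.

```python
import xml.etree.ElementTree as ET


def extract_text(xml_string, tag="seg"):
    """Return the plain text of every <tag> element, one string per element.

    Hypothetical helper: the real extract_xml.py may work differently.
    """
    root = ET.fromstring(xml_string)
    lines = []
    for elem in root.iter(tag):
        # itertext() also picks up text nested inside child elements.
        raw = "".join(elem.itertext())
        # Collapse runs of whitespace and trim the ends.
        text = " ".join(raw.split())
        if text:
            lines.append(text)
    return lines


# Made-up sample in the general shape of a verse-per-element XML file.
sample = """<cesDoc><text><body>
<div id="b.GEN" type="book"><div id="b.GEN.1" type="chapter">
<seg id="b.GEN.1.1" type="verse"> In principio creavit Deus caelum et terram. </seg>
<seg id="b.GEN.1.2" type="verse">Terra autem erat <hi>inanis</hi> et vacua.</seg>
</div></div></body></text></cesDoc>"""

verses = extract_text(sample)
print("\n".join(verses))
```

Writing one verse per line keeps the output directly usable as training text for the embeddings.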
An observation:
bible.LA.txt has ~9000 words, whereas bible.GK.txt has ~3000. We'll see how this affects the alignment.
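A quick way to sanity-check counts like these is to compare running tokens with distinct word forms. This is only a sketch: `vocab_stats` is a hypothetical helper, and whitespace tokenization with lowercasing is an assumption that may not match how the numbers above were obtained.

```python
from collections import Counter


def vocab_stats(text):
    """Return (number of tokens, number of distinct word forms)."""
    # Assumption: whitespace tokenization, case-folded.
    tokens = text.lower().split()
    counts = Counter(tokens)
    return len(tokens), len(counts)


# Tiny illustrative string; for the real files one would pass
# open("bible.LA.txt", encoding="utf-8").read() instead.
n_tokens, n_types = vocab_stats("In principio erat Verbum et Verbum erat apud Deum")
print(n_tokens, n_types)
```

If the two languages differ a lot in distinct word forms, a fixed cutoff on the most frequent words covers a different share of each vocabulary.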
Vanilla run options for MUSE alignment:
python unsupervised.py --src_lang la --tgt_lang gk --src_emb ../models/bibleLatin.vec --tgt_emb ../models/bibleGreek.vec --n_refinement 5 --emb_dim 100 --dis_most_frequent 2804 --dico_max_rank 0 --epoch_size 100000
Things to try:
Set dis_most_frequent (the number of words we try to align in each language) to 1000. 2804 is the total number of words in the Greek Bible.
Running again, but this time aligning Greek into Latin and limiting to the 100 most common vocabulary items in each language:
python unsupervised.py --src_lang gk --tgt_lang la --src_emb ../models/bibleGreek.vec --tgt_emb ../models/bibleLatin.vec --emb_dim 100 --dis_most_frequent 100 --dico_max_rank 0 --epoch_size 100000
Training is underway. So far, the loss numbers look more monotonic.
How to call kalign for this run:
python kalign.py --bible --text gr-la-100 --model bibleLatin --src_lang gk
Update: did not work with dis_most_frequent=100. Trying again with 1000. Will also see what happens when I don't set epoch_size.
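The variations above could be generated rather than typed by hand. A minimal sketch, assuming the same flags as the commands earlier in this thread; `muse_command` is a hypothetical helper, and it only builds the command lines without running anything.

```python
def muse_command(src, tgt, src_emb, tgt_emb, dis_most_frequent, epoch_size=None):
    """Build the argument list for one unsupervised.py run.

    Only flags that already appear in this thread are used.
    """
    args = [
        "python", "unsupervised.py",
        "--src_lang", src, "--tgt_lang", tgt,
        "--src_emb", src_emb, "--tgt_emb", tgt_emb,
        "--emb_dim", "100",
        "--dis_most_frequent", str(dis_most_frequent),
        "--dico_max_rank", "0",
    ]
    # Leave --epoch_size out entirely to fall back on the script's default.
    if epoch_size is not None:
        args += ["--epoch_size", str(epoch_size)]
    return args


commands = [
    muse_command("gk", "la", "../models/bibleGreek.vec", "../models/bibleLatin.vec", n, e)
    for n in (100, 1000)
    for e in (100000, None)
]
for cmd in commands:
    print(" ".join(cmd))
```

Each list can be passed to `subprocess.run` as-is if the runs should be launched from Python.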
Result: After trying many different settings of hyperparameters, I am unable to get any meaningful alignment between the Latin and Greek bibles. All results in https://github.com/viking-sudo-rm/voynich2vec/tree/master/alignments/bibles
Limiting to the most frequent words in each language may be causing problems, since Greek has articles and Latin doesn't, and Latin has more case morphology than Greek (so Greek has more common prepositions).
On Fri, Jun 1, 2018 at 12:37 AM, Will Merrill notifications@github.com wrote:
Result: After trying many different settings of hyperparameters, I am unable to get any meaningful alignment between the Latin and Greek bibles. All results in https://github.com/viking-sudo-rm/voynich2vec/tree/master/alignments/bibles
--
Claire Bowern Professor, Director of Graduate Studies Chair: Yale Women Faculty Forum (wff.yale.edu) Department of Linguistics New Haven, CT 06511
Can you do a bunch of "self" alignments for these languages? If we end up with consistent patterns in what types of morphology align, that would be a way to make some guesses about Voynich morphology.
On Fri, Jun 1, 2018 at 9:38 AM, Claire Bowern claire.bowern@yale.edu wrote:
Limiting to the most frequent words in each language may be causing problems, since Greek has articles and Latin doesn't, and Latin has more case morphology than Greek (so Greek has more common prepositions).
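One possible reading of the "self" alignment suggestion is to split a single corpus into two disjoint halves, train separate embeddings on each, and then align the two with the same unsupervised.py setup. That reading is an assumption, not something settled in this thread, and `split_corpus` below is a hypothetical helper.

```python
def split_corpus(lines):
    """Split a list of lines into two interleaved halves.

    Interleaving (rather than cutting in the middle) keeps both halves
    spread over the whole text, so neither half is dominated by one book.
    """
    half_a = lines[0::2]  # lines 0, 2, 4, ...
    half_b = lines[1::2]  # lines 1, 3, 5, ...
    return half_a, half_b


# Toy input; for a real run, read the verse-per-line text file instead.
half_a, half_b = split_corpus(["v1", "v2", "v3", "v4", "v5"])
print(half_a, half_b)
```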
For organizational reasons, closing this issue and opening another one with your suggestion.
http://legacydirs.umiacs.umd.edu/~resnik/parallel/bible.html - includes links to languages
Paper on another project; includes links: http://www.lrec-conf.org/proceedings/lrec2014/pdf/220_Paper.pdf
And another one: https://link.springer.com/article/10.1007%2Fs10579-014-9287-y
And actual data: https://github.com/christos-c/bible-corpus/tree/master/bibles