silnrsi / libreoffice-linguistic-tools

LibreOffice Linguistic Tools for technical write-ups of lesser-known languages.

Writing glosses? #15

Closed mcepl closed 5 years ago

mcepl commented 5 years ago

Hi,

I am thinking about moving my wife (PhD in linguistics, oriented more towards syntax/semantics than morphology/phon{etics,onology}) from LyX/LaTeX to LibreOffice, and the obvious problem is the missing support for glosses and syntax trees. Could your package work as a replacement for glosses as produced by the covington package (http://ftp.cvut.cz/tex-archive/macros/latex/contrib/covington/covington.pdf, pages 7 and following)?

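[image: gloss] https://user-images.githubusercontent.com/198999/54674776-626cf580-4afd-11e9-816f-071862a0effc.png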

If yes, could you please add an example of how to do it to the manual? Thank you.

(And yes, this could possibly be a duplicate of #8, but I am not sure I understood that issue well enough.)

jkornelsen commented 5 years ago

Hello Matej,

I am not familiar with covington. However, the example you showed is identical to the output produced by the LingTools add-on. Enter data in SIL FieldWorks (or similar software such as Toolbox) and then use the LingTools add-on to insert grammar (i.e. interlinear) examples into LibreOffice, according to the instructions. Let me know if you have any other questions.

-Jim K

mcepl commented 5 years ago

> Enter data in SIL FieldWorks (or similar software such as Toolbox)

That's the point. Is there a way to do it without a third-party program? With covington, I can write just

\digloss{Dit is een Nederlands voorbeeld}
        {This is a Dutch example}
        {This is an example in Dutch.}

in my document, and it is converted to the gloss shown above. Is the input format described somewhere, so I can at least write it myself in vi, or something?

jkornelsen commented 5 years ago

No, you must interlinearize it using a tool such as FieldWorks. How does covington know how to parse each morpheme, or is it only word-by-word glossing?

Theoretically, the data could be entered in XML using a text editor, but it would be MUCH more complex than your \digloss example. You would need to specify in the XML how each morpheme should be divided and analyzed. If you want to try it, use FieldWorks to create an example XML file and then look at the file to see what is required.

Beyond this, I'm not sure I can be of more help as I do not really understand what covington does. Perhaps you could try using FieldWorks and LingTools to better be able to find the answer to your question.

-Jims.

jkornelsen commented 5 years ago

Attached is an example XML file from FieldWorks containing three typical sentences in a language related to Tamil, and another couple of simple examples using the Toolbox format.

Be sure to view them in Unicode (I use the following command in vim: set enc=utf8).

-Jim K

mcepl commented 5 years ago

> No, you must interlinearize it using a tool such as FieldWorks. How does covington know how to parse each morpheme, or is it only word-by-word glossing?

Yes, basically it is only word-by-word. See page 8 of the linked PDF for more complex examples (curly braces are the usual TeX way to mark things as belonging together).

> Theoretically, the data could be entered in XML using a text editor, but it would be MUCH more complex than your \digloss example. You would need to specify in the XML how each morpheme should be divided and analyzed. If you want to try it, use FieldWorks to create an example XML file and then look at the file to see what is required.

OK, I can see at https://vimeo.com/channels/fieldworks that it is way more complex than my simple needs. Let's see how it works (I will probably have to install a virtual machine to get Ubuntu; they don't seem to have openSUSE-compatible packages, oh well).

> Perhaps you could try using FieldWorks and LingTools to better be able to find the answer to your question.

I certainly will.

Unfortunately, your attachment didn't make it through email. Apparently you have to attach it here in the web form. Thank you.

Best,

Matěj

jkornelsen commented 5 years ago

The files I referred to are in this repo: https://github.com/silnrsi/libreoffice-linguistic-tools/tree/master/LinguisticTools/tests/datafiles.

Flex (typical): FWtextPigFox.xml
Toolbox (simple): TbxIntJPDN60.xml

jkornelsen commented 5 years ago

One more idea: If you know some programming, you could write a script in Python, Perl, Tk or similar to convert files formatted for covington into either the XML format exported by FieldWorks (aka FLEXTEXT) or that of Toolbox. That would remove the requirement to install additional software. I would not recommend this for complex data, but it sounds like your data may be simple enough to manage in this way. I may even be able to help you write such a script, as it would perhaps only take me a couple of hours. Let me know if that sounds worth the effort.
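
For illustration, the input half of such a script might look like the sketch below. It is not taken from LingTools or covington: the function names (`read_group`, `split_words`, `parse_digloss`) are hypothetical, it assumes the three-argument `\digloss{...}{...}{...}` form quoted earlier in this thread with no optional arguments, and it assumes an inner `{...}` group should be kept as a single word.

```python
import re


def read_group(text, pos):
    """Return (content, next_pos) for the brace group that starts at text[pos]."""
    if pos >= len(text) or text[pos] != "{":
        raise ValueError(f"expected '{{' at offset {pos}")
    depth = 0
    for i in range(pos, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                return text[pos + 1:i], i + 1
    raise ValueError("unbalanced braces")


def split_words(line):
    """Split on whitespace, keeping a top-level {...} group as one token."""
    tokens, current, depth = [], [], 0
    for ch in line:
        if ch == "{":
            depth += 1
            if depth == 1:
                continue  # drop the outer brace of a group
        elif ch == "}":
            depth -= 1
            if depth == 0:
                continue  # drop the matching closing brace
        if ch.isspace() and depth == 0:
            if current:
                tokens.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        tokens.append("".join(current))
    return tokens


def parse_digloss(source):
    """Collect every \\digloss example as aligned (word, gloss) pairs."""
    examples = []
    for match in re.finditer(r"\\digloss(?![A-Za-z])", source):
        pos = match.end()
        groups = []
        for _ in range(3):  # text line, gloss line, free translation
            while pos < len(source) and source[pos].isspace():
                pos += 1
            content, pos = read_group(source, pos)
            groups.append(content)
        words, glosses = split_words(groups[0]), split_words(groups[1])
        if len(words) != len(glosses):
            raise ValueError(f"{len(words)} words but {len(glosses)} glosses")
        examples.append({"words": list(zip(words, glosses)),
                         "free": groups[2].strip()})
    return examples


sample = r"""
\digloss{Dit is een Nederlands voorbeeld}
        {This is a Dutch example}
        {This is an example in Dutch.}
"""
for example in parse_digloss(sample):
    print(example)
```

Raising an error on a word/gloss count mismatch is a deliberate choice here: silently misaligned glosses would be harder to spot later in the document than a failed conversion.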

mcepl commented 5 years ago

> If you know some programming,

I am a lead maintainer of Python-related packages for SUSE, so yes, I know some programming in Python. ;)

The XMLs look really crazily complicated. I will take a look at it eventually.

Thank you

jkornelsen commented 5 years ago

That makes sense now - you are the programmer and she is the linguist. I myself am a hybrid linguist / programmer. So you're probably better at programming than I am, and she's probably better at linguistics. But hey, I try my best. :)

Anyway, I put together a working prototype using the example you gave. It's posted at https://github.com/silnrsi/libreoffice-linguistic-tools/blob/dev_extra/dev/covington/cov.py.

The resulting XML file can be imported using the LingTools add-on in LibreOffice as follows:

  1. Linguistics -> Interlinear Settings. Add the file and deselect Morpheme Line 1 (because we're only interested in syntax, not morphology).
  2. Linguistics -> Get Interlinear Examples.

Hopefully, it shouldn't require too much more effort to modify the code so that it does what you need.
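
As a rough picture of what the output side of such a conversion involves, here is a minimal sketch that writes word-level glosses to XML with Python's standard `xml.etree.ElementTree`. It is not the code in cov.py. The element and attribute names (`document`, `interlinear-text`, `phrase`, `word`, `item` with `type="txt"` / `type="gls"`) are an assumption loosely modeled on FieldWorks interlinear exports and should be checked against FWtextPigFox.xml in the test data before relying on them. The function name `build_flextext` and the language codes are hypothetical. No morpheme layer is written, matching the word-by-word case discussed in this thread.

```python
import xml.etree.ElementTree as ET


def build_flextext(examples, vernacular_lang="nl", gloss_lang="en"):
    """Serialize examples to a FLExText-like XML string (assumed layout)."""
    document = ET.Element("document")
    text = ET.SubElement(document, "interlinear-text")
    paragraphs = ET.SubElement(text, "paragraphs")
    for number, example in enumerate(examples, start=1):
        paragraph = ET.SubElement(paragraphs, "paragraph")
        phrases = ET.SubElement(paragraph, "phrases")
        phrase = ET.SubElement(phrases, "phrase")
        # Example number, used to refer to the sentence.
        segnum = ET.SubElement(phrase, "item", type="segnum", lang=gloss_lang)
        segnum.text = str(number)
        words = ET.SubElement(phrase, "words")
        for form, gloss in example["words"]:
            word = ET.SubElement(words, "word")
            txt = ET.SubElement(word, "item", type="txt", lang=vernacular_lang)
            txt.text = form
            gls = ET.SubElement(word, "item", type="gls", lang=gloss_lang)
            gls.text = gloss
        # Phrase-level gloss holds the free translation.
        free = ET.SubElement(phrase, "item", type="gls", lang=gloss_lang)
        free.text = example["free"]
    return ET.tostring(document, encoding="unicode")


dutch = {
    "words": [("Dit", "This"), ("is", "is"), ("een", "a"),
              ("Nederlands", "Dutch"), ("voorbeeld", "example")],
    "free": "This is an example in Dutch.",
}
print(build_flextext([dutch]))
```

Building the tree with `ElementTree` rather than string concatenation means characters such as `&` or `<` in a gloss are escaped automatically.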