sevilaybayatli / apertium-ambiguous

GNU General Public License v3.0

Wiki documentation of apertium-ambiguous contains unclear sections #15

Closed dolphingarlic closed 5 years ago

dolphingarlic commented 5 years ago

There are a number of unclear sections in the documentation of apertium-ambiguous on the wiki. Here is a list of the ones that I have found so far:

sevilaybayatli commented 5 years ago

“This piece of code uses the segmenter to segment a corpus file and output the segmented sentences into a file. In kazSentenceTokenizer.rb, change the 2-letters code of the source language to the language desired. Here "kk" is code for Kazakh.”

Where do we put this file? You install it in your home directory.

Does this format work for other languages? This segmenter is set up for Kazakh; you can check the segmenter's site to see which other languages it works for.

When do we ever use this file? This program segments the source-language sentences. Either use this program or use your own code to segment the sentences.
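For anyone taking the "use your own code" route, here is a minimal Python sketch of a sentence segmenter that writes one sentence per line. It is an assumption-laden stand-in, not the segmenter that kazSentenceTokenizer.rb uses: it only splits after `.`, `!` or `?` followed by whitespace, and the function names `segment` and `segment_file` are hypothetical.

```python
import re

# Naive rule: a sentence ends at '.', '!' or '?' followed by whitespace.
# This will split wrongly on abbreviations; a real segmenter handles those.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def segment(text):
    """Split text into a list of sentences using the naive rule above."""
    return [part.strip() for part in _BOUNDARY.split(text) if part.strip()]


def segment_file(corpus_path, output_path):
    """Read a corpus file and write its sentences, one per line."""
    with open(corpus_path, encoding="utf-8") as src:
        sentences = segment(src.read())
    with open(output_path, "w", encoding="utf-8") as dst:
        for sentence in sentences:
            dst.write(sentence + "\n")
```

Whatever tool you use, the point is the same: the output file should contain the source-language sentences already segmented.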

“For training, you should run these steps:”

Where did text.arpa come from? Where do we run these commands? To understand all of this, you need to read https://kheafield.com/code/kenlm/

What is the “subdirectory script”? It is the scripts subdirectory inside the apertium-ambiguous repository. You have to put the binary file you obtained into that scripts directory.
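As a rough orientation, in the KenLM documentation linked above `lmplz` reads a text corpus on standard input and writes an ARPA-format model on standard output, and `build_binary` converts that ARPA file into a binary model. The sketch below wraps those two steps in Python. It assumes both KenLM tools are on your `PATH`; the corpus name, the output names and the order 5 are placeholder assumptions, not values confirmed by the wiki.

```python
import subprocess


def kenlm_commands(arpa_path="text.arpa", binary_path="text.binary", order=5):
    """Return the two KenLM command lines as argument lists."""
    lmplz = ["lmplz", "-o", str(order)]  # estimates an n-gram model
    build_binary = ["build_binary", arpa_path, binary_path]  # ARPA -> binary
    return lmplz, build_binary


def train(corpus_path, arpa_path="text.arpa", binary_path="text.binary", order=5):
    """Run both steps; needs the KenLM binaries installed (assumption)."""
    lmplz, build_binary = kenlm_commands(arpa_path, binary_path, order)
    # lmplz reads the corpus from stdin and writes the ARPA model to stdout.
    with open(corpus_path, "rb") as src, open(arpa_path, "wb") as dst:
        subprocess.run(lmplz, stdin=src, stdout=dst, check=True)
    subprocess.run(build_binary, check=True)
```

So a file like text.arpa is an output of the training step, not something you need to have beforehand.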

“Python scripts (exampleken1, kenlm.pyx, genalltra.py) used to score sentences can be found living here https://github.com/sevilaybayatli/apertium-ambiguous/tree/master/scripts. These scripts automatically do their functions.” Do we need to download these scripts? These scripts are already inside the apertium-ambiguous/scripts directory, so you don't need to download them.

How do they do their functions? That means you don't need to download or write them yourself. Their function is to score the target sentences and normalize them.

Where do they go? Their paths are set in the file CLExec.cpp.

“The next step is downloading and compiling yasmet by doing the following:” Which directory do we download it into? You have to download it into your home directory. Do we just copy the code into a new file, or do we need additional files? Just copy the file and then compile it.
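To make "scoring and normalizing" a bit more concrete, here is a small sketch of one possible normalization step. It assumes each candidate target sentence already has a log10 language-model score (the kind of value KenLM's Python module returns from `Model.score`) and that normalizing means rescaling the scores into weights that sum to 1. That reading is an assumption; the repository scripts may do it differently, and `normalize_scores` is a hypothetical name.

```python
def normalize_scores(log10_scores):
    """Turn log10 sentence scores into weights that sum to 1."""
    if not log10_scores:
        return []
    # Subtract the best score first so the powers of ten do not underflow.
    top = max(log10_scores)
    weights = [10.0 ** (score - top) for score in log10_scores]
    total = sum(weights)
    return [weight / total for weight in weights]


# With the kenlm module installed, the scores could come from something like:
#   model = kenlm.Model("text.binary")
#   scores = [model.score(sentence) for sentence in candidates]
```

A higher weight then means the language model prefers that translation among the ambiguous candidates.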

“Change the language pair file name to the pair desired in the paths of apertium tools (biltrans, lextor, interchunk, postchunk, transfer) in the file CLExec.cpp”

Where in the code do we change this? In the file CLExec.cpp, you need to change these paths to the paths of your own language pair.

What is the language pair file? This is the file name of your language pair, such as apertium-kaz-tur or apertium-eng-kaz.