Wiki documentation of apertium-ambiguous contains unclear sections

There are a number of unclear sections in the documentation of apertium-ambiguous on the wiki. Here is a list of the ones that I have found so far:

“This piece of code uses the segmenter to segment a corpus file and output the segmented sentences into a file. In kazSentenceTokenizer.rb, change the 2-letters code of the source language to the language desired. Here "kk" is code for Kazakh.”
- Where do we put this file?
- Does this format work for other languages?
- When do we ever use this file?
“For training, you should run these steps:”
- Where did text.arpa come from?
- Where do we run these commands?
- What is the “subdirectory script”?
“Python scripts (exampleken1, kenlm.pyx, genalltra.py) used to score sentences can be found living here https://github.com/sevilaybayatli/apertium-ambiguous/tree/master/scripts. These scripts automatically do their functions.”
- Do we need to download these scripts?
- How do they do their functions?
- What are their functions?
- Where do they go?
“The next step is downloading and compiling yasmet by doing the following:”
- Which directory do we download it into?
- Do we just copy the code into a new file, or do we need additional files?
“Change the language pair file name to the pair desired in the paths of apertium tools (biltrans, lextor, interchunk, postchunk, transfer) in the file CLExec.cpp”
- Where in the code do we change this?
- Where do we find this file?
- What is the language pair file?

“This piece of code uses the segmenter to segment a corpus file and output the segmented sentences into a file. In kazSentenceTokenizer.rb, change the 2-letters code of the source language to the language desired. Here "kk" is code for Kazakh.”

Where do we put this file? Install it in your home directory.

Does this format work for other languages? This segmenter is set up for Kazakh; check the segmenter's site to see which other languages it supports.

When do we ever use this file? This program segments the source-language sentences. Either use it or write your own code to segment the sentences.
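For anyone taking the "write your own code" route, here is a minimal sketch of a sentence splitter in Python. It is not what kazSentenceTokenizer.rb does: it is a naive stand-in that breaks after '.', '!' or '?' when whitespace follows, and the function names `segment` and `segment_file` are made up for illustration.

```python
import re

# Naive rule: a sentence ends at '.', '!' or '?' followed by whitespace.
# This is an assumption for illustration; a real segmenter also handles
# abbreviations, numbers, quotes and language-specific punctuation.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def segment(text):
    """Return the sentences found in `text`, one per list element."""
    return [s.strip() for s in _SENTENCE_END.split(text) if s.strip()]


def segment_file(src_path, dst_path):
    """Read a corpus file and write one sentence per line to dst_path."""
    with open(src_path, encoding="utf-8") as src:
        sentences = segment(src.read())
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write("\n".join(sentences) + "\n")
    return len(sentences)
```

The output file has one sentence per line, which is the shape the later steps expect according to the quoted wiki text ("output the segmented sentences into a file").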

“For training, you should run these steps:”

Where did text.arpa come from? Where do we run these commands? To understand these steps, read the KenLM documentation at https://kheafield.com/code/kenlm/

What is the “subdirectory script”? It is the scripts subdirectory inside the apertium-ambiguous repository. Put the binary file you obtained into that scripts directory.
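As a rough illustration of what the KenLM page describes: an ARPA file such as text.arpa is what KenLM's `lmplz` writes to standard output when it is fed a plain-text corpus, and `build_binary` then converts it to a binary model. The sketch below only assembles and runs those two commands from Python. The n-gram order of 5, the output name `text.binary`, and the helper names are assumptions, not values taken from the wiki.

```python
import shutil
import subprocess
from contextlib import ExitStack


def kenlm_commands(corpus, order=5, arpa="text.arpa", binary="text.binary"):
    """Return the two KenLM steps as (argv, stdin_path, stdout_path) tuples."""
    return [
        # lmplz reads the corpus on stdin and writes the ARPA model on stdout.
        (["lmplz", "-o", str(order)], corpus, arpa),
        # build_binary takes the ARPA file and the output file as arguments.
        (["build_binary", arpa, binary], None, None),
    ]


def run(commands):
    """Run each step in order, wiring stdin/stdout to files where requested."""
    for argv, stdin_path, stdout_path in commands:
        if shutil.which(argv[0]) is None:
            raise FileNotFoundError(f"{argv[0]} not found on PATH; build KenLM first")
        with ExitStack() as stack:
            stdin = stack.enter_context(open(stdin_path, "rb")) if stdin_path else None
            stdout = stack.enter_context(open(stdout_path, "wb")) if stdout_path else None
            subprocess.run(argv, stdin=stdin, stdout=stdout, check=True)
```

Calling `run(kenlm_commands("corpus.txt"))` from the directory that holds the corpus would leave text.arpa and the binary model next to it, assuming the KenLM binaries are on PATH.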

“Python scripts (exampleken1, kenlm.pyx, genalltra.py) used to score sentences can be found living here https://github.com/sevilaybayatli/apertium-ambiguous/tree/master/scripts. These scripts automatically do their functions.”

Do we need to download these scripts? No. They are already in the apertium-ambiguous/scripts directory, so you do not need to download them.

How do they do their functions? “Automatically” means that you do not need to download or write them yourself. Their function is to score the target sentences and normalize them.
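To make "scoring and normalizing" concrete, here is a small sketch of one plausible reading: each candidate target sentence gets a log10 language-model score, and the scores are turned into weights that sum to 1. This is not taken from the repository's scripts; the normalization scheme and the names `normalize_scores` and `rank` are assumptions. With the kenlm Python module, the `score` argument could be the `score` method of a loaded model, which returns a log10 probability.

```python
def normalize_scores(log10_scores):
    """Turn log10 scores into weights that sum to 1."""
    if not log10_scores:
        return []
    top = max(log10_scores)
    # Subtract the maximum before exponentiating so very negative
    # scores do not all underflow to zero.
    weights = [10 ** (s - top) for s in log10_scores]
    total = sum(weights)
    return [w / total for w in weights]


def rank(candidates, score):
    """Pair each candidate with its normalized weight, best first.

    `score` is any callable mapping a sentence to a log10 score.
    """
    weights = normalize_scores([score(c) for c in candidates])
    return sorted(zip(candidates, weights), key=lambda pair: pair[1], reverse=True)
```

Passing the scorer in as a callable keeps the sketch runnable without a trained model; any function that returns a log10 score can be dropped in.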

Where do they go? Their paths are set in the file CLExec.cpp.

“The next step is downloading and compiling yasmet by doing the following:”

Which directory do we download it into? Download it into your home directory.

Do we just copy the code into a new file, or do we need additional files? Just copy the file and then compile it.

“Change the language pair file name to the pair desired in the paths of apertium tools (biltrans, lextor, interchunk, postchunk, transfer) in the file CLExec.cpp”

Where in the code do we change this? In the file CLExec.cpp, change these paths to the paths of your own language pair.

What is the language pair file? It is your language pair, for example apertium-kaz-tur or apertium-eng-kaz.
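If the pair name appears in many paths, the edit can be scripted instead of done by hand. The sketch below is only an illustration: it swaps one pair code for another everywhere in a piece of source text. The example path in the usage note is hypothetical, since the real paths in CLExec.cpp are not shown in this thread, and `retarget_pair` and `retarget_file` are made-up names.

```python
def retarget_pair(source, old_pair, new_pair):
    """Replace every occurrence of old_pair; return (new_source, count)."""
    return source.replace(old_pair, new_pair), source.count(old_pair)


def retarget_file(path, old_pair, new_pair):
    """Rewrite `path` in place and return how many occurrences changed."""
    with open(path, encoding="utf-8") as handle:
        updated, count = retarget_pair(handle.read(), old_pair, new_pair)
    if count:
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(updated)
    return count
```

For example, `retarget_pair('"apertium-kaz-tur/apertium-kaz-tur.kaz-tur.t1x"', "kaz-tur", "eng-kaz")` rewrites all three occurrences in that hypothetical path. Review the result before recompiling, because file names inside a pair do not always follow the same pattern.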
