This is my attempt to create a machine translator from and to multiple natural languages. The method is based on formal (Chomsky) grammars and equivalency rules. The input sentence is parsed into a tree, then an equivalent tree is constructed in the target language and traversed to yield the translated text. Currently, I am working on English and my native Serbian. Future support may include Russian, Spanish and German.
When generating random sentences in Serbian, some kinds of words seem to come up far too often, for example diminutives and augmentatives. Create a way to decrease the frequency of these words in random sentences.
Suggested implementation:
Create a map attr->int. For example, deminutiv->2 means that a diminutive word is discarded the first two times it is randomly chosen, and another word is randomly generated instead. The higher the number, the less likely such a word is to be chosen. The map for Serbian could be:
deminutiv->2
augmentativ->3
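The suggestion above could be sketched roughly as follows in Python. This is only an illustration under stated assumptions, not the project's actual code: the lexicon layout (a list of word/attribute-set pairs), the function name pick_word, and the sample words are all hypothetical, and the rejection counters are assumed to reset for every word that is picked.

```python
import random

# Map from the suggestion: attribute -> number of times a word carrying
# that attribute is discarded before it may be accepted.
PENALTIES = {"deminutiv": 2, "augmentativ": 3}


def pick_word(lexicon, penalties=PENALTIES, rng=random):
    """Pick one word at random, discarding penalized words a few times first.

    lexicon is a hypothetical list of (word, attributes) pairs, where
    attributes is a set of attribute names such as {"deminutiv"}.
    """
    # Assumption: counters are per pick, so every call starts from zero.
    rejected = {attr: 0 for attr in penalties}
    while True:
        word, attrs = rng.choice(lexicon)
        # Attributes of this word that still have rejections left to use up.
        blocking = [a for a in attrs
                    if a in penalties and rejected[a] < penalties[a]]
        if not blocking:
            return word
        for a in blocking:
            rejected[a] += 1  # discard this draw and try again


# Illustrative sample words: "house", "little house", "big house".
SAMPLE_LEXICON = [
    ("kuća", set()),
    ("kućica", {"deminutiv"}),
    ("kućetina", {"augmentativ"}),
]

if __name__ == "__main__":
    print([pick_word(SAMPLE_LEXICON) for _ in range(10)])
```

The loop always terminates, because every discarded draw uses up one of a finite number of rejections. Under this reading a penalized word is still possible, only rarer: with two equally likely candidates, a word with deminutiv->2 has to be drawn three times in a row to be accepted.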