lukyjanek / phonetic-transcription

Rule-based approach to the phonetic transcription of the Czech, Slovak and Polish languages into the International Phonetic Alphabet (IPA).
GNU General Public License v3.0

Automatic phonetic transcription of the Czech, Slovak and Polish languages

This repository contains code for a rule-based approach to the phonetic transcription of the Czech, Slovak and Polish languages into the International Phonetic Alphabet (IPA). The rules and IPA symbols used are based on phonological, phonetic, and orthoepic studies (listed below) of these West Slavic languages.

Older versions and their evaluations can be found in the GitHub releases. CHANGELOG.txt contains a list of changes in each version. The current (still open) version is this one (version 2).

Usage

These scripts can be used both as modules imported into any project and as shell scripts. Three examples of how to use them are shown below.

1. Import as a function into your Python 3 project.

from phon_czech import ipa_czech
from phon_slovak import ipa_slovak
from phon_polish import ipa_polish

word1 = ipa_czech('všichni')
text1 = ipa_czech('Všichni lidé rodí se svobodní a sobě rovní co do důstojnosti a práv.')

word2 = ipa_slovak('všetci')
text2 = ipa_slovak('Všetci ľudia sa rodia slobodní a rovní si do dôstojnosti a práv.')

word3 = ipa_polish('wszyscy')
text3 = ipa_polish('Wszyscy ludzie rodzą się wolni i równi pod względem godności i praw.')

print(word1, word2, word3, sep='\n')
print(text1, text2, text3, sep='\n')
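When the input has many lines, the imported functions can be applied line by line. The following is a minimal sketch, not part of the repository: `transcribe_lines` is a hypothetical helper, and it assumes that each transcription function takes a string and returns the transcribed string, as the examples above suggest.

```python
from typing import Callable, Iterable, List


def transcribe_lines(lines: Iterable[str], transcribe: Callable[[str], str]) -> List[str]:
    """Apply a transcription function to every non-empty line.

    `transcribe` is any callable mapping text to text, for example
    ipa_czech, ipa_slovak or ipa_polish (assumed to return a string).
    """
    results = []
    for line in lines:
        stripped = line.strip()  # drop the trailing newline and outer spaces
        if stripped:  # skip blank lines
            results.append(transcribe(stripped))
    return results
```

Under these assumptions, `transcribe_lines(open('path-to-input-file', encoding='utf-8'), ipa_czech)` would return one transcription per non-empty line of the file.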

2. Read from stdin in a shell pipeline.

echo -e 'všichni' | python3 phon_czech.py
echo -e 'Všichni lidé rodí se svobodní a sobě rovní co do důstojnosti a práv.' | python3 phon_czech.py

echo -e 'všetci' | python3 phon_slovak.py
echo -e 'Všetci ľudia sa rodia slobodní a rovní si do dôstojnosti a práv.' | python3 phon_slovak.py

echo -e 'wszyscy' | python3 phon_polish.py
echo -e 'Wszyscy ludzie rodzą się wolni i równi pod względem godności i praw.' | python3 phon_polish.py

cat 'path-to-input-file' | python3 phon_czech.py
cat 'path-to-input-file' | python3 phon_slovak.py
cat 'path-to-input-file' | python3 phon_polish.py
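The same pipeline can be driven from Python when importing the modules is not an option. This is a rough sketch rather than anything the repository provides: `pipe_text` is a hypothetical helper, and it assumes the scripts write the transcription to stdout, as the pipeline usage above implies.

```python
import subprocess
import sys
from typing import Sequence


def pipe_text(args: Sequence[str], text: str) -> str:
    """Run the current Python interpreter with `args`, feed `text` on stdin,
    and return whatever the child process writes to stdout."""
    completed = subprocess.run(
        [sys.executable, *args],
        input=text,
        capture_output=True,
        encoding="utf-8",  # decode and encode the pipes as UTF-8
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return completed.stdout
```

For example, `pipe_text(['phon_czech.py'], 'všichni\n')` would correspond to the first `echo` command above, assuming the script is in the working directory.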

3. Read from a file passed as a command-line argument.

python3 phon_czech.py 'path-to-input-file'
python3 phon_slovak.py 'path-to-input-file'
python3 phon_polish.py 'path-to-input-file'

Based on these studies