timmahrt / pyJuliusAlign

One-button-press forced aligner for Japanese, using Julius.

Unexpected cabocha error while running the example. #12

Closed DefaultCyberid closed 1 year ago

DefaultCyberid commented 1 year ago

Hi Tim, I'm trying to find a forced aligner for Japanese.

I followed the README and ran align_example.py, which produced the following log and error:

STEP 1: Generating transcripts

STEP 2: Converting all text to kana
こんにちは
Traceback (most recent call last):
  File "C:\Users\****\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\pyjuliusalign\jProcessingSnippet.py", line 165, in jReads
    sentence = etree.fromstring(xmlStr)
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.2288.0_x64__qbz5n2kfra8p0\lib\xml\etree\ElementTree.py", line 1343, in XML
    return parser.close()
xml.etree.ElementTree.ParseError: no element found: line 1, column 0

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\pyJuliusAlign\examples\align_example.py", line 49, in <module>
    alignFromTextgrid.convertCorpusToKanaAndRomaji(
  File "C:\Users\****\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\pyjuliusalign\alignFromTextgrid.py", line 305, in convertCorpusToKanaAndRomaji
    dataPrepTuple = juliusAlignment.formatTextForJulius(
  File "C:\Users\****\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\pyjuliusalign\juliusAlignment.py", line 364, in formatTextForJulius
    (tmpWordList, tmpKanaList, tmpRomajiList) = jProcessingSnippet.getChunkedKana(
  File "C:\Users\****\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\pyjuliusalign\jProcessingSnippet.py", line 233, in getChunkedKana
    kanaList, wordList = jReads(string, cabochaEncoding, cabochaPath)
  File "C:\Users\****\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\pyjuliusalign\jProcessingSnippet.py", line 167, in jReads
    raise CabochaOutputError(xmlStr)
pyjuliusalign.jProcessingSnippet.CabochaOutputError: Unexpected error in cabocha output (possibly a problem with cabocha).  See error:
b''

I don't know how to interpret the message b'', and the cabocha_output folder is empty, so I have no idea how to locate and solve the problem.
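For context, b'' is an empty Python bytes object: the text captured from cabocha's output was empty, so the XML parser had nothing to parse. One way to narrow this down is to run the tool outside the library and look at its exit code and stderr. The sketch below is a generic helper, not pyJuliusAlign's own code; the function names are hypothetical, and the commented cabocha call assumes the path from this thread, the -f3 (XML output) flag, and UTF-8 input.

```python
import subprocess
import sys


def run_and_capture(cmd, input_bytes):
    """Run cmd, feed input_bytes on stdin, return (returncode, stdout, stderr)."""
    proc = subprocess.run(cmd, input=input_bytes, capture_output=True)
    return proc.returncode, proc.stdout, proc.stderr


def describe_output(returncode, stdout, stderr):
    """Summarize whether a tool produced output, and if not, what it reported."""
    if stdout:
        return "ok"
    if stderr:
        return "empty stdout, stderr says: " + stderr.decode(errors="replace").strip()
    return "empty stdout and stderr (exit code %d)" % returncode


if __name__ == "__main__":
    # Harmless self-check using the running Python interpreter.
    print(describe_output(*run_and_capture([sys.executable, "--version"], b"")))
    # Assumed invocation for cabocha (path, flag, and encoding are assumptions):
    # cmd = [r"C:\Program Files (x86)\CaboCha\bin\cabocha.exe", "-f3"]
    # print(describe_output(*run_and_capture(cmd, "こんにちは".encode("utf-8"))))
```

If stdout is empty but stderr carries a message (for example about a missing or unreadable dictionary), that message is usually a better lead than the empty b'' in the traceback.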

Earlier there was an error because MeCab's CSV files were in the wrong encoding (Shift-JIS instead of UTF-8). The message didn't change after I fixed that, so it is probably unrelated.
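A quick way to check which encoding a dictionary CSV is actually in is to try decoding its bytes with each candidate. This is only a heuristic sketch (the function name and candidate list are assumptions, and pure-ASCII data decodes under every candidate, so the first one wins):

```python
def detect_encoding(raw, candidates=("utf-8", "shift_jis", "euc_jp")):
    """Return the first candidate encoding that decodes raw without error, else None."""
    for name in candidates:
        try:
            raw.decode(name)
            return name
        except UnicodeDecodeError:
            continue
    return None


# Example usage on a file (the file name is hypothetical):
# with open("Noun.csv", "rb") as fd:
#     print(detect_encoding(fd.read()))
```

UTF-8 is tried first because it is strict: Japanese text encoded as Shift-JIS almost never happens to be valid UTF-8, whereas the reverse check is less reliable.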

I'm running Python 3.10, Julius 4.6, CaboCha 0.69, MeCab 0.996, and SoX 14.4.2 on Windows 10 21H2.

timmahrt commented 1 year ago

It's been a while since I've run this software, particularly on windows.

I can investigate in detail later, but in the short term: are you able to run all of the needed tools from the command line? E.g., does C:\Program Files (x86)\CaboCha\bin\cabocha.exe (or whatever the path is) run cabocha correctly?
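The command-line check can also be scripted. A minimal sketch, assuming the executables are named julius, cabocha, mecab, and sox and are expected on PATH (full paths work as well); the function name is hypothetical:

```python
import shutil


def find_missing_tools(names):
    """Return the subset of names that cannot be located as executables."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = find_missing_tools(["julius", "cabocha", "mecab", "sox"])
    print("missing tools:", missing or "none")
```

This only confirms that each executable can be found; it does not confirm that the tool runs correctly or that its dictionaries are installed.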

If mecab was installed with shiftjis, that could also be the root of the problem--you said you fixed it but it could be worth looking into again if you're still having problems.

I will try to take a deeper look this week. Thanks!

DefaultCyberid commented 1 year ago

Thanks for the help. I reinstalled MeCab and CaboCha and rechecked every encoding issue I could think of, and the error message disappeared.

I hit another odd problem in step 5, in the juliusAlignCabocha function: the statement loggerFd = open(logFn, "w") always throws "OSError: [Errno 22] Invalid argument". The directory exists (I verified this with an extra line I added), and the "w" mode should create the file. Maybe I'll solve it soon.
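The thread does not establish the cause of this error. On Windows, one common source of Errno 22 from open() is a file name containing characters the file system rejects, such as a colon from a timestamp or a tab produced by an unescaped "\t" in a path string. The sketch below checks each path component for such characters; it is a diagnostic aid under that assumption, and the function name is hypothetical.

```python
from pathlib import PureWindowsPath

# Characters Windows does not allow inside a file or directory name.
INVALID_CHARS = set('<>:"/\\|?*')


def invalid_name_chars(path):
    """Map each offending path component to the sorted list of bad characters in it."""
    win_path = PureWindowsPath(path)
    problems = {}
    for part in win_path.parts:
        if part == win_path.anchor:
            continue  # skip the drive/root, e.g. "D:\\", where ":" is legitimate
        bad = sorted({ch for ch in part if ch in INVALID_CHARS or ord(ch) < 32})
        if bad:
            problems[part] = bad
    return problems
```

Printing repr(logFn) next to the result makes hidden control characters visible, which is easy to miss when the path is printed normally.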

So I think this issue is already solved. It was my fault for not being careful enough when installing the required packages. Thanks again for your help :-)

BTW: in step 4, the convertCorpusToKanaAndRomaji function prints "error" when it encounters an empty line in the file. In my case the text file's content is:

0.18147071657902195,0.6879082165790219,こんにちは

1.305966459679748,3.3287164596797476,私はティムマートです

3.934722740049523,5.338910240049523,私は学生です

6.047979040576486,7.344979040576486,日本語を勉強します

8.033552704870546,10.276177704870546,この夏は日本にいきました

10.844192385595038,13.735317385595039,東京行きましたも四国行きました

14.348019879774444,15.186082379774444,楽しかった

15.946147839441144,17.453960339441146,ラーメン大好きです

Should the function skip empty lines (i.e., continue to the next loop iteration) instead of printing the "error" message?
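The suggested behavior could look roughly like the following. This is a sketch of the idea only, not pyJuliusAlign's actual implementation: the function name is hypothetical, and the "start,stop,text" line format is inferred from the file contents shown above.

```python
def parse_transcript(lines):
    """Parse "start,stop,text" lines into tuples, silently skipping blank lines."""
    entries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # blank line: nothing to convert, move on
        # Split on the first two commas only, so commas in the text survive.
        start, stop, text = line.split(",", 2)
        entries.append((float(start), float(stop), text))
    return entries
```

Lines that are non-empty but malformed would still raise an error here, which keeps real problems visible while ignoring harmless blank lines.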