Closed bwang482 closed 4 years ago
Presumably there is a formatting error in your OOV.txt. If you'd like us to take a look, please post it somewhere so we can replicate the issue.
On Mon, Aug 6, 2018 at 2:14 PM bluemonk482 notifications@github.com wrote:
I have used the suggested commands below for dealing with the OOV issue:
$ ./sort.py eng.dict OOV.txt > tmp
$ mv tmp eng.dict
However, I am getting the error below:
Formatting error in dictionary '/Users/bowang/Tools/Prosodylab-Aligner/eng.dict' (ln. 1).
Thanks @kylebgorman
Here are the Dropbox links to the OOV.txt and eng.dict:
https://www.dropbox.com/s/a7rro8is1tw774h/OOV.txt?dl=0 https://www.dropbox.com/s/yyut5jlfb4f3ev4/eng.dict?dl=0
The OOV file looks like you haven't actually tokenized or case-folded the data as it expects. You need to make sure you're removing punctuation marks and ignoring case in your lab files. For instance, for the first sentence in this message, you would want the label file to read:
THE OOV FILE LOOKS LIKE YOU HAVEN'T ACTUALLY TOKENIZED OR CASE FOLDED
THE DATA AS IT EXPECTS
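The normalization described above can be sketched in a few lines of Python. This is only an illustration, not code from the aligner: the function name `normalize_label` is hypothetical, and it assumes ASCII English text where apostrophes are kept and every other punctuation mark (including hyphens) becomes a word break.

```python
import re


def normalize_label(text: str) -> str:
    """Upper-case the text and drop punctuation other than apostrophes.

    Hypothetical helper; assumes ASCII English input.
    """
    upper = text.upper()
    # Anything that is not a letter, digit, apostrophe, or space becomes a space,
    # so "CASE-FOLDED" turns into the two tokens "CASE FOLDED".
    cleaned = re.sub(r"[^A-Z0-9' ]+", " ", upper)
    # Collapse runs of spaces and trim the ends.
    return " ".join(cleaned.split())


sentence = ("The OOV file looks like you haven't actually tokenized "
            "or case-folded the data as it expects.")
print(normalize_label(sentence))
```

Digits are left in place by this sketch, so a token such as a number would still need its own dictionary entry or be spelled out beforehand.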
@kylebgorman Thanks! I have tokenized, upper-cased, and removed punctuation (except apostrophes). Now I do have a few OOV words, including numbers (even "IT'S" is an OOV). I have followed the provided steps again to add them to the lang.dict.
It is giving the same formatting error for the lang.dict. Is the issue that the provided way of editing the lang.dict file somehow changes its format?
You need to ensure that the line you add has the same formatting as other lines. I'm not sure what else to tell you. I do this simply by typing in my preferred text editor, correcting any errors manually. Is the expected format clear?
If your text editor makes this hard to do, try another one.
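One way to locate the offending line before running the aligner is a small check like the one below. It is a sketch under an assumption the thread does not spell out: that each dictionary entry is a word followed by whitespace and at least one phone symbol, as in CMUdict-style dictionaries. The function name `find_malformed_lines` and the sample entries are hypothetical.

```python
def find_malformed_lines(lines):
    """Return 1-based numbers of lines lacking a word plus a pronunciation.

    Assumes the format "WORD PHONE [PHONE ...]"; this is an assumption,
    not something confirmed in this thread.
    """
    bad = []
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        # A bare word (or a blank line) has fewer than two fields.
        if len(fields) < 2:
            bad.append(number)
    return bad


sample = ["12345", "A AH0", "IT'S IH1 T S", ""]
print(find_malformed_lines(sample))
```

Under that assumption, an OOV word pasted into the dictionary without a pronunciation after it would be reported here, which would be consistent with an error on line 1 when the bare entries sort to the top.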
@kylebgorman Thanks. Do you edit the lang.dict file by simply adding the OOV words on the top, and one OOV word per line?
I have tried the following:
1) The provided code for dealing with OOV words by editing the dict file gives a formatting error.
2) I have tried editing the dict file using texteditor and Sublime (adding OOV words on top). Again, I am getting a formatting error.
I do appreciate your help @kylebgorman, but I believe I am editing the lang.dict file the wrong way here.
In both cases you have to make sure to also sort the dictionary as described in the README.
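For illustration, merging new entries into a dictionary and re-sorting could look roughly like this. It is not the project's sort.py, which remains the method the README describes; the function name `merge_and_sort` is hypothetical, and plain code-point ordering of whole lines is an assumption.

```python
def merge_and_sort(dictionary_lines, new_lines):
    """Combine two lists of entries, drop duplicates and blanks, and sort.

    Hypothetical stand-in for the README's sorting step; the ordering
    used here (Python's default string order) is an assumption.
    """
    entries = {line.strip() for line in list(dictionary_lines) + list(new_lines)}
    entries.discard("")  # ignore blank lines
    return sorted(entries)


existing = ["B B IY1", "A AH0"]
added = ["IT'S IH1 T S", "A AH0"]
for entry in merge_and_sort(existing, added):
    print(entry)
```

Note that the added entries here already carry pronunciations; sorting alone would not repair an entry that is missing one.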
Closing for inactivity.