oracc / nammu

Oracc GUI
GNU General Public License v3.0

Non-lemmatising file #186

Closed EleanorRobson closed 7 years ago

EleanorRobson commented 8 years ago

bor_4_132.atf.txt

Here's the ATF file I'm working on at the moment. (I had to add a .txt extension in order to upload it.) I created most of it in Emacs, before Nammu existed, but this afternoon I added the final text at the bottom of the file. I've got several issues, which may or may not be related:

  1. The syntax checker is wrongly detecting an error in line 329 (both Emacs and the command-line checker on the Oracc server tell me that this is correct ATF).
  2. As I hope you can see in the screen-grab (syntax-report), the whole of the rest of the file is then reported in the black portal at the bottom of the screen.
  3. The file is very slow to edit (too big? or is the syntax checker slowing everything down?).
  4. I get no response to the lemmatiser command (but I can lemmatise the file in Emacs). Another speed issue?
raquelalegre commented 7 years ago

Hi Eleanor,

Thanks for the report!

  1. The validation is failing because of a bug in PyORACC, which I'll need to investigate further and fix: something about the seal is not being parsed correctly. As it stands, validation first checks that the text is valid with our parser, then sends it to the ORACC server. The message "There is a syntax error near character ..." comes not from the ORACC server but from Nammu, after it calls the PyORACC validation. I'll fix this issue in the parser, and then we can discuss whether we should keep using the PyORACC validation. Perhaps for now it's better to use only the server's validation, so as not to confuse other users.
  2. The chunk of text pasted in the console is due to PyORACC not analysing the grammar properly and splitting the text at the wrong points.
  3. The file being slow: I find it a bit odd, since the file is not that big. What makes things slow is certainly the syntax highlighter. Once that is fixed, speed should improve, but I'm not working on that yet.
  4. The lemmatisation command is failing on the server because of the .txt extension. This is what I can see in the log: "ox: file must end in .atf or .otf, or use -a or -o options (found '.txt')". If this is not happening in Emacs, it is because Emacs is somehow converting the file to .atf, or running the command on the server with "-a" or "-o". I think I'll need to ask @stinney so that Nammu behaves the same way, unless you want to force users to use .atf/.otf extensions.
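
For illustration only, a client-side check matching the log message in point 4 could look like the sketch below. The function name and constant are hypothetical and not part of Nammu or PyORACC; the only thing taken from the log is that ox expects a name ending in .atf or .otf.

```python
import os

# Extensions named in the ox log message quoted above.
ACCEPTED_EXTENSIONS = (".atf", ".otf")


def has_accepted_extension(filename):
    """Return True if the file name ends in .atf or .otf.

    Hypothetical helper: it looks only at the final extension, so
    "bor_4_132.atf.txt" is rejected because its extension is ".txt".
    """
    _, extension = os.path.splitext(filename)
    return extension in ACCEPTED_EXTENSIONS
```

A check like this would let the editor warn about the extension before anything is sent to the server, rather than relying on the server log.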
EleanorRobson commented 7 years ago

Thanks for looking into this. I only added the .txt extension to the file in order to upload it to GitHub, as it wouldn't accept the .atf extension. I was working with it as .atf. (I tried to explain this in my first report, but obviously not clearly enough -- sorry!) Try removing the .txt extension and see what happens.

Most gratefully,

e

raquelalegre commented 7 years ago

Oh, I see, I thought you meant uploading it to the ORACC server and I got confused :)

I can now reproduce your problem. There is an error in the log indicating that the file is not correctly encoded as a zipped file, which is what's causing the error.

So there are two problems: the encoding is not working, and the error is not being reported to the user. I need to investigate further why the zip error is happening with this file specifically, and also change the error reporting to be propagated to the console, which is easier to fix.
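
As a rough sketch of the two fixes (not Nammu's actual code), the snippet below zips a single ATF text in memory and then checks that the result is a readable archive, returning a message that could be shown in the console. It assumes Python 3 and the standard zipfile module; the function names are made up for the example, and how Nammu really packages its request is not shown in this thread.

```python
import io
import zipfile


def zip_atf(filename, text):
    """Return the bytes of a zip archive holding one UTF-8 encoded ATF file."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(filename, text.encode("utf-8"))
    return buffer.getvalue()


def check_zip(payload):
    """Return None if payload is a readable zip, otherwise an error message."""
    try:
        with zipfile.ZipFile(io.BytesIO(payload)) as archive:
            # testzip() returns the name of the first corrupt member, or None.
            bad_member = archive.testzip()
    except zipfile.BadZipFile as error:
        return "Could not read zip archive: %s" % error
    if bad_member is not None:
        return "Corrupt member in zip archive: %s" % bad_member
    return None
```

The idea is that whatever `check_zip` returns gets printed to the console instead of only being written to the log.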

raquelalegre commented 7 years ago

Update about seals from our meeting yesterday:

Considering a text like this:

@tablet
# [ ... Description of surfaces ... ]
@seal A
1.  {[na₄]}KIŠIB [{m}]mu#-ra-nu
#lem: kunuk[seal]N Muranu[1]PN +.

PyORACC parses seal translations correctly when the format is as follows:

@translation labeled en project
# [ ... Translation of other surfaces ... ]
@label seal A 1
Seal of Muranu

The example text has this form instead:

@translation labeled en project
# [ ... Translation of other surfaces ... ]
@seal A
@label 1
Seal of Muranu.

PyORACC validates the first form, but not the second. The ORACC server considers both valid. @EleanorRobson is asking the people working on the ATF documentation for the correct form of describing seals and we'll have to update PyORACC accordingly.
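
To make the difference between the two forms concrete, here is a small sketch that rewrites the second form ("@seal A" followed by "@label 1") into the first ("@label seal A 1"). It is purely illustrative: the helper is hypothetical, is not part of PyORACC or the ORACC server, handles only seal surfaces, and says nothing about which form the ATF documentation considers correct.

```python
import re

# A bare seal surface line such as "@seal A".
SEAL_SURFACE = re.compile(r"^@(seal\s+\S+)\s*$")
# A label carrying only a line number, such as "@label 1".
SHORT_LABEL = re.compile(r"^@label\s+(\d+\S*)\s*$")


def expand_seal_labels(translation_lines):
    """Fold "@seal X" + "@label N" into "@label seal X N".

    Hypothetical helper: the bare "@seal X" line is dropped and every
    following short label is prefixed with that surface.
    """
    result = []
    surface = None
    for line in translation_lines:
        match = SEAL_SURFACE.match(line)
        if match:
            # Remember the surface, normalising internal whitespace.
            surface = " ".join(match.group(1).split())
            continue
        match = SHORT_LABEL.match(line)
        if match and surface is not None:
            result.append("@label %s %s" % (surface, match.group(1)))
            continue
        result.append(line)
    return result
```

Labels that already name their surface, such as "@label seal A 1", pass through unchanged.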

jnovotny-lmu commented 7 years ago

@raquel-ucl @EleanorRobson: "@label seal A 1" followed by "Seal of Muranu" is, in my experience, the correct labeling in the translation. "@(seal A 1)" should also work. The labels in the translation need to include the surface and the line number. For example, if a seal has more than one line, you need "seal A 1 - seal A 7" to validate; the closer also needs to have the surface and line number.

I have never used the format "@seal A", then "@label 1", then "Seal of Muranu". To me, this seems wrong, and perhaps the Oracc validator should flag it as an error. @stinney: Should this be flagged as an error, or is it also a valid format?

raquelalegre commented 7 years ago

See resolution here. PyORACC is OK as is.

raquelalegre commented 7 years ago

Closing, since everything is fixed except the zip problem, which is reported in #132.