snaekobbi / requirements

Requirements-based functional testing of the Braille in DAISY Pipeline project
http://snaekobbi.github.io/requirements

Tests for Danish translation #3

Open bertfrees opened 9 years ago

bertfrees commented 9 years ago

Includes:

Addresses requirements:

Related issues:

bertfrees commented 9 years ago

@stesk You said you had already started looking into liblouis tests. You looked at how Jukka did it and you think you can do the same for Danish. That would be great! Note that it's perfectly fine if the test data is not in the JSON format.

Jukka only did tests for uncontracted braille. 8-dot braille and contracted braille were not requirements for Finnish. We also have test data for emphasis indicators (see http://snaekobbi.github.io/requirements/finnish#4.3:38). Note that this test is not written in the liblouis JSON format (because the feature is not implemented with liblouis alone), but the principle remains the same: input + expected output.
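To illustrate the "input + expected output" principle independently of any file format, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: `Case`, `run_cases` and `toy_translate` are hypothetical names, and the toy translator only maps three letters. A real run would pass in the actual translator instead.

```python
from typing import Callable, List, NamedTuple, Tuple


class Case(NamedTuple):
    """One test case: the text to translate and the braille we expect."""
    input: str
    expected: str


def run_cases(translate: Callable[[str], str],
              cases: List[Case]) -> List[Tuple[Case, str]]:
    """Translate every input and collect (case, actual) for each mismatch."""
    failures = []
    for case in cases:
        actual = translate(case.input)
        if actual != case.expected:
            failures.append((case, actual))
    return failures


# Hypothetical stand-in for the real translator: a toy mapping of three
# letters to Unicode braille cells, used only to make the sketch runnable.
TOY_TABLE = {"a": "\u2801", "b": "\u2803", "c": "\u2809"}


def toy_translate(text: str) -> str:
    return "".join(TOY_TABLE.get(ch, "?") for ch in text)


cases = [
    Case("abc", "\u2801\u2803\u2809"),
    Case("cab", "\u2809\u2801\u2803"),
    Case("ba", "\u2803\u2803"),  # deliberately wrong expectation
]

failures = run_cases(toy_translate, cases)
for case, actual in failures:
    print(f"FAIL {case.input!r}: expected {case.expected}, got {actual}")
```

The third case carries a deliberately wrong expectation so that the failure report has something to show. The same runner works no matter where the cases come from (JSON, plain text, a spreadsheet export), which is why the storage format matters less than having the pairs.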

For Danish I think contracted braille is a requirement, so we probably need some additional data for that. However, implementing and testing contracted braille can be tricky: if the braille code is more dictionary-based than rule-based, perfect coverage may require a list of thousands of correctly translated words. There are several possibilities:

  1. The braille code for Danish has very strict rules and doesn't require a dictionary-based implementation, which means we don't need a large amount of data.
  2. A dictionary-based implementation is required, but you already have all the data we need.
  3. You create the required data with the help of transcribers and/or by analyzing transcribed books. This can be a huge job.
  4. You work closely together with the current maintainer of the liblouis table for Danish (Bue Vester-Andersen) who I believe has gathered a lot of data throughout the years and maybe wants to share it with you.
  5. You don't care about test data, and you realize in this case we can't really guarantee the correctness of the implementation.

From what I know about Danish braille, the rules for contracted braille are related to hyphenation. Bue's liblouis table is largely based on hyphenation data. So this issue may be closely related to issue #8 (Tests for Danish hyphenation).

It would be great if any official documentation you have about the Danish braille code could be made publicly available, so that it can be referenced from code, test data, etc. For example, these are documents from NLB about the Norwegian braille code: https://github.com/liblouis/liblouis/tree/formal_braille_spec/norwegian. Do you think we could have a similar page for Danish? (see issue https://github.com/snaekobbi/liblouis/issues/7)

In case you want to discuss anything, I'm always on Skype and IRC (channel #snaekobbi).

stesk commented 9 years ago

@bertfrees I'm on it. For now I've contacted Bue regarding the state of the Danish translation tables in liblouis. I will then try to figure out, probably by asking more knowledgeable colleagues, how to collect representative test data.

bertfrees commented 9 years ago

@stesk will convert the examples in his documentation (https://github.com/stesk/danishbraille) to liblouis harness tests.

In addition, we already have a large amount of test data (dictionary tests) thanks to @BueVest.

Nota has enough confidence in this data.
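As a rough sketch of how such dictionary tests could be scored, the Python below reads word/expected pairs and reports the fraction that translate correctly. The tab-separated layout, the names `read_pairs` and `pass_rate`, and the toy translator are all assumptions made for illustration. They do not describe the actual format of the data or the liblouis API.

```python
import io
from typing import Callable, Iterable, List, Tuple


def read_pairs(stream: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse 'word<TAB>expected' lines, skipping blanks and '#' comments.

    The tab-separated layout is an assumption, not the real data format.
    """
    pairs = []
    for line in stream:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        word, expected = line.split("\t", 1)
        pairs.append((word, expected))
    return pairs


def pass_rate(translate: Callable[[str], str],
              pairs: List[Tuple[str, str]]) -> float:
    """Return the fraction of words whose translation matches the expectation."""
    if not pairs:
        return 0.0
    passed = sum(1 for word, expected in pairs if translate(word) == expected)
    return passed / len(pairs)


# Hypothetical stand-in for the real translator (two letters only).
TOY_TABLE = {"a": "\u2801", "b": "\u2803"}


def toy_translate(text: str) -> str:
    return "".join(TOY_TABLE.get(ch, "?") for ch in text)


# In-memory sample standing in for a word-list file; the last entry has a
# deliberately wrong expectation.
sample = io.StringIO(
    "# word<TAB>expected\n"
    "ab\t\u2801\u2803\n"
    "ba\t\u2803\u2801\n"
    "abba\t\u2801\u2803\u2803\u2803\n"
)

pairs = read_pairs(sample)
rate = pass_rate(toy_translate, pairs)
print(f"{rate:.0%} of {len(pairs)} dictionary entries pass")
```

With thousands of entries, a pass rate like this is a more practical acceptance criterion than requiring every single word to match, and the threshold can be agreed on separately.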

bertfrees commented 9 years ago

There is currently no definition of 8-dot braille. We need to find out whether defining and/or implementing it is in scope and, if so, what the priority is.

bertfrees commented 9 years ago

Correction: of course there is a definition of 8-dot braille. Bue even made an implementation in liblouis, see above.