sillsdev / serval

A REST API for natural language processing services
MIT License

Add AQUA as word alignment engine #495

Open johnml1135 opened 1 month ago

johnml1135 commented 1 month ago

This is to add the AQUA missing words assessment to Serval.

Implementation option 1 (fully modular):

Implementation option 2 (start combining):

johnml1135 commented 1 month ago

Do the following to the existing files:

@ddaspit, what do you think?

johnml1135 commented 1 month ago

OK, we will abandon the assessment API for now and build a word alignment API instead.

johnml1135 commented 1 month ago

@ddaspit, what do you think - the basic refactoring would be:

Keep everything in the API and the database the same. This is a pure refactoring with no other changes: it only prepares for WordAlignment and does not add it yet.

johnml1135 commented 1 month ago

@ddaspit - How should we represent word alignments at the Serval API layer? Here is the interface that I am assuming:

Options:

  1. Just take word pairs and a score -> John:Juan:0.89
  2. Just take number pairs and a score -> 7:8:0.89
  3. Add both by using a "|" -> 7|John:8|Juan:0.89
  4. Use JSON to add the tokenization:
       {
         source_tokenization: ["His", "name", "is", "John"],
         target_tokenization: ... ,
         alignment: 1:1:0.89, 2:2:0.7546
       }
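
To make the trade-offs between options 1 to 3 concrete, here is a minimal Python sketch of a parser for the three colon-separated forms. It is only an illustration, not Serval code: `AlignedPair`, `parse_pair`, and `_parse_side` are hypothetical names, and the rule "a side made only of digits is an index" is an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class AlignedPair:
    """One aligned pair; index and word are optional depending on the format."""
    source_index: Optional[int]
    source_word: Optional[str]
    target_index: Optional[int]
    target_word: Optional[str]
    score: float


def _parse_side(text: str) -> Tuple[Optional[int], Optional[str]]:
    """Split one side of a pair into (index, word)."""
    if "|" in text:
        # Option 3: "7|John" carries both the index and the word.
        index, word = text.split("|", 1)
        return int(index), word
    if text.isdigit():
        # Option 2: a bare number is assumed to be a token index.
        return int(text), None
    # Option 1: anything else is treated as the word itself.
    return None, text


def parse_pair(text: str) -> AlignedPair:
    """Parse "source:target:score" in any of the three compact forms."""
    parts = text.rsplit(":", 2)
    if len(parts) != 3:
        raise ValueError(f"expected source:target:score, got {text!r}")
    source, target, score = parts
    source_index, source_word = _parse_side(source)
    target_index, target_word = _parse_side(target)
    return AlignedPair(source_index, source_word, target_index, target_word, float(score))
```

The sketch also shows where the compact forms get fragile: a token that is itself a number (such as a verse number) cannot be told apart from an index, and a token containing ":" or "|" breaks the split. Option 4 avoids both problems because the tokens live in their own arrays.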

ddaspit commented 1 month ago

You should take a look at the TranslationResult model for inspiration. We will probably want a subset of the properties in that model, specifically SourceTokens, TargetTokens, and Alignment.
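
A rough sketch of what such a subset could look like, written in Python for brevity rather than in Serval's own language. Everything here is an assumption for illustration: the class names `WordAlignmentResult` and `AlignedWordPair`, the snake_case field names, the zero-based indices, the per-pair `score`, and the Spanish target tokens. None of it is taken from the actual TranslationResult model.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class AlignedWordPair:
    source_index: int  # assumed zero-based index into source_tokens
    target_index: int  # assumed zero-based index into target_tokens
    score: float       # assumed per-pair alignment score


@dataclass
class WordAlignmentResult:
    source_tokens: List[str]
    target_tokens: List[str]
    alignment: List[AlignedWordPair]

    def validate(self) -> None:
        """Raise ValueError if any pair points outside the token lists."""
        for pair in self.alignment:
            if not 0 <= pair.source_index < len(self.source_tokens):
                raise ValueError(f"source index out of range: {pair.source_index}")
            if not 0 <= pair.target_index < len(self.target_tokens):
                raise ValueError(f"target index out of range: {pair.target_index}")

    def to_json(self) -> str:
        """Serialize tokens and alignment together, as in option 4."""
        return json.dumps(asdict(self), ensure_ascii=False)


# Illustrative data only; the target tokens are made up for this example.
example = WordAlignmentResult(
    source_tokens=["His", "name", "is", "John"],
    target_tokens=["Su", "nombre", "es", "Juan"],
    alignment=[AlignedWordPair(0, 0, 0.7546), AlignedWordPair(3, 3, 0.89)],
)
example.validate()
```

Keeping the tokens next to the index pairs means a client can recover the word-level view (option 1) or the index-level view (option 2) from a single response, and the indices can be checked against the tokenization.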