lingpy / pysign

Python library for the manipulation of sign language data
MIT License

test cases #1

Closed LinguList closed 3 years ago

LinguList commented 4 years ago

As mentioned in the email, I'll need test case examples, ideally in JSON format, where the key is a segmented HamNoSys string and the values show how it should be parsed.

justopower commented 4 years ago

{ "    ": { "symmetry": "", "dominant_hand": "", "nondominant_hand": "", "orientation": "", "location": "", "movement": "" } }

justopower commented 4 years ago

The first question is whether we should spell everything out when there is symmetry. So here we spell out dominant_hand and nondominant_hand, but in two-handed signs, we also have implicit values for each hand for each key, except symmetry.

justopower commented 4 years ago

So let's start easier without symmetry: { "   ": { "symmetry": "", "dominant_hand": "", "nondominant_hand": "", "orientation": "", "location": "", "movement": "" } }

justopower commented 4 years ago

Is it better for empty values to be empty strings "", or null?

justopower commented 4 years ago

Starting simple would mean not spelling everything out. That could become very complex because in some types of symmetry, orientation values are flipped, and all kinds of crazy stuff. So let's just leave it like this.

justopower commented 4 years ago

If you want an actual file to be saved somewhere, just say which directory and I can save the test files there. Otherwise I can post like this.

LinguList commented 4 years ago

An actual file would be ideal: call it test_sign.py and place it in the tests folder. You can comment on each test case in Python. Let's go with empty strings for now, not null.
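
A minimal sketch of how such a file could be laid out, assuming the six keys from the template posted above and empty strings for missing values. The helper names `empty_parse` and `has_full_structure` are hypothetical and not part of pysign:

```python
# Sketch of tests/test_sign.py. The keys come from the template posted earlier
# in this thread; the helper names are hypothetical, not pysign API.
EXPECTED_KEYS = (
    "symmetry",
    "dominant_hand",
    "nondominant_hand",
    "orientation",
    "location",
    "movement",
)


def empty_parse():
    """Return the full structure with every value set to the empty string."""
    return {key: "" for key in EXPECTED_KEYS}


def has_full_structure(parse):
    """Check that a parse has exactly the expected keys and only string values."""
    return set(parse) == set(EXPECTED_KEYS) and all(
        isinstance(value, str) for value in parse.values()
    )


def test_template_is_well_formed():
    assert has_full_structure(empty_parse())


def test_null_is_rejected():
    # None (JSON null) is not allowed; empty values must be "".
    broken = dict(empty_parse(), symmetry=None)
    assert not has_full_structure(broken)
```

Each real test case would then fill in the values for one segmented HamNoSys string and keep the same six keys.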

justopower commented 4 years ago

Have a look at example (3) in test_sign.py: Now it looks like maybe it would be better to use the full possible structure for examples (1) and (2) as well, right?

LinguList commented 4 years ago

Yes, I agree, it should be the same structure. Can you give me an idea of how I can recognize that it switches between hands? Is it with bracket symbols?

justopower commented 4 years ago

Yes, in brackets the first symbol (or string of symbols) notates the dominant hand and the second notates the nondominant hand.

justopower commented 4 years ago

They are always separated by the hamplus symbol: Unicode U+E0E7.

justopower commented 4 years ago

So in fact that symbol will be very important
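
A rough sketch of how the split on hamplus could look, assuming the text between the bracket symbols has already been extracted. The function name `split_hands` is hypothetical, and the letters "A" and "B" in the usage note stand in for real HamNoSys symbols:

```python
HAMPLUS = "\ue0e7"  # the hamplus separator named in this thread


def split_hands(bracket_content):
    """Split the text found between bracket symbols into (dominant, nondominant).

    Assumption: the content holds one hamplus; if there are several,
    only the first one is used as the split point.
    """
    dominant, separator, nondominant = bracket_content.partition(HAMPLUS)
    if not separator:
        raise ValueError("no hamplus separator found in bracket content")
    return dominant, nondominant
```

For example, `split_hands("A" + HAMPLUS + "B")` gives `("A", "B")`, so the first part goes to dominant_hand and the second to nondominant_hand.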

justopower commented 4 years ago

Have a look at the file test_handshape.py in the tests folder. I put together a regex for matching handshapes that I think works.

Maybe this is easier for you to follow than the other descriptions, and it may make it easier for you to design the parser. I've also worked up the figure below (changed slightly) just to show what each part is: [image]

The annotated values are taken from the hamsymbols.tsv file I put together for domain, type, and possibly subdomain.

  1. In the first set of brackets are the base handshape symbols; one and only one of these occurs in each handshape.
  2. Within the first capturing group, the first set of brackets has the thumb diacritics; zero or one of these can occur for each handshape.
  3. The second set of brackets has the flexion symbols. These can be re-used without an upper limit depending on how close the transcription is meant to be.
  4. Then come other diacritics that have no upper limit.
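
The four parts above can be sketched as a pattern like the following. This is not the regex from test_handshape.py: the ASCII letters are placeholder stand-ins for the real HamNoSys symbol sets, chosen only to show the quantifiers.

```python
import re

# Placeholder symbol sets (assumptions for illustration only); the real sets
# would be filled in from hamsymbols.tsv.
BASE = "ABC"     # base handshape symbols: exactly one
THUMB = "tu"     # thumb diacritics: zero or one
FLEXION = "fg"   # flexion symbols: any number
OTHER = "xy"     # other diacritics: any number

HANDSHAPE = re.compile(rf"[{BASE}][{THUMB}]?[{FLEXION}]*[{OTHER}]*")


def is_handshape(segment):
    """Return True if the whole segment fits the handshape pattern."""
    return HANDSHAPE.fullmatch(segment) is not None
```

With these placeholders, `is_handshape("Atffx")` is true, while `is_handshape("AB")` and `is_handshape("Attf")` are false because they contain two base symbols and two thumb diacritics respectively.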

So this regex will work on data that is segmented, or on data where the orientation segment is transcribed after the handshape, which is the normal full transcription. BUT it won't work in some cases if orientation is left out and the data is not segmented. Some of the diacritics are ambiguous between handshape and location and could occur directly adjacent to the handshape symbols. The regex would then match those ambiguous symbols as part of the handshape when they actually belong to the location segment. We'll have to think about a solution for that if the matching is done on unsegmented data.
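
A small illustration of the problem with placeholder symbols (not real HamNoSys): "A" stands for a base handshape, "x" for a diacritic that is valid in both handshape and location, and "L" for a location symbol. The pattern is a simplified assumption, not the one in test_handshape.py.

```python
import re

# Simplified handshape pattern: one base symbol plus any number of "x" diacritics.
HANDSHAPE = re.compile(r"Ax*")

# Intended segmentation: handshape "A", then location "xL" (orientation left out).
segments = ["A", "xL"]
unsegmented = "".join(segments)

# On segmented data the match stops where the segment ends.
on_segment = HANDSHAPE.match(segments[0]).group()

# On unsegmented data the greedy match also takes the ambiguous "x".
on_string = HANDSHAPE.match(unsegmented).group()
```

Here `on_segment` is `"A"` but `on_string` is `"Ax"`, so the location would be left with only `"L"`.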