{ " ": { "symmetry": "", "dominant_hand": "", "nondominant_hand": "", "orientation": "", "location": "", "movement": "" } }
The first question is whether we should spell everything out when there is symmetry. So here we spell out dominant_hand and nondominant_hand, but in two-handed signs, we also have implicit values for each hand for each key, except symmetry.
So let's start easier without symmetry: { " ": { "symmetry": "", "dominant_hand": "", "nondominant_hand": "", "orientation": "", "location": "", "movement": "" } }
Is it better for empty values to be empty strings "", or null?
Starting simple would mean not spelling everything out. That could become very complex because in some types of symmetry, orientation values are flipped, and all kinds of crazy stuff. So let's just leave it like this.
As mentioned in the email, I'll need test case examples, ideally in JSON format, where each key is a segmented HamNoSys string and the value shows how that string should be parsed.
If you want an actual file to be saved somewhere, just say which directory and I can save the test files there. Otherwise I can post like this.
An actual file would be ideal: call it test_sign.py and place it in the folder tests. You can comment in Python form on each test case. Let's go for empty strings now, not for null.
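As a rough sketch of what such a file could look like: the field names below come from the template earlier in this thread, while `FIELDS`, `empty_entry`, `TEST_CASES`, `check_structure`, and the placeholder key are only assumptions for illustration, not the actual contents of test_sign.py.

```python
# Hypothetical layout for tests/test_sign.py. Only the six field names are
# taken from the thread; every other name here is an assumption.
FIELDS = (
    "symmetry",
    "dominant_hand",
    "nondominant_hand",
    "orientation",
    "location",
    "movement",
)


def empty_entry():
    """Return an expected-parse dict with every field set to ""."""
    return {field: "" for field in FIELDS}


# Each test case maps a segmented HamNoSys string to its expected parse.
# "<segmented string>" is a placeholder, not real HamNoSys data.
TEST_CASES = {
    # (1) placeholder case: nothing filled in yet
    "<segmented string>": empty_entry(),
}


def check_structure(cases):
    """Check that every expected parse uses exactly the agreed keys and
    only string values (empty string rather than None)."""
    for key, expected in cases.items():
        assert set(expected) == set(FIELDS), key
        assert all(isinstance(v, str) for v in expected.values()), key
    return True
```

Keeping every case in the same full structure, with "" for unused fields, means a check like `check_structure` can catch a missing key or a stray None before any parser exists.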
Have a look at example (3) in test_sign.py: Now it looks like maybe it would be better to use the full possible structure for examples (1) and (2) as well, right?
Yes, I agree, it should be the same structure. Can you give me an idea of how I recognize that it switches between different hands? Is it with bracket symbols?
Yes, within brackets the first symbol (or string of symbols) notates the dominant hand and the second notates the nondominant hand.
The two are always separated by the hamplus symbol: Unicode U+E0E7.
So in fact that symbol will be very important
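A minimal sketch of how the hamplus symbol could drive the split between hands. It assumes the surrounding bracket symbols have already been stripped and that the group contains a hamplus; `HAMPLUS` and `split_hands` are hypothetical names, and the "A"/"B" strings in the usage line are stand-ins rather than real HamNoSys symbols.

```python
HAMPLUS = "\ue0e7"  # code point as given in this thread


def split_hands(bracket_content):
    """Split the content of a bracketed group into (dominant, nondominant).

    Assumes the brackets themselves were already removed. The split is made
    at the first hamplus; anything after it is returned as nondominant.
    """
    dominant, sep, nondominant = bracket_content.partition(HAMPLUS)
    if not sep:
        raise ValueError("no hamplus symbol found in bracketed group")
    return dominant, nondominant


# Usage with placeholder symbols:
dominant, nondominant = split_hands("A" + HAMPLUS + "B")
```

Raising an error when the separator is missing, rather than returning an empty nondominant value, keeps malformed bracket groups from silently parsing as one-handed.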
Have a look at the file test_handshape.py in the tests folder. I put together a regex for matching handshapes that I think works.
Maybe this is easier for you to follow than the other descriptions, and it may make it easier for you to design the parser. I've also worked up the figure below (changed slightly) just to show what each part is:
The annotated values are taken from the hamsymbols.tsv file I put together for domain, type, and possibly subdomain.
So this regex will work on data that is segmented, or on unsegmented data that has the orientation segment transcribed after the handshape, which is the normal full transcription. BUT it won't work in some cases if orientation is left out and the data is not segmented. This is because some of the diacritics are ambiguous between handshape and location and could occur directly adjacent to the handshape symbols. The regex will then match those ambiguous symbols as part of the handshape when in fact they belong to the location segment. Anyway, we'll have to think about a solution for that possibility if the matching is done on unsegmented data.
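To make the problem concrete, here is a toy illustration with ASCII stand-ins. This is not the regex from test_handshape.py: "H" stands for a handshape base symbol, "d" for a diacritic that is ambiguous between handshape and location, "O" for an orientation symbol, and "L" for a location symbol. All of these, and the names `HANDSHAPE` and `match_handshape`, are assumptions for the sketch.

```python
import re

# Toy pattern: a handshape base followed by any number of diacritics.
# This stands in for the real handshape regex, which is not reproduced here.
HANDSHAPE = re.compile(r"Hd*")


def match_handshape(text):
    """Return the handshape prefix matched at the start of text, or ""."""
    match = HANDSHAPE.match(text)
    return match.group(0) if match else ""


# Unsegmented, orientation present: "O" stops the match, so the "d" that
# belongs to the location is left alone.
with_orientation = match_handshape("HOdL")

# Unsegmented, orientation left out: the same "d" is now adjacent to the
# handshape and is wrongly absorbed into the match.
without_orientation = match_handshape("HdL")
```

In the first call the match is just "H"; in the second it is "Hd", even though in this toy setup the "d" was meant as part of the location.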