rotem-shalev / Ham2Pose

Official implementation for "Ham2Pose: Animating Sign Language Notation into Pose Sequences" [CVPR 2023]
https://rotem-shalev.github.io/ham-to-pose

How to get the text description from HamNoSys? #2

Closed ZhengdiYu closed 1 year ago

ZhengdiYu commented 1 year ago

[image: figure showing a HamNoSys sequence with its text description]

Hi,

I would like to know how you extracted the text descriptions in the figure, such as: "Two flat hands with fingers closed, rotated towards each other, touching, then symmetrically moving diagonally downwards". Thank you very much!

Best

rotem-shalev commented 1 year ago

Hi Zhengdi, I did it manually, but you can get a "hint" for the meaning of each glyph by using this code from the hamnosys_tokenizer.py to extract the hamnosys text description from the font:

from pathlib import Path
from fontTools.ttLib import TTFont

self.font_path = Path(__file__).parent.joinpath("HamNoSysUnicode.ttf")

with TTFont(self.font_path) as font:
    tokens = [chr(key) for key in font["cmap"].getBestCmap().keys()]

Or look at the "hamnosys_text" entry in the data.json where it exists.

In both cases it's the same description, which is not exactly English, but you can understand its meaning in most cases. For example, for the sign in the figure above, the text description is: "hamsymmlr, hamflathand, hamextfingerul, hampalmd, hamtouch, hammovedr", meaning: the hands are symmetric (so the description from now on is of the right hand, and the left hand mirrors it), the hand is flat and open, the fingers point up and to the left, the palm faces down, the hands are touching, and then the hand moves down and to the right (diagonally).

There is more information about each glyph in the HamNoSys documentation.
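
As a rough sketch of the font-based lookup described above: getBestCmap() returns a dict from Unicode code point to glyph name, so a HamNoSys string can be turned into its list of names character by character. The describe helper and the hand-copied CMAP_SUBSET below are illustrative assumptions, not part of the repository; with fontTools installed, the full mapping would come from TTFont(font_path)["cmap"].getBestCmap().

```python
from typing import Dict, List

# Hand-copied subset of the font's code point -> glyph name mapping.
# Illustrative only: the full dict would come from getBestCmap().
CMAP_SUBSET: Dict[int, str] = {
    57345: "hamflathand",
    57383: "hamextfingerul",
    57404: "hampalmd",
    57475: "hammovedr",
    57553: "hamtouch",
    57577: "hamsymmlr",
}


def describe(hamnosys: str, cmap: Dict[int, str]) -> List[str]:
    """Return the glyph name for every character of a HamNoSys string."""
    # Characters missing from the mapping are reported by code point.
    return [cmap.get(ord(ch), f"<unknown U+{ord(ch):04X}>") for ch in hamnosys]


# Build a string from the code points of the names quoted above.
sign = "".join(chr(cp) for cp in (57577, 57345, 57383, 57404, 57553, 57475))
print(", ".join(describe(sign, CMAP_SUBSET)))
# hamsymmlr, hamflathand, hamextfingerul, hampalmd, hamtouch, hammovedr
```

The names are still only a hint: turning them into a fluent English sentence is the manual step.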

ZhengdiYu commented 1 year ago

Thanks for your reply! I have tried running hamnosys_tokenizer.py. However, the results are like:

{'tokens_ids': tensor([[  1,  13,  24,  31,  49,  72,  79, 183, 118, 176,   0,   0],
        [  1,  13,  24,  31,  49,  74,  79, 183, 118, 176, 182,   0],
        [  1,  11,  95,   0,   0,   0,   0,   0,   0,   0,   0,   0],
        [  1,  11,  95,  28,  40,  56, 118, 176, 190,   2,  11,  95]]), 'attention_mask': tensor([[False, False, False, False, False, False, False, False, False, False,
          True,  True],
        [False, False, False, False, False, False, False, False, False, False,
         False,  True],
        [False, False, False,  True,  True,  True,  True,  True,  True,  True,
          True,  True],
        [False, False, False, False, False, False, False, False, False, False,
         False, False]]), 'positions': tensor([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9,  0,  0],
        [ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10,  0],
        [ 0,  1,  2,  0,  0,  0,  0,  0,  0,  0,  0,  0],
        [ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11]])}

What do they actually mean?

After that, I printed font["cmap"].getBestCmap(). So I guess each value in 'tokens_ids' is an index into font["cmap"].getBestCmap()? For example, if the token id is 0, the actual HamNoSys text is 32: 'space', and if the token id is 6, it is 124: 'bar'. Am I right?

{32: 'space', 33: 'exclam', 44: 'comma', 46: 'period', 63: 'question', 123: 'braceleft', 124: 'bar', 125: 'braceright', 160: 'space', 57344: 'hamfist', 57345: 'hamflathand', 57346: 'hamfinger2', 57347: 'hamfinger23', 57348: 'hamfinger23spread', 57349: 'hamfinger2345', 57350: 'hampinch12', 57351: 'hampinchall', 57352: 'hampinch12open', 57353: 'hamcee12', 57354: 'hamceeall', 57355: 'hamceeopen', 57356: 'hamthumboutmod', 57357: 'hamthumbacrossmod', 57358: 'hamthumbopenmod', 57360: 'hamfingerstraightmod', 57361: 'hamfingerbendmod', 57362: 'hamfingerhookmod', 57363: 'hamdoublebent', 57364: 'hamdoublehooked', 57376: 'hamextfingeru', 57377: 'hamextfingerur', 57378: 'hamextfingerr', 57379: 'hamextfingerdr', 57380: 'hamextfingerd', 57381: 'hamextfingerdl', 57382: 'hamextfingerl', 57383: 'hamextfingerul', 57384: 'hamextfingerol', 57385: 'hamextfingero', 57386: 'hamextfingeror', 57387: 'hamextfingeril', 57388: 'hamextfingeri', 57389: 'hamextfingerir', 57390: 'hamextfingerui', 57391: 'hamextfingerdi', 57392: 'hamextfingerdo', 57393: 'hamextfingeruo', 57400: 'hampalmu', 57401: 'hampalmur', 57402: 'hampalmr', 57403: 'hampalmdr', 57404: 'hampalmd', 57405: 'hampalmdl', 57406: 'hampalml', 57407: 'hampalmul', 57408: 'hamhead', 57409: 'hamheadtop', 57410: 'hamforehead', 57411: 'hameyebrows', 57412: 'hameyes', 57413: 'hamnose', 57414: 'hamnostrils', 57415: 'hamear', 57416: 'hamearlobe', 57417: 'hamcheek', 57418: 'hamlips', 57419: 'hamtongue', 57420: 'hamteeth', 57421: 'hamchin', 57422: 'hamunderchin', 57423: 'hamneck', 57424: 'hamshouldertop', 57425: 'hamshoulders', 57426: 'hamchest', 57427: 'hamstomach', 57428: 'hambelowstomach', 57432: 'hamlrbeside', 57433: 'hamlrat', 57434: 'hamcoreftag', 57435: 'hamcorefref', 57439: 'hamneutralspace', 57440: 'hamupperarm', 57441: 'hamelbow', 57442: 'hamelbowinside', 57443: 'hamlowerarm', 57444: 'hamwristback', 57445: 'hamwristpulse', 57446: 'hamthumbball', 57447: 'hampalm', 57448: 'hamhandback', 57449: 'hamthumbside', 57450: 'hampinkyside', 
57456: 'hamthumb', 57457: 'hamindexfinger', 57458: 'hammiddlefinger', 57459: 'hamringfinger', 57460: 'hampinky', 57461: 'hamfingertip', 57462: 'hamfingernail', 57463: 'hamfingerpad', 57464: 'hamfingermidjoint', 57465: 'hamfingerbase', 57466: 'hamfingerside', 57468: 'hamwristtopulse', 57469: 'hamwristtoback', 57470: 'hamwristtothumb', 57471: 'hamwristtopinky', 57472: 'hammoveu', 57473: 'hammoveur', 57474: 'hammover', 57475: 'hammovedr', 57476: 'hammoved', 57477: 'hammovedl', 57478: 'hammovel', 57479: 'hammoveul', 57480: 'hammoveol', 57481: 'hammoveo', 57482: 'hammoveor', 57483: 'hammoveil', 57484: 'hammovei', 57485: 'hammoveir', 57486: 'hammoveui', 57487: 'hammovedi', 57488: 'hammovedo', 57489: 'hammoveuo', 57490: 'hamcircleo', 57491: 'hamcirclei', 57492: 'hamcircled', 57493: 'hamcircleu', 57494: 'hamcirclel', 57495: 'hamcircler', 57496: 'hamcircleul', 57497: 'hamcircledr', 57498: 'hamcircleur', 57499: 'hamcircledl', 57500: 'hamcircleol', 57501: 'hamcircleir', 57502: 'hamcircleor', 57503: 'hamcircleil', 57504: 'hamcircleui', 57505: 'hamcircledo', 57506: 'hamcircleuo', 57507: 'hamcircledi', 57508: 'hamfingerplay', 57509: 'hamnodding', 57510: 'hamswinging', 57511: 'hamtwisting', 57512: 'hamstircw', 57513: 'hamstirccw', 57514: 'hamreplace', 57517: 'hammovecross', 57518: 'hammoveX', 57519: 'hamnomotion', 57520: 'hamclocku', 57521: 'hamclockul', 57522: 'hamclockl', 57523: 'hamclockdl', 57524: 'hamclockd', 57525: 'hamclockdr', 57526: 'hamclockr', 57527: 'hamclockur', 57528: 'hamclockfull', 57529: 'hamarcl', 57530: 'hamarcu', 57531: 'hamarcr', 57532: 'hamarcd', 57533: 'hamwavy', 57534: 'hamzigzag', 57536: 'hamellipseh', 57537: 'hamellipseur', 57538: 'hamellipsev', 57539: 'hamellipseul', 57540: 'hamincreasing', 57541: 'hamdecreasing', 57542: 'hamsmallmod', 57543: 'hamlargemod', 57544: 'hamfast', 57545: 'hamslow', 57546: 'hamtense', 57547: 'hamrest', 57548: 'hamhalt', 57552: 'hamclose', 57553: 'hamtouch', 57554: 'haminterlock', 57555: 'hamcross', 57556: 'hamarmextended', 
57557: 'hambehind', 57558: 'hambrushing', 57560: 'hamrepeatfromstart', 57561: 'hamrepeatfromstartseveral', 57562: 'hamrepeatcontinue', 57563: 'hamrepeatcontinueseveral', 57564: 'hamrepeatreverse', 57565: 'hamalternatingmotion', 57568: 'hamseqbegin', 57569: 'hamseqend', 57570: 'hamparbegin', 57571: 'hamparend', 57572: 'hamfusionbegin', 57573: 'hamfusionend', 57574: 'hambetween', 57575: 'hamplus', 57576: 'hamsymmpar', 57577: 'hamsymmlr', 57578: 'hamnondominant', 57579: 'hamnonipsi', 57580: 'hametc', 57581: 'hamorirelative', 57584: 'hammime', 57585: 'hamversion40'}

rotem-shalev commented 1 year ago

Almost. Note that the tokenizer has 2 special tokens:

self.pad_token_id = 0
self.bos_token_id = 1

"bos" stands for beginning of sentence/sequence, so as you can see, each "tokens_ids" sequence starts with the bos_token_id, which is 1. Moreover, we pad each batch so that all of its sequences have the same length. That is, if the current sequence has 10 tokens and the longest sequence in the batch has 20, then "tokens_ids" for this sequence will have length 21, the same as the longest one: the first token is the bos token id, the next 10 are the ids of the sequence's tokens, and the last 10 are the padding token id, which is 0, as we can see in the example you printed above.

Other than that, every glyph from the HamNoSys font gets its unique index starting from 2, so in your example above, the token id for 32:'space' will be 2, and so on. BTW you can also use self.i2s of the tokenizer to get the relevant char for each id, so for example self.i2s[2] will return ' ' (space).
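
The id scheme described above can be sketched in plain Python lists (no torch). This is a minimal illustration, not the repository's tokenizer: build_vocab and encode_batch are hypothetical names, and the mask and position conventions (True on padding, positions reset to 0 on padding) are inferred only from the output printed earlier in this thread.

```python
from typing import Dict, List, Tuple

PAD_ID = 0  # padding token id
BOS_ID = 1  # beginning-of-sequence token id


def build_vocab(code_points: List[int]) -> Dict[str, int]:
    """Assign ids starting from 2, in the order the code points are given."""
    return {chr(cp): i + 2 for i, cp in enumerate(code_points)}


def encode_batch(
    texts: List[str], s2i: Dict[str, int]
) -> Tuple[List[List[int]], List[List[bool]], List[List[int]]]:
    """Prepend BOS to each sequence and pad every row to the longest one."""
    rows = [[BOS_ID] + [s2i[ch] for ch in text] for text in texts]
    max_len = max(len(row) for row in rows)
    ids, mask, positions = [], [], []
    for row in rows:
        n_pad = max_len - len(row)
        ids.append(row + [PAD_ID] * n_pad)
        # Assumption taken from the printed output: True marks padding.
        mask.append([False] * len(row) + [True] * n_pad)
        positions.append(list(range(len(row))) + [0] * n_pad)
    return ids, mask, positions


# First code points of the cmap printed earlier in the thread.
s2i = build_vocab([32, 33, 44, 46, 63, 123, 124, 125])
ids, mask, positions = encode_batch(["|", " !,"], s2i)
print(ids)        # [[1, 8, 0, 0], [1, 2, 3, 4]]
print(mask)       # [[False, False, True, True], [False, False, False, False]]
print(positions)  # [[0, 1, 0, 0], [0, 1, 2, 3]]
```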

As for the other fields:

ZhengdiYu commented 1 year ago

Thanks for the clarification. So in short: 0 is used for padding and 1 is used as a start flag, so the actual glyph ids start from 2. That means 124: 'bar' does not correspond to 6 as I thought before; it should be 8. Is this correct?

And the first actual HamNoSys id, 13, in the first example, tensor([[ 1, 13, 24, 31, 49, 72, 79, 183, 118, 176, 0, 0]]), corresponds to the element at index 11 (counting from 0) of font["cmap"].getBestCmap(), which is 57346: 'hamfinger2'.

In the end, I wrote:

# Load the font's code point -> glyph name mapping
with TTFont(self.font_path) as font:
    self.maps = font["cmap"].getBestCmap()
    self.keys = list(self.maps.keys())
    tokens = [chr(key) for key in self.keys]

hamnosys_text_list = []
for token_id in tokens_batch["tokens_ids"]:
    # Drop the padding (0) and the leading start flag (1), then subtract 2 to turn ids into cmap indices
    tmp_token_id = token_id[token_id != 0][1:] - 2
    names = [self.maps[self.keys[int(single_id)]] for single_id in tmp_token_id]
    hamnosys_text_list.append(','.join(names))

return hamnosys_text_list
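
For reference, the same id-to-name decoding can be written as a standalone sketch that runs without the font file or torch. The ids_to_names helper and the CMAP_PREFIX dict (the first twelve entries of the mapping printed above) are illustrative assumptions, not code from the repository.

```python
from typing import Dict, List

PAD_ID, BOS_ID, FIRST_GLYPH_ID = 0, 1, 2

# First entries of the printed getBestCmap() output, in the same order.
CMAP_PREFIX: Dict[int, str] = {
    32: "space", 33: "exclam", 44: "comma", 46: "period",
    63: "question", 123: "braceleft", 124: "bar", 125: "braceright",
    160: "space", 57344: "hamfist", 57345: "hamflathand",
    57346: "hamfinger2",
}


def ids_to_names(token_ids: List[int], cmap: Dict[int, str]) -> List[str]:
    """Map token ids back to glyph names, skipping the pad and bos ids."""
    keys = list(cmap.keys())  # position in this list == token id - 2
    names = []
    for tid in token_ids:
        if tid in (PAD_ID, BOS_ID):
            continue
        names.append(cmap[keys[tid - FIRST_GLYPH_ID]])
    return names


print(ids_to_names([1, 8, 13, 0, 0], CMAP_PREFIX))  # ['bar', 'hamfinger2']
```

Unlike the snippet above, this version skips ids 0 and 1 wherever they occur instead of slicing off the first element, which gives the same result for rows of the form bos, glyph ids, padding.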

But actually, if I only want to translate glyphs into text, it seems easier to directly build a dict that maps each HamNoSys glyph to its text name, by iterating over tokens:

        with TTFont(self.font_path) as font:
            self.maps = font["cmap"].getBestCmap()
            self.keys = list(self.maps.keys())
            tokens = [chr(key) for key in self.keys]

        print('maps:', self.maps)
        print('tokens: ', tokens)

        # Map each glyph character directly to its text name
        self.corr = {chr(key): name for key, name in self.maps.items()}

    def __call__(self, texts: List[str], device=None):
        hamnosys_text_list = []
        for text in texts:
            hamnosys_text_list.append(','.join(self.corr[ch] for ch in text))
        return hamnosys_text_list

rotem-shalev commented 1 year ago

Yes, that's correct.

ZhengdiYu commented 1 year ago

Great! Thank you Rotem.