microsoft / CodeXGLUE

MIT License

[Code Completion - Token level] About eval_acc function #169

Closed: St3p99 closed this issue 1 year ago.

St3p99 commented 1 year ago

I understand the purpose of the eval_acc function and its overall structure, but I have a question about one specific part of the code.

Specifically, why does the loop below pair each ground-truth token gt[i] with pred[i-1] instead of pred[i]? Could someone explain the reasoning behind using pred[i-1]?

I would appreciate any clarification that helps me understand this part of the eval_acc function.


```python
# ... (other code above)
            for i, y in enumerate(gt):
                if i == 0:
                    # First position of the block: prev_pred (or 0 if unavailable)
                    # is used as the prediction for gt[0].
                    if y in [tokenizer.bos_token_id, tokenizer.eos_token_id, tokenizer.sep_token_id, tokenizer.pad_token_id]:
                        now_gt = [y]
                        now_pred = [0] if prev_pred is None else [prev_pred]
                        all_pred.append(DecodeIds(now_pred).strip().split()[0])
                        all_gt.append(DecodeIds(now_gt).strip())
                        now_gt = []
                        now_pred = []
                    else:
                        now_gt = [y]
                        now_pred = [0] if prev_pred is None else [prev_pred]
                else:
                    # '\u0120' ('Ġ') marks a BPE token that starts with a space, i.e. a new
                    # word: flush the word collected so far into all_pred / all_gt.
                    if tokenizer.convert_ids_to_tokens(y)[0] == '\u0120':
                        if len(now_gt) > 0:
                            try:
                                all_pred.append(DecodeIds(now_pred).strip().split()[0])
                            except IndexError:
                                all_pred.append("<SPACE>")
                            all_gt.append(DecodeIds(now_gt).strip())
                            now_gt = []
                            now_pred = []
                    # Special tokens and <NUM_LIT...> placeholders are scored on their own.
                    if y in [tokenizer.bos_token_id, tokenizer.eos_token_id, tokenizer.sep_token_id, tokenizer.pad_token_id] or tokenizer.convert_ids_to_tokens(y).startswith("<NUM_LIT"):
                        if len(now_gt) > 0:
                            try:
                                all_pred.append(DecodeIds(now_pred).strip().split()[0])
                            except IndexError:
                                all_pred.append("<SPACE>")
                            all_gt.append(DecodeIds(now_gt).strip())
                        now_gt = [y]
                        now_pred = [pred[i-1]]  # <- pred[i-1] usage in question
                        try:
                            all_pred.append(DecodeIds(now_pred).strip().split()[0])
                        except IndexError:
                            all_pred.append("<SPACE>")
                        all_gt.append(DecodeIds(now_gt).strip())
                        now_gt = []
                        now_pred = []
                        continue
                    # Ordinary sub-token: extend the current word.
                    now_gt.append(y)
                    now_pred.append(pred[i-1])  # <- pred[i-1] usage in question
# ... (other code below)
```
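A minimal sketch of the alignment in question, assuming (as in GPT-2-style setups) that the model is a causal, left-to-right language model whose argmax output at position t is its guess for the token at position t + 1. Under that assumption, pred[i-1] is the prediction made for gt[i], while comparing gt[i] with pred[i] would score each guess against the last token the model had already seen. The helper names align_predictions and token_accuracy and the toy token IDs are hypothetical, and the sketch ignores the word-level grouping done with '\u0120' and DecodeIds.

```python
# Hypothetical toy illustration of the causal-LM offset; not CodeXGLUE code.
# Assumption: pred[t] is the model's guess for the token at position t + 1.

def align_predictions(gt, pred, prev_pred=None):
    """Pair each ground-truth token gt[i] with the prediction made for it.

    pred[i - 1] was produced after reading gt[0..i-1], so it is the guess for
    gt[i]. At i == 0 there is no earlier position in this block, so fall back
    to prev_pred (or a placeholder 0), mirroring the i == 0 branch above.
    """
    pairs = []
    for i, y in enumerate(gt):
        if i == 0:
            guess = 0 if prev_pred is None else prev_pred
        else:
            guess = pred[i - 1]
        pairs.append((y, guess))
    return pairs


def token_accuracy(pairs):
    """Fraction of (ground truth, guess) pairs that match exactly."""
    return sum(y == guess for y, guess in pairs) / len(pairs)


# A "perfect" model on the block [5, 7, 9, 11]: at every position it outputs
# the next token. Its last output (13) is a guess for a token outside the block,
# and prev_pred=5 stands for a correct guess made at the end of the previous block.
gt = [5, 7, 9, 11]
pred = [7, 9, 11, 13]

shifted = align_predictions(gt, pred, prev_pred=5)
naive = list(zip(gt, pred))  # pairs gt[i] with pred[i]

print(token_accuracy(shifted))  # 1.0
print(token_accuracy(naive))    # 0.0
```

This is the same one-position shift that, for example, Hugging Face Transformers' GPT2LMHeadModel applies to its logits and labels when computing the language-modeling loss. In the excerpt, prev_pred presumably carries the last prediction of the preceding block, which is why it stands in for the missing pred[i-1] at i == 0.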