karlie38 closed this issue 2 years ago
To better evaluate token-level accuracy, we tokenize the original code. Take `package org.vaadin.teemu.clara.demo` as an example: if it isn't split on `.`, then `org.vaadin.teemu.clara.demo` is treated as a single token. In that case the accuracy score of prediction A, `package org.vaadin.teemu.clara`, is the same as that of prediction B, `package net`, even though prediction A is obviously better than B.
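To make the difference concrete, here is a minimal sketch of the comparison in Python. The helpers `tokenize` and `position_accuracy` are hypothetical names written for this illustration. They are not the CodeXGLUE preprocessing or evaluation scripts, and the dot handling is a simplification that assumes the only dots are the ones inside the package name.

```python
def tokenize(code, split_dots):
    """Whitespace tokenizer; optionally makes every '.' its own token.

    Illustration only: a real Java tokenizer must also handle floats,
    string literals, and so on.
    """
    if split_dots:
        code = code.replace(".", " . ")
    return code.split()


def position_accuracy(pred_tokens, gold_tokens):
    """Fraction of gold tokens that the prediction matches at the same position."""
    hits = sum(p == g for p, g in zip(pred_tokens, gold_tokens))
    return hits / len(gold_tokens)


gold = "package org.vaadin.teemu.clara.demo"
pred_a = "package org.vaadin.teemu.clara"
pred_b = "package net"

for split_dots in (False, True):
    gold_toks = tokenize(gold, split_dots)
    acc_a = position_accuracy(tokenize(pred_a, split_dots), gold_toks)
    acc_b = position_accuracy(tokenize(pred_b, split_dots), gold_toks)
    print(f"split_dots={split_dots}: A={acc_a}, B={acc_b}")

# split_dots=False: A=0.5, B=0.5
# split_dots=True: A=0.8, B=0.1
```

Without splitting, both predictions match only `package` out of two tokens, so they tie. With splitting, the reference has ten tokens, prediction A matches the first eight, and prediction B matches only one.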
I'd like to use the /CodeXGLUE/Code-Code/CodeCompletion-token/dataset/javaCorpus/token_completion dataset, and I have the following question.
When I download the dataset, it is already split on ' . '. The original data might have been "package org.vaadin.teemu.clara.demo". Could you explain why it is split this way?