dongguanting opened 2 years ago
Hi @dongguanting, we don't hold the copyright of the Few-NERD dataset, so please contact the owner of the dataset. We already state which Few-NERD version we used in footnote 5 of our paper, and we report results for both the ACL and arXiv versions of Few-NERD in our GitHub repo.
Thanks a lot for your reply. I still have a question about testing the cross-dataset scenario. How should I set up the script to run the setting in your paper (2 datasets for training, 1 for validation, 1 for test)? Does this mean I need to run 2 rounds of training, with the spans and types of 2 different ner_train.json files?
Hi @dongguanting, not really. In the Cross-Domain setting, you only need to train once on the training set (Span + Type) and then evaluate directly. During training, the model sees the task data of both domains.
In our scripts, you can set the dataset to Domain and use different values of N to select the domain.
N=1 # 1 or 2 or 3 or 4
K=1 # 1 or 5
...
--dataset Domain \
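A minimal sketch of how the N and K values above could be enumerated in one loop. Here `run_one` is a hypothetical placeholder that only prints the configuration; the real training command and its remaining flags are elided in the snippet above and are not reproduced here.

```shell
#!/usr/bin/env bash
# Sketch only: enumerate the cross-domain settings (N in 1..4, K in 1 or 5).
# run_one is a hypothetical stand-in; replace the echo with the actual
# training script call, passing `--dataset Domain` plus its other flags.
run_one() {
  local n="$1" k="$2"
  echo "dataset=Domain N=${n} K=${k}"
}

for N in 1 2 3 4; do
  for K in 1 5; do
    run_one "$N" "$K"
  done
done
```

Each combination is a single train-then-evaluate run, which matches the "train once" answer above.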
Maybe you accidentally swapped the results of the ACL version and the arXiv version in this repo? (The F1 on the Few-NERD arXiv version is higher, but in your repo the ACL version result is higher.) Also, I previously downloaded the arXiv version of the episode data (568MB; that link is no longer available), and the only version of the episode data that can be downloaded from the Few-NERD website now (500MB) is probably the ACL version.
Hi @liyongqi2002, thanks for the reminder. Our presentation of the Few-NERD dataset versions is unclear, and I will fix it as soon as possible. In fact, the first table shows the results on the Few-NERD arXiv v5 version (542MB, using the URL in https://github.com/thunlp/Few-NERD/commit/cb16dc48562f0017c74492a906f461a6947a4219#diff-5f38df55c9cead1d256aa12d97c6ed0244fcbda7e36124fa9ec53014750d1283R23), which is also used in CONTAINER and ESD. The second table shows the results on the Few-NERD arXiv v6 version (https://arxiv.org/pdf/2105.07464v6.pdf; 500MB, using the URL in https://github.com/thunlp/Few-NERD/commit/e32907982dac9956aaa603c28b57138b192fe6c0#diff-5f38df55c9cead1d256aa12d97c6ed0244fcbda7e36124fa9ec53014750d1283R20), which is also used in ESD.
Thanks for your reply. So the results that can be compared now are those in the second table (using the 500MB episode data, which is also presented at https://paperswithcode.com/sota/few-shot-ner-on-few-nerd-inter). Is my understanding correct?
Yeah, you can compare the results in the second table by using the 500MB episodes data.
@dongguanting I'm also trying the code, but it reports that episode-data/inter/... is missing. Where can I obtain this dataset? I downloaded the Few-NERD dataset, but the files are .txt files. Thanks
Hi @GenVr, you can download the arXiv v6 version of the Few-NERD dataset by following the script in their repo: https://github.com/thunlp/Few-NERD/blob/main/data/download.sh#L20-L22.
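As a quick sanity check before training, something like the following could confirm that the episode data is in place. This is only a sketch: the helper name `check_episode_data` is hypothetical, and the default path `episode-data/inter` is an assumption taken from the error message quoted above, so adjust it to wherever your copy lives.

```shell
#!/usr/bin/env bash
# Sketch only: check that an episode-data directory exists and is non-empty.
# check_episode_data is a hypothetical helper, not part of either repo.
check_episode_data() {
  local dir="$1"
  if [ -d "$dir" ] && [ -n "$(ls -A "$dir" 2>/dev/null)" ]; then
    echo "found: $dir"
  else
    echo "missing: $dir (see data/download.sh in thunlp/Few-NERD)"
    return 1
  fi
}

# Default path is an assumption based on the error message in this thread.
check_episode_data "${1:-episode-data/inter}" || true
```

If the check reports the directory as missing, the .txt files from the plain Few-NERD download are not enough; the sampled episode data has to be fetched separately as described above.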
Hi @iofu728, it seems the open-source "episode-data" is the arXiv version of Few-NERD? I found that my reproduced results are very different from those in the paper. Did you perhaps use the ACL version of Few-NERD in the paper?