Hello @iR00i, we're very happy that you're using AdaSeq.
For your 1st question:
For sequence labeling tasks, we simply use seqeval to calculate the metrics: accuracy, precision, recall, micro-F1, and macro-F1. Its evaluation method is the same as conlleval's. So when calculating accuracy, `O` is taken into account; when calculating F1, it is not.
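Here is a minimal sketch of that behavior using seqeval directly (the toy tag sequences are made up for illustration):

```python
# Toy example illustrating how seqeval treats the O tag.
from seqeval.metrics import accuracy_score, f1_score

y_true = [["B-PER", "I-PER", "O", "B-LOC"]]
y_pred = [["B-PER", "O", "O", "B-LOC"]]

# Accuracy is token-level, so O positions count toward the score:
# 3 of 4 tokens match -> 0.75.
print(accuracy_score(y_true, y_pred))

# F1 is entity-level: O is not an entity itself, it only matters insofar
# as it breaks a span. Gold has PER(0-1) and LOC(3); pred has PER(0) and
# LOC(3), so 1 of 2 entities matches exactly on each side -> F1 = 0.5.
print(f1_score(y_true, y_pred))
```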
For your 2nd question:
You can use basically all transformers from the Hugging Face model hub. Just copy the model id to `model.embedder.model_name_or_path` in the configuration file, e.g.:
```yaml
model:
  type: sequence-labeling-model
  embedder:
    model_name_or_path: dslim/bert-base-NER
  dropout: 0.1
  use_crf: true
```
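You can then launch training with that configuration as usual, e.g. `python scripts/train.py -c your_config.yaml` (where `your_config.yaml` stands in for your own config path).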
Finally, some suggestions for getting a higher F1:
- We found that, without additional information, it's hard to train a stable model on the original MultiCoNER-II datasets: the context is short and the entity classes are fine-grained, so you need to tune very carefully.
- You can try the retrieval-augmented dataset we provide, which produces much better results.
I have a similar question: when simply running `python scripts/train.py -c examples/bert_crf/configs/conllpp.yaml`, the test results show that F1 is really low while accuracy is high. I have tried a few yamls and it's the same problem. Why is this? Really appreciate it, thanks!
Hello @t2413. I just ran the same training code on conllpp with the master branch, python=3.7, torch=1.12.1, modelscope=1.3.0.
The results seemed reasonable:
```
test: {
  "precision": 0.9450856942987058,
  "recall": 0.947737635917222,
  "f1": 0.9464098073555166,
  "accuracy": 0.9869494993000969
}
```
I'm not sure what's causing your poor performance. Maybe check whether your requirements are installed correctly?
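For example, a minimal sanity check of the two most relevant packages against the versions from my run above (assuming both expose `__version__`):

```python
# Compare installed versions against the known-good ones from the
# run above (torch 1.12.1, modelscope 1.3.0).
import modelscope
import torch

print("torch:", torch.__version__)            # known-good: 1.12.1
print("modelscope:", modelscope.__version__)  # known-good: 1.3.0
```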
What is your question?
Hello, I am working on the SemEval2023 MultiCoNER-II task.
First of all, thank you for sharing this amazing repo; it's saving me a lot of time and effort.
With regards to the metrics, I was training an `xlm-roberta-large` model on the English dataset and noticed that the F1 score was low while accuracy was high (F1 = 0.38 and accuracy = ~0.8). If you take a look at the English dataset for MultiCoNER-II, you'll see that the Other tag (aka `O`) is more frequent than any other tag by a large margin. Hence it's possible that the model(s) may overfit and just start predicting the tag `O` for most tokens in a sequence.

My question is: when calculating the metrics, do you take the `O` tags into account? In other words, do you mask the tokens in the target sequence whose gold tag is `O` when calculating the loss/accuracy/F1?

My next question has to do with the possible configurations we can control. What are the models (transformers) that we can use?
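One quick way to verify the `O`-tag imbalance described above is to count tag frequencies in the training file. A minimal sketch, assuming a CoNLL-style format where the tag is the last whitespace-separated column (`train.conll` is a hypothetical path):

```python
# Count how often each tag appears in a CoNLL-style file to quantify
# the dominance of the O tag. Assumes the tag is the last
# whitespace-separated column; "train.conll" is a placeholder path.
from collections import Counter

counts = Counter()
with open("train.conll", encoding="utf-8") as f:
    for line in f:
        line = line.strip()
        if line and not line.startswith("#"):  # skip blanks and comments
            counts[line.split()[-1]] += 1

total = sum(counts.values())
for tag, n in counts.most_common():
    print(f"{tag}\t{n}\t{n / total:.1%}")
```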
What have you tried?
The attached file (multiconer2-en-exp#1.zip) contains the configuration I used. The model was early stopped at epoch=7.
Code (if necessary)
No response
What's your environment?
- adaseq version: 0.5.0
- modelscope version: 1.1.1
- pytorch version: 1.12.1+cu102
- OS: Linux-5.10.147+-x86_64-with-glibc2.27 (in Google Colab)
- python version: 3.8.16
- CUDA version: 11.2
- GPU: Tesla T4, 15109MiB