neuralmind-ai / portuguese-bert

Portuguese pre-trained BERT models

Issue with conlleval #40

Closed monilouise closed 2 years ago

monilouise commented 2 years ago

I'm trying to reproduce the results with the conlleval script, using MobaXterm on Windows 10. When I run the command ./conlleval.txt < portuguese-bert/ner_evaluation/output_bert_selective/predictions_conll.txt, the following error occurs:

unexpected number of features: 1 (3)

Even setting -d "\t" produces an error (conlleval: unexpected number of features in line W B-PESSOA B-PESSOA).

What should I do?

Thanks in advance.

fabiocapsouza commented 2 years ago

Hi @monilouise, I'm not a Windows user so I can't help you with it. I suggest trying to run conlleval on Google Colab.

monilouise commented 2 years ago

Hi @fabiocapsouza, the same problem occurs on Google Colab:

unexpected number of features: 1 (3)

fabiocapsouza commented 2 years ago

Can you share the predictions_conll.txt file please?

monilouise commented 2 years ago

predictions_conll.txt

I'm sending the predictions file for selective, no CRF (fine tuning).

fabiocapsouza commented 2 years ago

The problem seems to be that the lines end with \r\n instead of \n. I simply read the file and wrote it back out, and conlleval then worked as expected: predictions_conll_2.txt

Original file:

$ xxd  -b -l 50 predictions_conll.txt
00000000: 01010111 00100000 01000010 00101101 01010000 01000101  W B-PE
00000006: 01010011 01010011 01001111 01000001 00100000 01000010  SSOA B
0000000c: 00101101 01010000 01000101 01010011 01010011 01001111  -PESSO
00000012: 01000001 00001101 00001010 00101110 00100000 01001001  A... I        <--- 00001101 00001010 = \r\n
00000018: 00101101 01010000 01000101 01010011 01010011 01001111  -PESSO
0000001e: 01000001 00100000 01001001 00101101 01010000 01000101  A I-PE
00000024: 01010011 01010011 01001111 01000001 00001101 00001010  SSOA..
0000002a: 01001010 01000001 01001101 01000101 01010011 00100000  JAMES 
00000030: 01001001 00101101                                      I-

Working file:

$ xxd  -b -l 50 predictions_conll_2.txt
00000000: 01010111 00100000 01000010 00101101 01010000 01000101  W B-PE
00000006: 01010011 01010011 01001111 01000001 00100000 01000010  SSOA B
0000000c: 00101101 01010000 01000101 01010011 01010011 01001111  -PESSO
00000012: 01000001 00001010 00101110 00100000 01001001 00101101  A.. I-
00000018: 01010000 01000101 01010011 01010011 01001111 01000001  PESSOA
0000001e: 00100000 01001001 00101101 01010000 01000101 01010011   I-PES
00000024: 01010011 01001111 01000001 00001010 01001010 01000001  SOA.JA
0000002a: 01001101 01000101 01010011 00100000 01001001 00101101  MES I-
00000030: 01010000 01000101                                      PE
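
For anyone hitting the same error, the read-and-rewrite step can be sketched in Python roughly as follows. This is only an illustration, not necessarily how the working file above was produced: the function name normalize_line_endings is made up for this example, and it assumes the file is small enough to load into memory.

```python
from pathlib import Path


def normalize_line_endings(src: str, dst: str) -> int:
    """Copy src to dst, replacing every \r\n with \n.

    Returns the number of \r\n pairs that were replaced.
    (Hypothetical helper, named here for illustration only.)
    """
    # Work on raw bytes so Python's newline translation does not interfere.
    data = Path(src).read_bytes()
    crlf_count = data.count(b"\r\n")
    Path(dst).write_bytes(data.replace(b"\r\n", b"\n"))
    return crlf_count


# Example call, using the file names from this thread:
# normalize_line_endings("predictions_conll.txt", "predictions_conll_2.txt")
```

A return value of 0 means the file already used \n line endings, in which case the cause of the error is probably something else.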

monilouise commented 2 years ago

Hi, it works for me too! Thanks!