wikilinks / conll03_nel_eval

Python evaluation scripts for AIDA-formatted CoNLL data
Apache License 2.0
20 stars 4 forks

ValueError: could not convert string to float #60

Closed userofgithub1 closed 6 years ago

userofgithub1 commented 6 years ago

Hey, when I run the prepare step, I get this error:

  File "conll03_nel_eval/data.py", line 343, in read
    token, iob, name, link, score = self.dialect.extract_link(l)
  File "conll03_nel_eval/data.py", line 53, in extract_link
    score = float(line_bits[3])
ValueError: could not convert string to float: http://en.wikipedia.org/wiki/House_of_Commons

How can I resolve this error?

Any guidance will be much appreciated. Many thanks,

jnothman commented 6 years ago

Perhaps you put a link target where you should have had a score or something.
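For illustration, here is a minimal sketch of that failure mode; the column layout is an assumption inferred from the traceback, not the exact code in data.py:

```python
# Sketch of the parse failure, assuming (from the traceback) that the
# reader expects a numeric score in the fourth field after the token.
def extract_link(line):
    bits = line.split("\t")
    token, line_bits = bits[0], bits[1:]
    # data.py line 53 does: score = float(line_bits[3])
    return token, line_bits[0], line_bits[1], line_bits[2], float(line_bits[3])

# A well-formed line with a numeric score in the last column parses:
ok = "Commons\tI\tHouse of Commons\thttp://en.wikipedia.org/wiki/House_of_Commons\t0.9"
print(extract_link(ok))

# If the link URL ends up in the score slot (e.g. a missing column),
# float() raises the same ValueError as reported above:
bad = "Commons\tI\tHouse of Commons\t--\thttp://en.wikipedia.org/wiki/House_of_Commons"
try:
    extract_link(bad)
except ValueError as err:
    print(err)
```

So the fix is usually to check that each annotated line in the input file carries the expected number of tab-separated fields.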

This software has not been maintained for years and is merely archived here.

It has been replaced by https://github.com/wikilinks/neleval

userofgithub1 commented 6 years ago

Thank you for your reply and for pointing out that other library. I thought it was only for TAC, but it looks like it can evaluate CoNLL as well.

Thank you again,

userofgithub1 commented 6 years ago

@jnothman One more question about neleval. Do we have to use a Wikipedia dump from 2011, the same year as the CoNLL annotation, or is the neleval evaluation mapped to another version of Wikipedia?

Also, could you please give an example of how to run the neleval evaluation on the CoNLL dataset? All the examples run on the TAC dataset's gold spans.

Thank you,

jnothman commented 6 years ago

There's no constraint on which version of Wikipedia, as long as the IDs are consistent. For the CoNLL-AIDA dataset, your performance will be best if you map your output to be consistent with the gold standard, obviously.
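For instance, a minimal sketch of such a mapping, assuming you have already extracted a redirect/rename table from the target Wikipedia dump (the table below is a toy placeholder, not real data):

```python
# Toy sketch: normalise system link IDs toward the gold standard's
# Wikipedia version via a redirect map. Building the real map from a
# Wikipedia dump is assumed and out of scope here.
redirects = {
    "http://en.wikipedia.org/wiki/USSR": "http://en.wikipedia.org/wiki/Soviet_Union",
}

def normalize(link):
    seen = set()
    # Follow chained redirects, guarding against cycles.
    while link in redirects and link not in seen:
        seen.add(link)
        link = redirects[link]
    return link

print(normalize("http://en.wikipedia.org/wiki/USSR"))
```

Links not in the map pass through unchanged, so unmapped system output is still scored, just without the benefit of the normalisation.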

Sorry I can't construct an example now, but:

$ ./nel prepare-conll-coref -h
usage: neleval prepare-conll-coref [-h] [--with-kb] [--cross-doc] [system]

Import format from CoNLL coreference for evaluation

positional arguments:
  system

optional arguments:
  -h, --help   show this help message and exit
  --with-kb    By default all cluster labels are treated as NILs. This flag
               treats all as KB IDs unless prefixed by "NIL"
  --cross-doc  By default, label space is independent per document. This flag
               assumes global label space.

This doesn't seem to include mapping and, in fact, I'm not sure whether we have that in the current version.

userofgithub1 commented 6 years ago

Thank you so much.

userofgithub1 commented 6 years ago

@jnothman I have some more questions about neleval if you allow me since most of its documentation is about TAC.

- Will prepare-conll-coref evaluate a system's output against the CoNLL-AIDA gold standard, or is it a specific measure that is different from the TAC measures?
- What is meant by the argument [system]? Is it the output of the system under evaluation?
- Also, does --with-kb measure micro precision, or does it produce all the evaluation measures?

Sorry if my questions seem really shallow and show a lack of knowledge; I'm new to this topic and still trying my best to learn.

Thank you,

jnothman commented 6 years ago

Yes, system is a system output. It is in brackets only because it can alternatively be supplied on standard input rather than as a command-line argument.

--with-kb will allow all evaluation measures. prepare-conll-coref does not perform the evaluation itself.
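As a sketch (file names here are hypothetical; the flags are taken from the -h output above), the two equivalent ways of supplying the system output would be:

```shell
# As a positional argument:
neleval prepare-conll-coref --with-kb system_output.conll > system.prepared

# Or on standard input:
neleval prepare-conll-coref --with-kb < system_output.conll > system.prepared
```

The prepared file is then what you would feed to the separate evaluate step.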

userofgithub1 commented 6 years ago

@jnothman Thank you, this is truly appreciated.

jnothman commented 6 years ago

Hi @userofgithub1, I'm trying to get some better documentation up at https://neleval.readthedocs.io/


userofgithub1 commented 6 years ago

Hey @jnothman, thank you, these are useful docs. However, most examples cover TAC prepare and evaluate. Regarding CoNLL-AIDA, I have the testb output from a NEL framework in neleval format (7 columns), which looks like this:

1164testb RUGBY 1474 1491 en.wikipedia.org/wiki/Andrea_Castellani 1.0 PERSON

And I have the entire dataset file AIDA-YAGO2-dataset.tsv, which looks like this:

-DOCSTART- (1251testb Russia)
Russia  B   Russia  Russia  http://en.wikipedia.org/wiki/Russia /m/06bnz
warns
Norilsk B   Norilsk --NME--
,
not
expected
to
liquidate
it
.

So, when the following command is executed on the benchmark dataset, no file is generated for later use with the neleval 'evaluate' command:

$ neleval prepare-conll-coref \
    /path/to/AIDA-YAGO2-dataset.tsv   # dataset file

How can I convert the CoNLL-AIDA gold standard to the evaluation format?
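For reference, here is a rough, unofficial sketch of one way to do such a conversion, turning gold TSV lines like the sample above into (doc_id, start, end, link, score) rows. The single-space token joining used for character offsets is an assumption about how offsets were derived, and --NME-- (NIL) mentions are simply skipped:

```python
def aida_to_rows(lines):
    """Unofficial sketch: convert AIDA-YAGO2-dataset.tsv lines into
    (doc_id, start, end, link, score) rows. Offsets assume tokens are
    joined by single spaces; --NME-- (NIL) mentions are skipped."""
    rows = []
    doc_id, offset = None, 0
    start = end = link = None

    def flush():
        nonlocal start
        if start is not None and link != "--NME--":
            rows.append((doc_id, start, end, link, 1.0))
        start = None

    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("-DOCSTART-"):
            flush()
            doc_id = line[len("-DOCSTART- ("):].rstrip(")")
            offset = 0
            continue
        if not line:
            flush()
            continue
        bits = line.split("\t")
        token = bits[0]
        tok_start, tok_end = offset, offset + len(token)
        offset = tok_end + 1  # single-space joining (assumption)
        tag = bits[1] if len(bits) > 1 else "O"
        if tag == "B":                 # mention begins
            flush()
            start, end = tok_start, tok_end
            link = bits[4] if len(bits) > 4 else "--NME--"
        elif tag == "I":               # mention continues
            end = tok_end
        else:                          # unannotated token ends any mention
            flush()
    flush()
    return rows

sample = [
    "-DOCSTART- (1251testb Russia)",
    "Russia\tB\tRussia\tRussia\thttp://en.wikipedia.org/wiki/Russia\t/m/06bnz",
    "warns",
    "Norilsk\tB\tNorilsk\t--NME--",
]
print(aida_to_rows(sample))
# [('1251testb Russia', 0, 6, 'http://en.wikipedia.org/wiki/Russia', 1.0)]
```

Note that the real file is tab-separated even though the paste above shows spaces, and the Norilsk mention is dropped here because it is --NME--.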

Thank you so much,

jnothman commented 6 years ago

Sorry that I misunderstood the issue here... CoNLL03+AIDA != CoNLL 2011-2