usnistgov / trec_eval

Evaluation software used in the Text Retrieval Conference

can we permit comments in results and qrel files? #20

Closed cmacdonald closed 4 years ago

cmacdonald commented 4 years ago

please.

isoboroff commented 4 years ago

Yes. Line-oriented comments, marked with a # at the start of the line and running to EOL, are easy to implement. qrels files that use them wouldn't be backwards compatible, but they can be made so with grep -v.

cmacdonald commented 4 years ago

for qrels, do you have qids starting with #?

isoboroff commented 4 years ago

Not in TREC. trec_eval didn't even support non-numeric qids before v8.

isoboroff commented 4 years ago

Maybe we should head comments with something really old and obscure like DNL, as a salute to our auguft heritage.

cmacdonald commented 4 years ago

Why wouldn't qrels files be backwards compatible if we allowed # comments?

isoboroff commented 4 years ago

Craig, I included test cases but please let me know if you see any anomalies.

isoboroff commented 4 years ago

Missing documentation. Comments are lines starting with # and ending with a newline.
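
For illustration only, the comment rule stated above (a line is a comment if it starts with #, and the comment runs to the newline) can be sketched in Python. This is not the trec_eval C code. The names is_comment_line and strip_comments are hypothetical, and the sketch assumes a # that appears after leading whitespace or in the middle of a line does not start a comment.

```python
def is_comment_line(line: str) -> bool:
    """Return True if the line is a comment under the rule described above.

    Assumption: only a '#' in the very first column marks a comment;
    leading whitespace before '#' does not count.
    """
    return line.startswith("#")


def strip_comments(lines):
    """Return the lines with comment lines removed, order preserved.

    Similar in spirit to filtering a file so that a reader without
    comment support can still parse it.
    """
    return [line for line in lines if not is_comment_line(line)]


if __name__ == "__main__":
    sample = [
        "#this is a comment\n",
        "1 0 doc1 1\n",
        "2 0 #doc 0\n",  # '#' mid-line is data here, not a comment
    ]
    for line in strip_comments(sample):
        print(line, end="")
```

Filtering like this before handing a file to an older reader is one way to keep commented files usable with versions that predate comment support.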

cmacdonald commented 4 years ago

The malformed line numbers are useful. Can I check they aren't upset when we skip comment lines?

isoboroff commented 4 years ago

Don't quite understand... test case? The code skips comment lines both when allocating and reading, so not sure which line numbers you mean.

isoboroff commented 4 years ago

Added documentation string in commit ce3a5df5002df4c5ef6092b8ced56c06942dac2a

cmacdonald commented 4 years ago

Thanks Ian. Documentation looks sufficient.

Re line numbers. Consider this example malformed qrels file:

1 0 1

If I use that, I get the following error:

$ trec_eval9 badqrel oneres.res 
trec_eval.get_qrels: Malformed line 1

My question was: with your patch, are the malformed line numbers still correct when they come after comments? I.e., would the malformed line number for the qrels file below still be 2?

#this is a comment
1 0 1

This line of the patch makes me think not, but the code below looks OK.
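
To make the question concrete, here is a minimal Python sketch of the behaviour being asked about: comment lines are skipped, but the line counter still advances, so an error reports the physical line number in the file. It is not the trec_eval implementation and says nothing about what the patch actually does. The function name parse_qrels is hypothetical, and the sketch assumes four whitespace-separated fields per qrels line (qid, iteration, docno, relevance) and that blank lines are ignored.

```python
def parse_qrels(text: str):
    """Parse qrels text, skipping lines that start with '#'.

    Raises ValueError naming the 1-based physical line number of the
    first malformed line. Comment lines are skipped but still counted.
    """
    judgments = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if line.startswith("#"):
            continue  # comment: skipped, but lineno has already advanced
        fields = line.split()
        if not fields:
            continue  # assumption: blank lines are ignored
        if len(fields) != 4:
            raise ValueError(f"Malformed line {lineno}")
        qid, _iteration, docno, rel = fields
        try:
            judgments.append((qid, docno, int(rel)))
        except ValueError:
            # non-integer relevance value: report it as malformed too
            raise ValueError(f"Malformed line {lineno}") from None
    return judgments


if __name__ == "__main__":
    try:
        parse_qrels("#this is a comment\n1 0 1\n")
    except ValueError as err:
        print(err)
```

Counting with enumerate over every physical line, rather than incrementing a counter only for lines that are kept, is what keeps the reported number aligned with what an editor would show.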