Closed davda54 closed 3 years ago
Actually, I left the lines there so you could see the sentence ids of any missing sentences. If you happen to know a more elegant way to do it in the assert statement, we could add that in. Otherwise, I'd rather it print out the sentence ids and then trigger the assert.
Oh, I didn't realize that, sorry for an unnecessary PR about something that is really not an issue. I think this functionality can be realized by a one-liner, which should also save some computation when the assert is not triggered:
assert p == g, f"missing some sentences: {[i for i in g if i not in p]}"
However, this still doesn't cover the possibility of having more predictions than golds (can happen with a bug in cross-lingual training, perhaps). So since this PR is already here, I would suggest changing the code to:
g = set(gold.keys())
p = set(preds.keys())
assert g.issubset(p), f"missing some sentences: {g.difference(p)}"
assert p.issubset(g), f"predictions contain sentences that are not in golds: {p.difference(g)}"
That looks like a great solution. Thanks for that. I've added the code in and will close the pull request, since the actual request still has the old solution.
Seems like these lines remained from an older version of this file. The check is performed by the assert below.