tomaarsen / SpanMarkerNER

SpanMarker for Named Entity Recognition
https://tomaarsen.github.io/SpanMarkerNER/
Apache License 2.0

Warning update for evaluation done without document-level context #46

Closed. jayant-yadav closed this issue 11 months ago

jayant-yadav commented 11 months ago

Without document-level context, i.e., in the absence of document_id and sentence_id, evaluation should throw a warning that evaluating without this metadata will decrease performance.

tomaarsen commented 11 months ago

Hello!

This warning is only thrown when document-level context is present during evaluation though, i.e. when:

So I think the old warning was correct? If the model was trained without document-level context and the evaluation is also without document-level context, then we don't need a warning I think. But please let me know if I'm overlooking something!

jayant-yadav commented 11 months ago

I guess so. What confused me were lines 243 and 255. One says performance would decrease if document-level context was provided, and the other says the opposite. My understanding was that the absence of document-level context will decrease performance in any case. But maybe it's not like that. Please close this issue if you think that's the case.

Thank you for the quick response though! I got interested in your Master's thesis work since mine was along the same lines, but with biomedical/clinical data, so I could not release the trained models in the open domain.

tomaarsen commented 11 months ago

> I guess so. What confused me were the line 243 and 255.

I see! That's an easy mistake. The two warnings cover the two cases where the model sees a different type of data during training than it sees during evaluation, e.g.:

| | Document-level context during evaluation | No document-level context during evaluation |
|---|---|---|
| Document-level context during training | All good! No warnings | Warning from 255! |
| No document-level context during training | Warning from 243! | All good! No warnings |
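
The four cases above can be sketched in a few lines of Python. This is only an illustration of the logic described in this thread, not the actual SpanMarker code: the function names and the warning texts are hypothetical, and the only details taken from the discussion are the `document_id` and `sentence_id` column names and the rule that a warning fires only when training and evaluation disagree.

```python
import warnings
from typing import Iterable, Optional


def has_document_context(column_names: Iterable[str]) -> bool:
    """Hypothetical helper: document-level context is assumed to be available
    only when both metadata columns mentioned in this issue are present."""
    return {"document_id", "sentence_id"}.issubset(column_names)


def context_mismatch_message(
    trained_with_context: bool, eval_with_context: bool
) -> Optional[str]:
    """Return a warning text if training and evaluation disagree, else None.

    The message wording is made up for illustration; it is not the
    library's actual warning text.
    """
    if trained_with_context == eval_with_context:
        # Both with or both without document-level context: nothing to warn about.
        return None
    if eval_with_context:
        # Trained without context, evaluated with it.
        return (
            "Model was trained without document-level context, "
            "but the evaluation data provides it."
        )
    # Trained with context, evaluated without it.
    return (
        "Model was trained with document-level context, "
        "but the evaluation data lacks it."
    )


def warn_on_mismatch(trained_with_context: bool, eval_columns: Iterable[str]) -> None:
    """Emit a UserWarning only in the two mismatching cases."""
    message = context_mismatch_message(
        trained_with_context, has_document_context(eval_columns)
    )
    if message is not None:
        warnings.warn(message, UserWarning)
```

Under these assumptions, calling `warn_on_mismatch(False, ["tokens", "ner_tags"])` stays silent, which matches the case raised in the original report: no document-level context during training and none during evaluation.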

And I'm glad I'm not the only one who fancies NER enough to write a thesis about it! It's a fascinating task in my opinion. And biomedical/clinical data uses are very important! I trained a few example models on public data here, in case you're curious.

jayant-yadav commented 11 months ago

@tomaarsen Thank you for the clarification. I will close this issue since this is not a valid one.