kermitt2 closed this issue 5 years ago.
Thanks. Yes, the address should have been included. Sigh.
We could review them all manually, though there are over 2,500 creator labels. Perhaps there is a reasonable way to look at the text immediately following a creator and check whether the next few words include a geographic entity (i.e., match US state abbreviations or country names). Perhaps the geographic entity recognizer is already available?
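A minimal sketch of that heuristic, assuming creator annotations are available as character offsets into the paragraph text. The country list is a small hypothetical sample (only the US state codes are complete), and `geo_hits` / `following_geo` are illustrative names, not part of any existing tool. Matching is case-sensitive on the state codes, so lowercase words like "in" or "or" do not trigger a hit, although uppercase "IN" or "OR" still would.

```python
import re

# All 50 US state codes plus DC; matched case-sensitively as whole tokens.
US_STATES = {
    "AL", "AK", "AZ", "AR", "CA", "CO", "CT", "DE", "FL", "GA", "HI", "ID",
    "IL", "IN", "IA", "KS", "KY", "LA", "ME", "MD", "MA", "MI", "MN", "MS",
    "MO", "MT", "NE", "NV", "NH", "NJ", "NM", "NY", "NC", "ND", "OH", "OK",
    "OR", "PA", "RI", "SC", "SD", "TN", "TX", "UT", "VT", "VA", "WA", "WV",
    "WI", "WY", "DC",
}

# Hypothetical sample only; a real check would load a full country gazetteer.
# A tuple keeps the order of reported hits deterministic.
COUNTRIES = ("United States", "USA", "United Kingdom", "UK", "Germany",
             "France", "Canada", "Japan", "China")


def geo_hits(window: str) -> list[str]:
    """Return state codes and country names found in a short text window."""
    tokens = re.findall(r"[A-Za-z]+", window)
    hits = [t for t in tokens if t in US_STATES]
    for name in COUNTRIES:
        # Word boundaries so "UK" does not match inside "UKRAINE", etc.
        if re.search(r"\b" + re.escape(name) + r"\b", window):
            hits.append(name)
    return hits


def following_geo(text: str, creator_end: int, n_words: int = 5) -> list[str]:
    """Look at the next few whitespace-separated words after a creator span."""
    window = " ".join(text[creator_end:].split()[:n_words])
    return geo_hits(window)


# Illustrative sentence, not taken from the corpus.
sentence = "Data were analysed with SPSS 16.0 (SPSS Inc., Chicago, IL, USA)."
start = sentence.index("SPSS Inc.")
print(following_geo(sentence, start + len("SPSS Inc.")))  # ['IL', 'USA']
```

Calling `geo_hits` on the creator span itself, rather than on the words after it, would flag the opposite case: annotations that already include the address.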
In general, though, I think the creator label is the least important of all the labels, so perhaps we should prioritize the other labels.
So @kermitt2 does not include the address in creator in the candidate release. May we make it a rule ("do not include the address in your creator annotation") and close this issue now?
Yes. The decision was made because addresses can be geolocated as separate entities, so there is no need to include them in creator. Or, viewed another way, they are metadata about the creator, not part of its name.
One consistency issue I have observed is the size of the chunk corresponding to creator. When the creator is actually the software publisher, its address is included or not in an unpredictable manner.
Examples:
PMC2927682
versus
PMC2921509: