openaire / iis

Information Inference Service of the OpenAIRE system
Apache License 2.0

Fix UDF handling identifier deduplication in direct citation matching #124

Closed: marekhorst closed this issue 8 years ago

marekhorst commented 8 years ago

As reported by @madryk, there are several issues in DeduplicateIdsWithDocumentType.java:

marekhorst commented 8 years ago

When two or more tuples are provided and none of them is typed as research-article, null is returned.

The condition on line 48 should be changed to firstTuple != null.
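The intended behaviour can be sketched in plain Java, without the Pig API. This is only an illustration under assumptions, not the code from DeduplicateIdsWithDocumentType.java: the class name `DedupSketch`, the method `pickRepresentative`, and the use of a `String[]` of the form {oaid, documentType} for each tuple are all hypothetical. The sketch prefers a tuple typed as research-article and otherwise falls back to the first tuple seen, so null is returned only when there is nothing to choose from.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the UDF logic; not the actual IIS source.
final class DedupSketch {

    static final String PREFERRED_TYPE = "research-article";

    /**
     * Picks one {oaid, documentType} pair from a group of duplicates.
     * A pair typed as research-article wins; otherwise the first valid pair is used.
     */
    static String[] pickRepresentative(List<String[]> tuples) {
        String[] firstTuple = null;
        for (String[] tuple : tuples) {
            if (tuple == null || tuple.length < 2) {
                continue; // skip malformed entries
            }
            if (firstTuple == null) {
                firstTuple = tuple; // remember the fallback candidate
            }
            if (PREFERRED_TYPE.equals(tuple[1])) {
                return tuple; // preferred document type found
            }
        }
        // Fallback: no research-article in the group.
        if (firstTuple != null) {
            return firstTuple;
        }
        return null; // empty group or only malformed entries
    }

    public static void main(String[] args) {
        List<String[]> group = Arrays.asList(
                new String[] {"oaid-1", "dataset"},
                new String[] {"oaid-2", "other"});
        // No research-article here, so the first tuple is picked.
        System.out.println(pickRepresentative(group)[0]); // prints "oaid-1"
    }
}
```

With this shape, a group of two or more tuples with no research-article among them yields the first tuple rather than null.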

The input tuple provided by the Pig script contains 3 elements, while the UDF expects only 2.

The Pig script should not pass originalId, which is not relevant to the deduplication process.

The Java comment should also be fixed: tuple[0] contains the oaid identifier.
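One possible way to make the arity mismatch visible, and to document the field layout in code rather than only in a comment, is a small guard like the one below. It is a sketch under assumptions: `TupleLayoutSketch`, `oaidOf`, and the index constants are hypothetical names, a `List<String>` stands in for a Pig tuple, and the meaning of tuple[1] (document type) is inferred from the class name rather than stated in the report.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper; a List<String> stands in for a Pig tuple.
final class TupleLayoutSketch {

    static final int OAID_INDEX = 0;          // tuple[0]: oaid identifier (as stated in the report)
    static final int DOCUMENT_TYPE_INDEX = 1; // tuple[1]: document type (assumption)
    static final int EXPECTED_SIZE = 2;

    /** Returns the oaid identifier, rejecting tuples that do not have exactly two fields. */
    static String oaidOf(List<String> tuple) {
        if (tuple == null || tuple.size() != EXPECTED_SIZE) {
            String actual = (tuple == null) ? "null" : String.valueOf(tuple.size());
            throw new IllegalArgumentException(
                    "expected " + EXPECTED_SIZE + " fields but got " + actual);
        }
        return tuple.get(OAID_INDEX);
    }

    public static void main(String[] args) {
        System.out.println(oaidOf(Arrays.asList("oaid-1", "research-article")));
    }
}
```

A three-element tuple then fails fast with an IllegalArgumentException instead of being silently misread.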