metafacture / metafacture-documentation

The central place for documentation about metafacture
http://metafacture.github.io/metafacture-documentation/
Apache License 2.0

Metamorph: remove duplicates from array #6

Closed philboeselager closed 8 years ago

philboeselager commented 8 years ago

Is there already a way to remove duplicates from metamorph arrays?

Preferably, a "unique" attribute or option would exist within the "entity" tag. Otherwise, implementing an ArrayDuplicateRemover would be the way to go, I guess, although that solution would probably cost more computation time.

Corresponding issue: https://github.com/hbz/lobid-organisations/issues/18
Morph code used so far: https://github.com/hbz/lobid-organisations/blob/c72429235ddd3fe713449e39b8456df921f32b6b/src/main/resources/morph-enriched.xml#L304
Morph output: http://beta.lobid.org/organisations/DE-9

The occurrence of this duplicate is caused by input data. Nevertheless, it is desirable to have a removal option.

dr0i commented 8 years ago

With the <occurrence only="1" /> directive (see https://github.com/hbz/lobid-organisations/commit/c40a5e70a972f6eae1bbf85dfebeaf30c5619126) it's possible to restrict the outcome to exactly one entry.
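For illustration, a minimal sketch of how the directive might sit inside an array entity. The entity name `url[]` and the source field `url` are hypothetical placeholders, not taken from the linked morph file; see the commit above for the actual change.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<metamorph xmlns="http://www.culturegraph.org/metamorph" version="1">
  <rules>
    <!-- "url[]" and the source "url" are assumed names for this sketch -->
    <entity name="url[]" flushWith="record">
      <data source="url">
        <!-- pass on only the first occurrence of this source; later ones are dropped -->
        <occurrence only="1" />
      </data>
    </entity>
  </rules>
</metamorph>
```

Note that this keeps the first value and discards every later one, whether or not it is a duplicate, so it only fits fields where a single entry is expected.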

dr0i commented 8 years ago

Closing.