nestauk / dap_aria_mapping

Mapping technology innovation to support The Advanced Research and Innovation Agency (ARIA)
MIT License

Patent deduplication #4

Closed. georgerichardson closed this issue 1 year ago

georgerichardson commented 1 year ago

Figure out the best strategy for deduplicating on patent family ID and keeping the earliest English abstract. Could happen during query or in post-processing.
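For reference, the post-processing option could look roughly like the sketch below. It is only an illustration: the field names (`family_id`, `abstract_language`, `publication_date`), the integer `YYYYMMDD` date format and the sample records are assumptions, not the project's actual schema or data.

```python
def dedupe_by_family(records):
    """Keep, for each patent family, the earliest English-language record.

    Assumes each record is a dict with the hypothetical keys
    'family_id', 'abstract_language' and 'publication_date' (int, YYYYMMDD).
    """
    earliest = {}
    for rec in records:
        # Skip documents whose abstract is not in English.
        if rec["abstract_language"] != "en":
            continue
        kept = earliest.get(rec["family_id"])
        # Keep the record with the smallest publication date seen so far.
        if kept is None or rec["publication_date"] < kept["publication_date"]:
            earliest[rec["family_id"]] = rec
    return list(earliest.values())


# Made-up sample data, for illustration only.
sample_records = [
    {"publication_number": "EP-1-A1", "family_id": "F1",
     "abstract_language": "de", "publication_date": 20100105},
    {"publication_number": "US-1-A1", "family_id": "F1",
     "abstract_language": "en", "publication_date": 20110620},
    {"publication_number": "WO-1-A1", "family_id": "F1",
     "abstract_language": "en", "publication_date": 20100915},
    {"publication_number": "GB-2-A", "family_id": "F2",
     "abstract_language": "en", "publication_date": 20150312},
    {"publication_number": "JP-3-A", "family_id": "F3",
     "abstract_language": "ja", "publication_date": 20120101},
]

deduped_records = dedupe_by_family(sample_records)
```

Note that in this sketch a family with no English-language document (F3 above) is dropped entirely; whether that is the desired behaviour is a separate decision.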

india-kerle commented 1 year ago

De-duplicating is simple to do in the query and drastically reduces the number of results, so it is certainly the way to go.

india-kerle commented 1 year ago

Currently de-duplicating by picking one English-language patent document per patent family.
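A rough sketch of what de-duplicating inside the query might look like, using an in-memory SQLite database so that it runs anywhere. The thread does not show the actual query or say which database is used, so the table name `publications`, its columns and the sample rows are all hypothetical. It also assumes the earliest English-language document is the one to keep, as requested at the top of the issue. Window functions require SQLite 3.25 or newer.

```python
import sqlite3

# Hypothetical table and made-up rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE publications ("
    "publication_number TEXT, family_id TEXT, "
    "abstract_language TEXT, publication_date INTEGER)"
)
conn.executemany(
    "INSERT INTO publications VALUES (?, ?, ?, ?)",
    [
        ("EP-1-A1", "F1", "de", 20100105),
        ("US-1-A1", "F1", "en", 20110620),
        ("WO-1-A1", "F1", "en", 20100915),
        ("GB-2-A", "F2", "en", 20150312),
        ("JP-3-A", "F3", "ja", 20120101),
    ],
)

# Rank English-language documents within each family by publication date
# (publication number breaks ties) and keep only the first one.
DEDUP_QUERY = """
SELECT publication_number, family_id, publication_date
FROM (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY family_id
               ORDER BY publication_date, publication_number
           ) AS rank_in_family
    FROM publications
    WHERE abstract_language = 'en'
)
WHERE rank_in_family = 1
ORDER BY family_id
"""

deduped_rows = conn.execute(DEDUP_QUERY).fetchall()
```

The same `ROW_NUMBER() OVER (PARTITION BY ...)` pattern is available in most SQL dialects, although the exact column names and date types would differ from this toy table.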