openaire / iis

Information Inference Service of the OpenAIRE system
Apache License 2.0

Closes #1234: Remove project concept matching phase #1448

Closed marekhorst closed 6 months ago

marekhorst commented 6 months ago

Supplementary description for this pull request:

Two integration tests were run (primary/processing and export/actionmanager/sequencefile) to confirm that the introduced changes did not break those workflows.

The original description from the commit message:

This commit removes support for matching concepts (listed in the import_project_concepts_context_ids_csv IIS input parameter) with publications linked to projects associated with those concepts.

The code is removed because it now overlaps with the community bulk tagging functionality implemented in a different module that is part of the provisioning chain.

As a result, the output actionset identified by the action_set_id_document_referencedProjects input parameter no longer contains Oaf publication records encapsulating the concepts (defined as contexts) associated with publications. The relevant report metrics (processing.referenceExtraction.concept.*) are also removed.

From now on, the import_project_concepts_context_ids_csv input parameter only needs to list communities and research initiatives. The listed concepts are streamed only to the community TDM (text data mining) and research initiatives TDM algorithms. Matching against publications linked to projects is no longer performed.
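
For illustration only, a minimal sketch of how a comma-separated value such as the one passed in import_project_concepts_context_ids_csv could be turned into a list of context ids. The class name, the method name and the sample ids are hypothetical and are not taken from the IIS codebase; the actual parsing in IIS may differ.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, NOT part of the IIS codebase.
class ContextIdsCsvSketch {

    // Splits a comma-separated parameter value into trimmed,
    // non-empty, de-duplicated ids, preserving the input order.
    static List<String> parseContextIds(String csv) {
        if (csv == null || csv.trim().isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.stream(csv.split(","))
                .map(String::trim)
                .filter(id -> !id.isEmpty())
                .distinct()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Sample ids are made up; real values would be community and
        // research initiative context ids.
        String paramValue = "community-a, initiative-b,,community-a";
        System.out.println(parseContextIds(paramValue));
    }
}
```

Under the change described above, a list like this would only be handed to the community and research initiatives TDM algorithms, so entries other than communities and research initiatives would no longer have any effect.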

mpol commented 6 months ago

Shouldn't it be closing #1439?

marekhorst commented 6 months ago

> Shouldn't it be closing #1439?

@mpol You're right, I just pushed the updated commit (force push).

marekhorst commented 6 months ago

Thanks. I just renamed this test to testDatasetReferenceExtractionOutputTransformerWorkflow() to avoid confusion.

marekhorst commented 6 months ago

Merged with this commit after applying last minute changes: https://github.com/openaire/iis/commit/d919bae087d692d0446dfce44f672595ed5d8a63