openaire / iis

Information Inference Service of the OpenAIRE system
Apache License 2.0

Find a replacement for the obsolete ObjectStore-based content filtering in the cache builder workflow #1432

Open marekhorst opened 11 months ago

marekhorst commented 11 months ago

The cache builder's main purpose is to allow running metadata extraction on a predefined set of contents, outside the regular provisioning cycle. Since the PDF aggregation system replaced ObjectStore as the centralized content storage, we can no longer rely on an ObjectStoreId as a filtering feature. We need a replacement that lets us narrow down the set of contents processed by the metadata extraction module (CERMINE in particular).

One viable alternative is checking the identifier prefix.
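
To make the prefix idea concrete, here is a minimal sketch of what such a filter could look like. It is not taken from the IIS codebase: the class name `IdentifierPrefixFilter`, its methods, and the sample prefix `datasourceA::` are hypothetical and only illustrate the approach, assuming content identifiers are plain strings that start with a source-specific prefix.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch: keeps only content identifiers starting with
 * one of the accepted prefixes. Not part of the actual IIS workflow.
 */
public class IdentifierPrefixFilter {

    private final Set<String> acceptedPrefixes;

    public IdentifierPrefixFilter(Set<String> acceptedPrefixes) {
        // Defensive copy so later changes to the caller's set have no effect.
        this.acceptedPrefixes = new HashSet<>(acceptedPrefixes);
    }

    /** Returns true when the identifier is non-null and starts with an accepted prefix. */
    public boolean accepts(String id) {
        if (id == null) {
            return false;
        }
        for (String prefix : acceptedPrefixes) {
            if (id.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    /** Returns the accepted identifiers, preserving input order. */
    public List<String> filter(List<String> ids) {
        List<String> result = new ArrayList<>();
        for (String id : ids) {
            if (accepts(id)) {
                result.add(id);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // "datasourceA::" is a made-up prefix used only for illustration.
        IdentifierPrefixFilter filter =
                new IdentifierPrefixFilter(new HashSet<>(Arrays.asList("datasourceA::")));
        System.out.println(filter.filter(
                Arrays.asList("datasourceA::doc1", "datasourceB::doc2")));
    }
}
```

In the real workflow the same predicate would presumably be applied when selecting the contents passed to metadata extraction, with the accepted prefixes supplied as a workflow parameter in place of the former ObjectStoreId.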