A pipeline (or fanout) architecture for creating index annotations

~~If I understand correctly,~~

~~// TODO: make these pluggable, e.g. registered from an importer or something?~~

~~/pkg/index/corpus.go#L1173 is about allowing importers to register extractors for types they create (but maybe it's only about location info). Seems related.~~ Update: Nope, that comment is about letting importers redirect location lookups to an associated permanode (via the `foursquareVenuePermanode` attribute in the case of `camliType: "foursquare.com:checkin"` nodes).

One thing to keep in mind is that there are two ways to add annotations: as fields in the indexer, which is done for a small number of core fields (location, time, etc.), or as permanode attributes. You could implement an annotation pipeline out of process, as a client that writes permanode attributes. The main question is how you would keep track of which annotations came from which feature extractor, so that you could rebuild them when the feature extractor changes. The extractor name and version could be added as JSON fields on the attribute claims, so that you could round them up and delete them when you want to rebuild. This is what importers do currently, but at the level of permanodes rather than attribute claims.

perkeep / perkeep #734