Currently, the indexer does not use the backend interface, but is directly handed the ISO/TEI files going into the index. I don't see why this has to be so, although I know that AGD indexing is based not on the transcripts linked to the backend but on a transformed / enriched version of them.
Running the indexer via the backend would be more consistent and transparent (for other users, for documentation). It will certainly do no harm to have an additional indexer which uses methods from BackendInterface to iterate over transcripts. The requirement that ISO/TEI transcripts be pre-processed before being handed to the indexer could be handled via an abstract method:
public abstract Transcript preProcess(Transcript transcriptFromCorpus);
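A rough sketch of what such an indexer might look like, to make the idea concrete. Only `BackendInterface`, `Transcript` and `preProcess` come from the discussion above; the backend methods `getTranscriptIDs()` / `getTranscript(String)`, the `addToIndex` hook, and the minimal `Transcript` class are hypothetical stand-ins, not the actual API:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the real Transcript type (an ISO/TEI document).
class Transcript {
    private final String id;
    private final String xml;
    Transcript(String id, String xml) { this.id = id; this.xml = xml; }
    String getId() { return id; }
    String getXml() { return xml; }
}

// Hypothetical subset of BackendInterface; the real method names may differ.
interface BackendInterface {
    List<String> getTranscriptIDs();
    Transcript getTranscript(String transcriptID);
}

// Generic indexer: iterates over transcripts via the backend,
// delegating corpus-specific transformation to subclasses.
abstract class BackendIndexer {
    private final BackendInterface backend;
    BackendIndexer(BackendInterface backend) { this.backend = backend; }

    // Corpus-specific transformation / enrichment (e.g. what AGD needs).
    public abstract Transcript preProcess(Transcript transcriptFromCorpus);

    // Hypothetical hook that writes one transcript to the index.
    protected abstract void addToIndex(Transcript transcript);

    // Fetches each transcript from the backend, pre-processes it, indexes it;
    // returns the number of transcripts indexed.
    public int index() {
        int count = 0;
        for (String id : backend.getTranscriptIDs()) {
            Transcript raw = backend.getTranscript(id);
            addToIndex(preProcess(raw));
            count++;
        }
        return count;
    }
}

// Toy in-memory backend, for illustration only.
class InMemoryBackend implements BackendInterface {
    private final Map<String, Transcript> transcripts = new LinkedHashMap<>();
    void add(Transcript t) { transcripts.put(t.getId(), t); }
    public List<String> getTranscriptIDs() { return new ArrayList<>(transcripts.keySet()); }
    public Transcript getTranscript(String id) { return transcripts.get(id); }
}

// Toy subclass: "pre-processing" just trims whitespace; indexing just records.
class RecordingIndexer extends BackendIndexer {
    final List<String> indexed = new ArrayList<>();
    RecordingIndexer(BackendInterface backend) { super(backend); }
    @Override
    public Transcript preProcess(Transcript t) {
        // A real implementation would transform / enrich the ISO/TEI here.
        return new Transcript(t.getId(), t.getXml().trim());
    }
    @Override
    protected void addToIndex(Transcript t) { indexed.add(t.getId() + ":" + t.getXml()); }
}
```

The iteration logic would live once in the general API, while each corpus (e.g. AGD) only supplies its own `preProcess` in a subclass.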
There may be performance issues, but they will be much less pronounced for anything smaller than FOLK or ZW (i.e. for almost all corpora). For the challenging cases, the current indexer would still be there (but maybe in the specific application, not in the "general API"?).
Wondering what @EleFri thinks :-)