Open maoo opened 10 years ago
Apache ManifoldCF is planning to add a seeding version string that can be used to replace the job ID and therefore deprecate JobIdStealer - https://issues.apache.org/jira/browse/CONNECTORS-971
I can see the following method being added to BaseRepositoryConnector.java:
@Override
public String addSeedDocumentsWithVersion(ISeedingActivity activities, Specification spec,
String lastSeedVersion, long seedTime, int jobMode)
throws ManifoldCFException, ServiceInterruption
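I can use lastSeedVersion to store lastTxnId and lastACLChangesetId, separated by a '|' character, which would let me deprecate the JobIdStealer and CrawlLogger classes:
// lastSeedVersion is null when the job has no previous seeding version, so guard before tokenizing
long lastTransactionId = 0;
long lastAclChangesetId = 0;
if (lastSeedVersion != null) {
  StringTokenizer tokenizer = new StringTokenizer(lastSeedVersion, "|");
  if (tokenizer.countTokens() == 2) {
    // Long.parseLong avoids the deprecated new Long(String) constructor
    lastTransactionId = Long.parseLong(tokenizer.nextToken());
    lastAclChangesetId = Long.parseLong(tokenizer.nextToken());
  }
}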
However, I don't know how to update lastSeedVersion once I have collected the last documents processed; the old syntax is:
crawlLogger.log(JobIdStealer.stealId(activities), lastTransactionId, lastAclChangesetId);
I was expecting to use a syntax like
super.setLastSeedVersion(lastTransactionId + "|" + lastAclChangesetId);
but maybe I'm misinterpreting this feature.
As confirmed by the ManifoldCF committers, using
return lastTransactionId + "|" + lastAclChangesetId;
as the return value of addSeedDocumentsWithVersion was enough to update lastSeedVersion. Tests are passing; it now needs some integration testing.
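For anyone following along, here is a minimal sketch of the round trip between the two IDs and the seeding version string. The SeedVersion class and its parse/format methods are hypothetical names used only for illustration; they are not part of ManifoldCF or the connector. It assumes a missing or malformed version string should fall back to 0 for both IDs, as in the snippet above.

```java
import java.util.StringTokenizer;

// Hypothetical helper (not part of ManifoldCF): packs the two crawl-state
// IDs into one seeding version string and unpacks them again.
final class SeedVersion {
  final long lastTransactionId;
  final long lastAclChangesetId;

  SeedVersion(long lastTransactionId, long lastAclChangesetId) {
    this.lastTransactionId = lastTransactionId;
    this.lastAclChangesetId = lastAclChangesetId;
  }

  // Parses "txnId|aclChangesetId"; a null or malformed string yields 0/0
  // (assumption: start from the beginning when no usable state exists).
  static SeedVersion parse(String lastSeedVersion) {
    if (lastSeedVersion != null) {
      StringTokenizer tokenizer = new StringTokenizer(lastSeedVersion, "|");
      if (tokenizer.countTokens() == 2) {
        long txnId = Long.parseLong(tokenizer.nextToken());
        long aclId = Long.parseLong(tokenizer.nextToken());
        return new SeedVersion(txnId, aclId);
      }
    }
    return new SeedVersion(0L, 0L);
  }

  // The separator must be the String "|": with the char '|' the expression
  // long + char + long is plain numeric addition, not concatenation.
  String format() {
    return lastTransactionId + "|" + lastAclChangesetId;
  }

  public static void main(String[] args) {
    // First run: no previous seeding version string.
    SeedVersion first = SeedVersion.parse(null);
    System.out.println(first.format()); // prints 0|0

    // End of a seeding pass: this string is what the method would return.
    String stored = new SeedVersion(1042L, 77L).format();

    // Next run: the stored string comes back as lastSeedVersion.
    SeedVersion resumed = SeedVersion.parse(stored);
    System.out.println(resumed.lastTransactionId + " " + resumed.lastAclChangesetId); // prints 1042 77
  }
}
```

Inside the connector, the idea would be to call parse(lastSeedVersion) at the top of addSeedDocumentsWithVersion and return format() at the end. Note that non-numeric tokens would still raise NumberFormatException in this sketch.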
Many thanks to @OpenPj and Karl Wright for the support on this issue!
As documented in the JobIdStealer class itself:
This class is used to access the Job ID from the Alfresco Manifold Connector. We use the Job ID as identifier (primary key) of entries that we log (using CrawlLogger.java) into the connector to keep the state of the crawling (Last Transaction ID, Last ACL Changeset ID)
In order to solve this issue it's possible to: