rajgithub123 / google-enterprise-connector-sharepoint

Automatically exported from code.google.com/p/google-enterprise-connector-sharepoint

Connector should not remove/delete documents from its queue unless checkPoint() is called #106

GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Create a connector
2. Configure it to crawl some site collection
3. Observe how documents that are sent to the CM (connector manager) to be 
fed to the GSA are removed from the crawl queue

What is the expected output? 
Documents should be removed only on calls to checkPoint()

What do you see instead?
Documents are removed immediately when nextDocument() is called

Background for this:
Email thread:
On Sep 3, 2009, at 7:23 PM, John Lacey wrote:

    First, let me shelve thoughts about blowing up the SPI, and concentrate 
here on a small point of potential breakage in the current model.

    The connector manager assumes that it can redo a batch (following a 
transient RepositoryException, an OutOfMemoryError, or a PushException or 
FeedException) by not calling checkpoint, and by passing in the previous 
checkpoint to resumeTraversal. Until checkpoint is called, the connector 
should not assume that the CM has received any of the documents returned by 
nextDocument.

Brett's comments:
More accurately, "Until checkpoint is called, the connector should not 
assume that the *GSA* has received any of the documents returned by 
nextDocument."

John's comments:

    It is not a requirement that the value returned by checkpoint be unique 
to the batch, or that the contents of the DocumentLists returned by two 
consecutive calls to resumeTraversal, without an intervening call to 
checkpoint, be the same. However, none of the documents in the first batch 
should be considered fed or indexed, so they should appear somewhere in 
subsequent batches.

    I suppose this is obvious, but since we have been mucking around in 
this space in the connector manager extensively, and only deal extensively 
with connectors that return the full state in the checkpoint string, I 
wanted to make sure this was clear for all of the other connector 
developers.
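
A minimal sketch of the contract described above, assuming a connector that keeps its crawl queue in memory. The class `CheckpointedQueue`, its `rollback()` and `size()` methods, and the use of plain string document IDs are hypothetical names for illustration only; this is not the connector's actual code or the real SPI `DocumentList` interface. The idea is that `nextDocument()` only moves a document to an "in flight" list, and the document is dropped for good only when `checkpoint()` is called.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/**
 * Hypothetical stand-in for a connector's crawl queue.
 * Assumption: documents are represented by their string IDs.
 */
class CheckpointedQueue {
    // Documents not yet handed to the connector manager.
    private final Deque<String> pending = new ArrayDeque<>();
    // Documents returned by nextDocument() but not yet confirmed by checkpoint().
    private final List<String> inFlight = new ArrayList<>();
    // ID of the last document confirmed by a checkpoint (empty if none).
    private String lastCheckpoint = "";

    CheckpointedQueue(List<String> docIds) {
        pending.addAll(docIds);
    }

    /** Returns the next document ID, or null when nothing is pending. */
    String nextDocument() {
        String doc = pending.pollFirst();
        if (doc != null) {
            inFlight.add(doc); // keep it; the GSA may not have received it yet
        }
        return doc;
    }

    /** Only here are the handed-out documents actually forgotten. */
    String checkpoint() {
        if (!inFlight.isEmpty()) {
            lastCheckpoint = inFlight.get(inFlight.size() - 1);
            inFlight.clear();
        }
        return lastCheckpoint;
    }

    /** Batch is being redone without a checkpoint: re-queue in original order. */
    void rollback() {
        for (int i = inFlight.size() - 1; i >= 0; i--) {
            pending.addFirst(inFlight.get(i));
        }
        inFlight.clear();
    }

    /** Number of documents the queue is still responsible for. */
    int size() {
        return pending.size() + inFlight.size();
    }
}
```

With a queue of `a`, `b`, `c`, calling `nextDocument()` twice and then `rollback()` (standing in for a `resumeTraversal` with the previous checkpoint) makes `a` and `b` appear again in the next batch. After `checkpoint()` only `c` remains.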

Original issue reported on code.google.com by rakeshs101981@gmail.com on 7 Sep 2009 at 10:57

GoogleCodeExporter commented 9 years ago
Fix details: 
http://code.google.com/p/google-enterprise-connector-sharepoint/source/detail?r=383

Original comment by rakeshs101981@gmail.com on 9 Oct 2009 at 11:28

GoogleCodeExporter commented 9 years ago

Original comment by rakeshs101981@gmail.com on 5 Nov 2009 at 9:01

GoogleCodeExporter commented 9 years ago
Verified in 2.4 Release

Original comment by ashwinip...@gmail.com on 14 Dec 2009 at 7:02