Closed by GoogleCodeExporter
Original comment by rakeshs101981@gmail.com on 27 Aug 2009 at 5:14
Initial investigation suggests that the file size is returned in the metadata attribute ows_FileSizeDisplay.
The following link suggests it is safe to use:
http://blogs.msdn.com/karthick/archive/2006/04/07/570398.aspx
More investigation is needed to confirm that it can be handled seamlessly.
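The following is a minimal sketch of reading this attribute from a row returned by the Lists web service (GetListItems). It assumes the value is a plain integer byte count such as 24576 and treats anything else as an unknown size (-1). The class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

/** Reads the file size that SharePoint reports in the ows_FileSizeDisplay row attribute. */
class FileSizeMetadata {
    static final String FILE_SIZE_ATTR = "ows_FileSizeDisplay";

    /** Returns the size in bytes, or -1 when the attribute is missing or not a plain integer. */
    static long fileSizeOf(Element row) {
        // getAttribute returns "" (not null) when the attribute is absent.
        String raw = row.getAttribute(FILE_SIZE_ATTR).trim();
        if (raw.isEmpty()) {
            return -1L;
        }
        try {
            return Long.parseLong(raw);
        } catch (NumberFormatException e) {
            return -1L; // assumption: any non-integer value means "size unknown"
        }
    }

    /** Convenience overload: parses one z:row element given as an XML string. */
    static long fileSizeOf(String rowXml) {
        try {
            // Default factory is not namespace-aware, so "z:row" is just an element name.
            Element row = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(rowXml)))
                .getDocumentElement();
            return fileSizeOf(row);
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a well-formed row element: " + rowXml, e);
        }
    }

    public static void main(String[] args) {
        String row = "<z:row xmlns:z='#RowsetSchema' ows_FileLeafRef='1;#report.doc'"
            + " ows_FileSizeDisplay='24576'/>";
        System.out.println(fileSizeOf(row)); // prints 24576
    }
}
```

Returning -1 rather than throwing lets a caller fall back to its existing behaviour when the attribute is missing, for example on folder rows.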
Original comment by rakeshs101981@gmail.com on 3 Sep 2009 at 11:10
We will use the ows_FileSizeDisplay attribute to determine the file size. This will also require SharePointTraversalManager to implement com.google.enterprise.connector.spi.TraversalContextAware.
The main methods of interest are:
1. maxDocumentSize(): should be checked before opening a stream for the document.
2. mimeTypeSupportLevel(String mimeType): right now there is no direct way of determining the MIME type. The ows_ContentType meta-attribute does not return the MIME type:
http://blogs.msdn.com/tejasr/default.aspx,
http://social.msdn.microsoft.com/Forums/en-US/sharepointdevelopment/thread/2bea3746-843f-4836-b35e-7b537d6b0a75
3. traversalTimeLimitSeconds(): should be used to return from batch traversal before the traversal thread times out. More analysis is needed to decide where this check should be applied; it cannot be done in startTraversal() or resumeTraversal(). Is SharePointClient.updateGlobalState or SharePointClient.updateWebStateFromSite the right place?
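Items 1 and 2 could be wired into the traversal manager roughly as follows. This is a sketch, not the connector's code. TraversalContext and TraversalContextAware below are simplified local stand-ins for the SPI interfaces, declaring only the methods discussed here. SharePointTraversalManagerSketch and shouldFetchContent are hypothetical names. Treating a positive support level as "content supported" is also an assumption.

```java
/** Hypothetical stand-in for the SPI interface; only the three methods discussed above. */
interface TraversalContext {
    long maxDocumentSize();
    int mimeTypeSupportLevel(String mimeType);
    long traversalTimeLimitSeconds();
}

/** Hypothetical stand-in for com.google.enterprise.connector.spi.TraversalContextAware. */
interface TraversalContextAware {
    void setTraversalContext(TraversalContext traversalContext);
}

/** Sketch only: how a traversal manager might consult the context before fetching content. */
class SharePointTraversalManagerSketch implements TraversalContextAware {
    private TraversalContext traversalContext;

    @Override
    public void setTraversalContext(TraversalContext traversalContext) {
        this.traversalContext = traversalContext;
    }

    /**
     * Decides, before a content stream is opened, whether the content should be fetched.
     * fileSize is -1 when unknown; mimeType is null when it could not be determined.
     */
    boolean shouldFetchContent(long fileSize, String mimeType) {
        if (traversalContext == null) {
            return true; // no context supplied: keep the existing behaviour
        }
        if (fileSize > traversalContext.maxDocumentSize()) {
            return false; // too large: send metadata only
        }
        // Without a MIME type (item 2) the support level cannot be checked.
        return mimeType == null || traversalContext.mimeTypeSupportLevel(mimeType) > 0;
    }

    public static void main(String[] args) {
        SharePointTraversalManagerSketch manager = new SharePointTraversalManagerSketch();
        manager.setTraversalContext(new TraversalContext() {
            @Override public long maxDocumentSize() { return 30L * 1024 * 1024; }
            @Override public int mimeTypeSupportLevel(String mimeType) {
                return "text/html".equals(mimeType) ? 1 : 0;
            }
            @Override public long traversalTimeLimitSeconds() { return 300L; }
        });
        System.out.println(manager.shouldFetchContent(24576L, "text/html")); // prints true
        System.out.println(manager.shouldFetchContent(50L * 1024 * 1024, "text/html")); // prints false
    }
}
```

The size check runs first because it needs only the ows_FileSizeDisplay value, which is already available from the list metadata.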
Original comment by rakeshs101981@gmail.com on 12 Sep 2009 at 1:23
For the MIME type, I think we will have to use the HttpClient call. This will not be extra work for the connector, because it is already being done in content feed mode. Do we need to consider this for metadata-and-URL (M&U) feeds as well? Doing so would make the connector fully TraversalContext-aware.
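The following sketch shows the HttpClient approach. It swaps in the JDK 11 java.net.http.HttpClient, which may differ from the HTTP client library the connector actually uses, and reads the MIME type from the Content-Type response header. MimeTypeLookup and its methods are hypothetical names, and authentication is omitted. In content feed mode the same header could presumably be taken from the response that is already fetched, instead of sending a separate HEAD request.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Locale;
import java.util.Optional;

/** Sketch: derive a document's MIME type from the Content-Type header of an HTTP response. */
class MimeTypeLookup {
    /** "text/HTML; charset=utf-8" becomes "text/html"; blank or null input gives empty. */
    static Optional<String> mimeTypeFromContentType(String contentType) {
        if (contentType == null) {
            return Optional.empty();
        }
        int semicolon = contentType.indexOf(';');
        String type = (semicolon >= 0 ? contentType.substring(0, semicolon) : contentType).trim();
        return type.isEmpty() ? Optional.empty() : Optional.of(type.toLowerCase(Locale.ROOT));
    }

    /**
     * Sends a HEAD request for the document. Authentication (SharePoint typically needs
     * NTLM or similar) is deliberately left out of this sketch.
     */
    static Optional<String> fetchMimeType(HttpClient client, URI documentUrl)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(documentUrl)
            .method("HEAD", HttpRequest.BodyPublishers.noBody())
            .build();
        HttpResponse<Void> response = client.send(request, HttpResponse.BodyHandlers.discarding());
        if (response.statusCode() / 100 != 2) {
            return Optional.empty(); // not a success response: MIME type unknown
        }
        return response.headers().firstValue("Content-Type")
            .flatMap(MimeTypeLookup::mimeTypeFromContentType);
    }

    public static void main(String[] args) {
        System.out.println(mimeTypeFromContentType("text/HTML; charset=utf-8")); // prints Optional[text/html]
    }
}
```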
For the third point, we first need to decide on the unit of atomicity during the crawl. Atomicity can be defined at either the site level or the list level. I would not recommend a document-level interrupt, as that would introduce a lot of complexity and probably bugs.
Treating a list as the atomic crawl unit may be appropriate. The traversal interrupt could then be initiated in the SharePointClient.updateWebStateFromSite method.
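Treating the list as the atomic unit, the interrupt could look roughly like the following sketch. The deadline is derived from traversalTimeLimitSeconds() and checked only between lists, so a list that has started is always finished. crawlWithinBudget and its parameters are hypothetical. The clock is injected so the logic can be tested; System::nanoTime would be used in practice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.LongSupplier;

class ListLevelInterrupt {
    /**
     * Crawls lists one at a time and stops between lists once the time budget is spent.
     * Returns the lists that were crawled; the caller resumes from the first one not returned.
     */
    static List<String> crawlWithinBudget(List<String> listIds, long timeLimitSeconds,
                                          Consumer<String> crawlOneList, LongSupplier nanoClock) {
        long deadline = nanoClock.getAsLong() + TimeUnit.SECONDS.toNanos(timeLimitSeconds);
        List<String> crawled = new ArrayList<>();
        for (String listId : listIds) {
            // Subtraction keeps the comparison correct even if the nanosecond clock wraps.
            if (nanoClock.getAsLong() - deadline >= 0) {
                break; // budget exhausted: interrupt at a list boundary, never mid-list
            }
            crawlOneList.accept(listId);
            crawled.add(listId);
        }
        return crawled;
    }

    public static void main(String[] args) {
        List<String> done = crawlWithinBudget(List.of("Shared Documents", "Announcements"), 300L,
            listId -> System.out.println("crawling " + listId), System::nanoTime);
        System.out.println("crawled " + done.size() + " lists");
    }
}
```

Because a started list always runs to completion, the last list can overrun the budget. Subtracting a safety margin from the time limit is one way to leave room for it.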
Original comment by th.nitendra on 14 Sep 2009 at 4:03
As per Connector Manager Issue 143 (http://code.google.com/p/google-enterprise-connector-manager/issues/detail?id=143), a new group of 'ignored' MIME types has been added. If a document's MIME type is in this list, the document should be skipped entirely. For this purpose, a new exception class, SkippedDocumentException, has been added to the SPI, and the connector should throw this exception for such documents. More details:
http://code.google.com/p/google-enterprise-connector-manager/source/detail?r=2319
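The following is a sketch of the skip check. SkippedDocumentException and MimeTypeSupport here are local stand-ins for the SPI types, and checkNotIgnored is a hypothetical helper. It assumes that mimeTypeSupportLevel() reports an ignored type with a negative value. That convention should be confirmed against the Connector Manager change linked above.

```java
/** Hypothetical stand-in for the SPI's SkippedDocumentException. */
class SkippedDocumentException extends Exception {
    SkippedDocumentException(String message) {
        super(message);
    }
}

class IgnoredMimeTypeCheck {
    /** Stand-in for TraversalContext.mimeTypeSupportLevel(String). */
    interface MimeTypeSupport {
        int mimeTypeSupportLevel(String mimeType);
    }

    /**
     * Throws SkippedDocumentException when the document's MIME type is ignored.
     * Assumption: a negative support level marks an ignored type.
     */
    static void checkNotIgnored(String docId, String mimeType, MimeTypeSupport support)
            throws SkippedDocumentException {
        if (mimeType != null && support.mimeTypeSupportLevel(mimeType) < 0) {
            throw new SkippedDocumentException(
                "Skipping " + docId + ": ignored MIME type " + mimeType);
        }
    }

    public static void main(String[] args) {
        MimeTypeSupport support = type -> "application/zip".equals(type) ? -1 : 1;
        try {
            checkNotIgnored("doc-1", "application/zip", support);
        } catch (SkippedDocumentException e) {
            System.out.println(e.getMessage()); // prints Skipping doc-1: ignored MIME type application/zip
        }
    }
}
```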
Original comment by rakeshs101981@gmail.com on 5 Nov 2009 at 10:34
Fix details:
http://code.google.com/p/google-enterprise-connector-sharepoint/source/detail?r=430
http://code.google.com/p/google-enterprise-connector-sharepoint/source/detail?r=429
Original comment by rakeshs101981@gmail.com on 5 Nov 2009 at 4:39
Verified in 2.4 Release
Original comment by ashwinip...@gmail.com on 14 Dec 2009 at 7:03
Original issue reported on code.google.com by jeffreyl...@gmail.com on 20 Aug 2009 at 7:44