rajgithub123 / google-enterprise-connector-sharepoint

Automatically exported from code.google.com/p/google-enterprise-connector-sharepoint

sharepoint content last modified date #66

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
The connector appears to send only the crawl date, and the SharePoint server returns the current date in the response header.
Is it possible to feed (or acquire) the last modified date as shown in the site admin?

Expected results: the "sort by date" should work as it's supposed to.

Internal ticket reference: 398008074

Original issue reported on code.google.com by jeffreyl...@gmail.com on 4 Mar 2009 at 8:55

GoogleCodeExporter commented 9 years ago
Another similar issue: internal ticket reference 389899344. It seems that SharePoint is not behaving exactly according to the HTTP protocol. This limitation can only be solved if SharePoint supports a content feed.

According to that ticket:

The problem is in fact not on the connector side but on the web
server (IIS in this case). On every re-crawl that the appliance performs,
the web server returns an HTTP 200 instead of an HTTP 304 when the
content has not been modified.
For the example you sent, here is a snippet from a network dump:

GET /FinanceEU/Shared%20Documents/vc.aspx HTTP/1.0
Host: sydney.enterprise.lon.corp.google.com:31337
Accept: text/html,text/plain,application/*
User-Agent: gsa-crawler (Enterprise; T1-GYXJANAVSGWAS; xen@google.com)
Accept-Encoding: gzip
If-Modified-Since: Wed, 21 Jan 2009 10:27:05 GMT
Authorization: Basic RU5URVJQUklTRVx4ZW46ZW50ZXJwcmlzZQ==

HTTP/1.1 200 OK
Connection: close
Date: Wed, 21 Jan 2009 10:29:02 GMT
Server: Microsoft-IIS/6.0
MicrosoftSharePointTeamServices: 12.0.0.6219
X-Powered-By: ASP.NET
X-AspNet-Version: 2.0.50727
Set-Cookie: WSS_KeepSessionAuthenticated=31337; path=/
Set-Cookie: http%3A%2F%2Fsydney%2Eenterprise%2Elon%2Ecorp%2Egoogle%2Ecom%3A31337%2FFinanceEU%2FDiscovery=WorkspaceSiteName=RmluYW5jZUVVIERlbW8=&WorkspaceSiteUrl=aHR0cDovL3N5ZG5leS5lbnRlcnByaXNlLmxvbi5jb3JwLmdvb2dsZS5jb206MzEzMzcvRmluYW5jZUVV&WorkspaceSiteTime=MjAwOS0wMS0yMVQxMDoyOTowMg==; expires=Fri, 20-Feb-2009 10:29:02 GMT; path=/_vti_bin/Discovery.asmx
Cache-Control: private, max-age=0
Expires: Tue, 06 Jan 2009 10:29:02 GMT
Last-Modified: Wed, 21 Jan 2009 10:29:02 GMT
Content-Type: text/html; charset=utf-8
Content-Length: 91219

Here you can see the "If-Modified-Since" header sent by the appliance. The
web server responds with an HTTP 200, and its "Last-Modified" value is
identical to its "Date" header, so it reports the time of the request as
the modification time. Even when the content has not been modified, the
web server does not respect the "If-Modified-Since" header and sends the
content again instead of an HTTP 304.
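
The check a web server is expected to perform for a conditional GET can be sketched as follows. This is only an illustration of the HTTP rule, not code from IIS, SharePoint, or the appliance; the function name conditional_get_status is hypothetical, and the "unchanged" timestamp in the second call is a made-up value for contrast.

```python
from email.utils import parsedate_to_datetime


def conditional_get_status(if_modified_since: str, last_modified: str) -> int:
    """Return 304 if the resource is unchanged since the client's copy, else 200."""
    client_copy_time = parsedate_to_datetime(if_modified_since)
    resource_time = parsedate_to_datetime(last_modified)
    # Unchanged means: not modified after the time the client reports.
    return 304 if resource_time <= client_copy_time else 200


# Header values taken from the network dump above. Because the server
# stamps Last-Modified with the time of the request, it is always later
# than If-Modified-Since, and the result can never be 304.
observed = conditional_get_status(
    "Wed, 21 Jan 2009 10:27:05 GMT",  # If-Modified-Since from the appliance
    "Wed, 21 Jan 2009 10:29:02 GMT",  # Last-Modified from the server
)

# Hypothetical Last-Modified that predates the appliance's copy.
expected_if_stable = conditional_get_status(
    "Wed, 21 Jan 2009 10:27:05 GMT",
    "Tue, 20 Jan 2009 08:00:00 GMT",
)

print(observed, expected_if_stable)
```

With a modification time that stays fixed between crawls, the same comparison yields 304, which is what the appliance needs in order to skip unchanged content.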

Original comment by jeffreyl...@gmail.com on 4 Mar 2009 at 9:33

GoogleCodeExporter commented 9 years ago

Original comment by rakeshs101981@gmail.com on 17 Mar 2009 at 4:09

GoogleCodeExporter commented 9 years ago
The SharePoint connector sends the last modified date for documents as returned by the Web Services. This can differ from the value that the web server returns in the "Last-Modified" response header when an HTTP GET request is made to a URL, although the two values are expected to be the same for a given document.

Since the connector sends the last modified date for documents, "sort by date" should work as expected. A few reasons that might cause the problem:

1. The connector does not always send the last modified value under the attribute name "Last-Modified". Most attribute names are manipulated by the web service itself, so "Last-Modified" might appear as "Last_x0020_Modified".
2. The last modified date sent by the connector might not always be in the exact format that the web server expects.
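
Both points can be illustrated with a small sketch. It is not the connector's code. It assumes that escapes of the form _xHHHH_ in an attribute name stand for the character with that hexadecimal code (so _x0020_ is a space), that the incoming timestamp looks like "2009-01-21T10:29:02" and is in UTC, and that the desired output is the same date format used in HTTP headers. The names decode_field_name and to_http_date are hypothetical.

```python
import re
from datetime import datetime, timezone
from email.utils import format_datetime

# Matches escapes such as _x0020_ (four hex digits between _x and _).
_ESCAPE = re.compile(r"_x([0-9A-Fa-f]{4})_")


def decode_field_name(name: str) -> str:
    """Replace each _xHHHH_ escape with the character it encodes."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), name)


def to_http_date(iso_timestamp: str) -> str:
    """Format an ISO-style timestamp (assumed UTC) as an HTTP date string."""
    parsed = datetime.strptime(iso_timestamp, "%Y-%m-%dT%H:%M:%S")
    return format_datetime(parsed.replace(tzinfo=timezone.utc), usegmt=True)


decoded = decode_field_name("Last_x0020_Modified")
http_date = to_http_date("2009-01-21T10:29:02")

print(decoded)    # Last Modified
print(http_date)  # Wed, 21 Jan 2009 10:29:02 GMT
```

Normalizing both the attribute name and the date format before feeding would rule out these two causes; whether the connector needs exactly this mapping is not confirmed by the ticket.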

The web server not respecting the "If-Modified-Since" header appears to be a bug on the web server side. This has been confirmed for a SharePoint site hosted on IIS.

Original comment by th.nitendra on 17 Mar 2009 at 3:17

GoogleCodeExporter commented 9 years ago

Original comment by deshpa...@google.com on 17 Feb 2011 at 10:04