rajgithub123 / google-enterprise-connector-sharepoint

Automatically exported from code.google.com/p/google-enterprise-connector-sharepoint

Cannot detect all SharePoint sites #73

Closed: GoogleCodeExporter closed this issue 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Install Connector 1.32 on a client PC
2. Add and set up the SharePoint connector manager on the GSA
3. Add robots.txt to the SharePoint root
4. Add the crawl URL

What is the expected output? What do you see instead?
According to the SharePoint console, there are around 150k documents
there, but the GSA can find only around 8k docs. I found that many
folders, including their hierarchy, are missed.
I crawled using the SharePoint admin account, so there are probably no
permission problems.

What version of the product are you using? On what operating system?
GSA 5.2
W2k3 server with SharePoint 2003
XP client with SharePoint connector 1.32 installed
XP client to access the GSA console

Please provide any additional information below.
At one level, there are 7 sites, and I found that the GSA can only
detect 2 of them.

Original issue reported on code.google.com by cks...@gmail.com on 12 May 2009 at 6:13

GoogleCodeExporter commented 9 years ago
1. Please ensure that the sites discovered by the connector match 'Include URLs
Matching the Following Patterns' and do not match 'Do Not Include URLs Matching
the Following Patterns'.

2. Check the connector logs to see if certain URLs are being excluded. If you
see a log message like:
com.google.enterprise.connector.sharepoint.client.ListsWS getLinkChanges
WARNING: getLinkChanges(BaseList list, Calendar since) : excluding <actual URL>
then update 'Include URLs Matching the Following Patterns' and/or 'Do Not
Include URLs Matching the Following Patterns' so that the URLs are not excluded
(see the sketch after step 4 for how this include/exclude check behaves).

3. If no URLs are being excluded, then enable feed logging and check that
feeds are being sent to the GSA.

4. Assuming that feeds are being sent to the GSA, make sure on the GSA that all
URLs that are supposed to be crawled and indexed are added under 'Crawl And
Index --> Crawl URLs --> Follow and Crawl Only URLs with the Following Patterns'.
If you have not made the appropriate entries here, the GSA will reject the
feeds for URLs that do not have a matching entry in the above field, and hence
you will find that those documents have not been crawled.
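
For reference, the checks in steps 1, 2, and 4 all come down to the same gate:
a URL is fed and crawled only if it matches at least one include pattern and
no exclude pattern. The sketch below illustrates that logic only; the class
and method names are hypothetical and are not the connector's actual code.

    import java.util.List;
    import java.util.regex.Pattern;

    // Illustrative include/exclude gate, not the connector's real implementation.
    public class UrlPatternFilter {
        private final List<Pattern> includePatterns;
        private final List<Pattern> excludePatterns;

        public UrlPatternFilter(List<Pattern> include, List<Pattern> exclude) {
            this.includePatterns = include;
            this.excludePatterns = exclude;
        }

        // A URL passes only if some include pattern matches it
        // and no exclude pattern matches it.
        public boolean isIncluded(String url) {
            boolean matchesInclude = false;
            for (Pattern p : includePatterns) {
                if (p.matcher(url).find()) {
                    matchesInclude = true;
                    break;
                }
            }
            if (!matchesInclude) {
                // Analogous to the "excluding <actual URL>" log message in step 2.
                return false;
            }
            for (Pattern p : excludePatterns) {
                if (p.matcher(url).find()) {
                    return false; // exclude patterns take precedence over includes
                }
            }
            return true;
        }
    }

For example, assuming a hypothetical site root http://sharepointhost/, a broad
include pattern such as http://sharepointhost/ with no matching exclude pattern
would let all 7 sites through, whereas a narrower include pattern could explain
why only 2 of them are detected. The same reasoning applies to the GSA-side
patterns in step 4: every fed URL must be covered by an entry there, or its
feed record is rejected.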

Note: The connector version should be 1.3.2, not 1.32, since we do not have a
release with version 1.32.

Original comment by rakeshs101981@gmail.com on 12 May 2009 at 6:54

GoogleCodeExporter commented 9 years ago
Closing this issue, as no concern has been reported by the user after the
suggested troubleshooting tips.

Original comment by th.nitendra on 30 Jun 2009 at 9:15