googlegsa / manager.v3

Google Search Appliance Connector Manager
Apache License 2.0

Include path info and file extension in the googleconnector URLs #136

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
This should be investigated to see if we can efficiently include this information in the googleconnector URL, and whether that would allow us to assign connector content to collections in the normal way, and also filter out documents by extension in the normal way.
Note that the TraversalContext is much more efficient for filtering unsupported content types, but it has no way to filter out previously indexed content.

For example, instead of 

googleconnector://foo.localhost/doc?docid=123456

we could send

googleconnector://foo.localhost/doc/folder1/folder2/file.ext?docid=123456
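A minimal sketch of how the two URL forms above could be generated. This is illustrative Python, not Connector Manager code; the function name `build_connector_url` and its parameters are hypothetical, and the path is assumed to be a plain `/`-separated string.

```python
from urllib.parse import quote, urlencode


def build_connector_url(connector_host, docid, path=None):
    """Build a googleconnector:// URL, optionally embedding a path.

    Hypothetical helper: the name and signature are assumptions made
    for illustration only.
    """
    base = f"googleconnector://{connector_host}/doc"
    if path:
        # Percent-encode each segment but keep "/" as the separator.
        base += "/" + quote(path.strip("/"), safe="/")
    # The docid stays in the query string in both forms.
    return base + "?" + urlencode({"docid": docid})


current_form = build_connector_url("foo.localhost", "123456")
proposed_form = build_connector_url(
    "foo.localhost", "123456", path="folder1/folder2/file.ext"
)
```

With the path included, URL-pattern rules for collections and extension filters would have something to match against, which is the point of the proposal.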

Original issue reported on code.google.com by jl1615@gmail.com on 3 Apr 2009 at 7:46

GoogleCodeExporter commented 9 years ago
This may vary from connector to connector. At least for the Livelink connector, I don't think the information needed to recreate this syntax is available when deleting documents. Perhaps we could change the GSA to accept the variation without the path and filename as a match when deleting documents?

Original comment by jl1615@gmail.com on 29 Apr 2009 at 6:54

GoogleCodeExporter commented 9 years ago
This would require the GSA to have some special logic associated with googleconnector:// URLs to use the docid as the primary key, since the full path will not be known for all actions/operations.
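A rough sketch of what "docid as the primary key" could mean in practice: two URLs that differ only in path reduce to the same key. This is illustrative Python, not GSA behavior; the function name `docid_key` and the choice of (host, docid) as the key are assumptions.

```python
from urllib.parse import parse_qs, urlsplit


def docid_key(url):
    """Return a (connector host, docid) key, ignoring any path.

    Hypothetical helper: the key shape is an assumption made for
    illustration, not documented GSA behavior.
    """
    parts = urlsplit(url)
    if parts.scheme != "googleconnector":
        raise ValueError(f"not a googleconnector URL: {url}")
    values = parse_qs(parts.query).get("docid")
    if not values:
        raise ValueError(f"URL has no docid parameter: {url}")
    return (parts.netloc, values[0])


with_path = "googleconnector://foo.localhost/doc/folder1/folder2/file.ext?docid=123456"
without_path = "googleconnector://foo.localhost/doc?docid=123456"
same_document = docid_key(with_path) == docid_key(without_path)
```

Under this scheme a delete sent without the path would still match the document indexed with it.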

Original comment by mgron...@gmail.com on 6 May 2009 at 11:02

GoogleCodeExporter commented 9 years ago

Original comment by jl1615@gmail.com on 18 Sep 2009 at 8:35

GoogleCodeExporter commented 9 years ago
Needs to be worked out with new authz URL.

Original comment by mar...@google.com on 4 Dec 2009 at 11:19

GoogleCodeExporter commented 9 years ago
Could this also help address the issue with Duplicate Directory Automatic 
Filtering?  When you click on "More Results from..." for results from a 
connector, like SharePoint, you get no results.

Original comment by chad...@gmail.com on 4 Nov 2010 at 3:23

GoogleCodeExporter commented 9 years ago
The problem where clicking the "More Results from..." link returned no results was fixed in 6.10.

The problem with introducing the path and file extension to the URLs is that 
neither one is really part of the identity of ECM objects. We would have to 
remember the values when a document was indexed so that we could correctly 
handle moves, changes in the content type, or deletes. There are other feature 
requests that would directly support the needed operations, such as configuring 
collections based on metadata elements or using the content type with the 
filetype: query operator. Those approaches also do not depend on the use of the 
googleconnector URL.
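To illustrate the bookkeeping cost described above, here is a minimal sketch of remembering the URL sent for each docid so that a later move or delete can target the previously indexed URL. This is illustrative Python only; the class `SentUrlStore` and its methods are hypothetical, and a real connector would need persistent storage rather than an in-memory dict.

```python
class SentUrlStore:
    """Hypothetical record of the URL last fed for each docid."""

    def __init__(self):
        self._urls = {}  # docid -> URL most recently sent for indexing

    def record_index(self, docid, url):
        """Remember url; return the stale URL to delete, if any."""
        previous = self._urls.get(docid)
        self._urls[docid] = url
        # A different earlier URL means the path or extension changed.
        if previous is not None and previous != url:
            return previous
        return None

    def record_delete(self, docid):
        """Forget docid; return the URL to delete, or None if unknown."""
        return self._urls.pop(docid, None)


store = SentUrlStore()
old_url = "googleconnector://foo.localhost/doc/folder1/file.ext?docid=123456"
new_url = "googleconnector://foo.localhost/doc/folder2/file.ext?docid=123456"
first = store.record_index("123456", old_url)   # first time indexed
stale = store.record_index("123456", new_url)   # document moved
deleted = store.record_delete("123456")         # document deleted
```

Every connector would carry this state for every document, which is the overhead the metadata-based alternatives avoid.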

Original comment by jla...@google.com on 7 May 2011 at 10:43