wvuweb / cleanslate-cms

A place to file issues and view releases for CleanSlate CMS. http://cleanslatecms.wvu.edu

Google Search Appliance (GSA) Integration #12

Open adamjohnson opened 9 years ago

adamjohnson commented 9 years ago

From @nreckart on October 17, 2013 15:36

We need the ability to search a site via the GSA and display the results within the theme of the site.

GSA Documentation: https://developers.google.com/search-appliance/documentation/610/xml_reference?csw=1

Copied from original issue: wvuweb/cleanslate#11

adamjohnson commented 9 years ago

Relevant docs & options from GSA search integration in Slate 1:

http://slatecms.wvu.edu/howtos/developers/theme_development/add-search

adamjohnson commented 9 years ago

I'm going to have a search on the upcoming diversity.wvu.edu and accessibility.wvu.edu. Both sites are moving to CleanSlate.

I'm happy to do what Adam Glenn did for IT (hitting "Search" takes you to search.wvu.edu), but I'd also be a guinea pig for in-site search.

adamjohnson commented 9 years ago

Docs for (external) search as seen on the CleanSlate site:

http://cleanslatecms.wvu.edu/how-to/theme-development/search

I asked @jja006 to give me a breakdown of all of the different <input>s and attributes in that snippet (to make sure everything was legit, since it will be copied ad infinitum). Here's his response:

Ok I've attached a copy of my notes on the various options, but to boil it down:

form-search_field needs to be named "q" instead; the GSA expects the search query to come in on the form field named "q", and that is required.

as_sitesearch is correct, and is required to limit the search to a specific site or path within the site. You can omit it if you want to search all WVU indexed sites instead.

sort is completely optional. It lets you specify a sorting option; I think the GSA will default to relevancy-based sorting if this is omitted.

output is required. It tells the GSA either to respond with the results page on the appliance (xml_no_dtd) or to give you raw XML that you then publish within the site's template.

proxystylesheet is required IF using the GSA's own results page. Each front-end defined on the GSA has a stylesheet assigned, but you can mix and match in limited ways. If you are getting raw XML, you'll apply your own stylesheet instead, so you wouldn't need to specify this in that case.

client is required. This tells the GSA which front-end to use, but not the front-end in a design sense. In this case the front-ends store details about the search collection behavior - so things like keymatches, results biasing, etc. that are set on the appliance on a per-front-end basis.

I don't have any documentation on "ie" or "oe". My hunch is those are related to cases when getting raw XML and not using the GSA results pages. I'm pretty sure you can omit those, but you may want to check with Steve. It's possible those are needed to set the character set for issues related to browser compatibility or similar for some clients (mobile?) etc.

Here are his notes from the email:

http://pastie.org/9701297

@zeroedin can you comment on that last paragraph (ie and oe)?

I figured this explanation is relevant to in-site search. Mo' docs = mo' betta'.
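For reference, the parameter breakdown above could be sketched roughly as follows. This is only an illustration in Python, not CleanSlate code. The endpoint is taken from the search.wvu.edu example URL elsewhere in this thread, `default_frontend` is the client value from the Slate 1 results URL, and reusing that same name for `proxystylesheet` is an assumption. The function name and its arguments are made up for the example.

```python
from urllib.parse import urlencode

# Assumption: endpoint taken from the search.wvu.edu example URL in this thread.
GSA_ENDPOINT = "http://search.wvu.edu/search"


def build_search_url(query, site=None, sort=None, raw_xml=False,
                     frontend="default_frontend"):
    """Build a GSA query URL from the parameters described above (hypothetical helper)."""
    params = {
        "q": query,          # required: the GSA expects the query on "q"
        "client": frontend,  # required: which front-end's settings to use
    }
    if site:
        # Optional: limit results to one site or path; omit to search everything.
        params["as_sitesearch"] = site
    if sort:
        # Optional: the GSA is believed to default to relevancy when omitted.
        params["sort"] = sort
    if raw_xml:
        # Raw XML that the site would render in its own template.
        params["output"] = "xml"
    else:
        # The appliance's own results page needs a stylesheet.
        params["output"] = "xml_no_dtd"
        params["proxystylesheet"] = frontend  # assumption: same name as the front-end
    return GSA_ENDPOINT + "?" + urlencode(params)


print(build_search_url("woodburn", site="diversity.wvu.edu"))
```

Leaving out `site` drops `as_sitesearch` entirely, which matches the "search all WVU indexed sites" case described above.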

adamjohnson commented 9 years ago

From @zeroedin on November 7, 2014 20:28

ie

Sets the character encoding that is used to interpret the query string. See Internationalization for more information.

Default value: latin1

oe

Sets the character encoding that is used to encode the results. See Internationalization for more information.

Default value: UTF8
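To make the `ie` point concrete: the same query text produces different bytes on the wire depending on the encoding, so the appliance has to be told which one to expect. A small Python illustration (the query string "café" is just an example, not something from the GSA docs):

```python
from urllib.parse import quote_plus, unquote_plus

query = "café"

# The same text percent-encodes differently under each character encoding.
as_latin1 = quote_plus(query, encoding="latin-1")
as_utf8 = quote_plus(query, encoding="utf-8")
print(as_latin1)  # caf%E9
print(as_utf8)    # caf%C3%A9

# If a UTF-8 query is interpreted as latin1 (the documented default for "ie"),
# the accented character comes out garbled.
misread = unquote_plus(as_utf8, encoding="latin-1")
print(misread)    # cafÃ©
```

That mismatch is presumably why a form served as UTF-8 would want `ie=UTF-8` sent along with the query.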

I almost wonder if this is better served by a radius tag with parameters instead of allowing the designer to define it. Some of these values shouldn't necessarily be customized directly by the theme. For example, the two above should always be standard and should not be modified. The same goes for the proxystylesheet and client parameters, which might break the output if changed.

In fact, the only options really should be as_sitesearch and possibly sort, but I highly doubt we'd want to modify the default sorting of the GSA in any instance.

@nreckart @adamjohnson @jja006 comments?
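A rough sketch of that idea, in Python purely for illustration (the real thing would be a radius tag): the tag owns the fixed parameters and the theme can only pass the options named above. The fixed values here are assumptions borrowed from URLs in this thread, and `search_params` is a hypothetical name.

```python
# Assumed fixed values; a theme should not be able to override these.
FIXED_PARAMS = {
    "output": "xml",
    "client": "default_frontend",
    "ie": "UTF-8",
    "oe": "UTF-8",
}

# The only options a theme would be allowed to set.
THEME_OPTIONS = {"as_sitesearch", "sort"}


def search_params(query, **theme_options):
    """Merge theme-supplied options with the locked-down defaults."""
    unknown = set(theme_options) - THEME_OPTIONS
    if unknown:
        raise ValueError(f"unsupported search option(s): {sorted(unknown)}")
    params = {"q": query, **FIXED_PARAMS}
    # Skip empty values so an unset attribute does not reach the GSA.
    params.update({k: v for k, v in theme_options.items() if v})
    return params


print(search_params("woodburn", as_sitesearch="diversity.wvu.edu"))
```

Rejecting unknown options loudly, rather than silently ignoring them, is a design choice for the sketch; a tag could just as reasonably drop them.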

adamjohnson commented 9 years ago

+1 for a radius tag to simplify things.

Slate 1 uses the following:

<%= google_search_box :advanced => false, :placeholder => "Search..."  %>

We'd just have to build the tag (with an attribute, perhaps) to account for in-site search when that feature lands.

If people wanted to heavily customize the markup for any reason they could just copy and paste from the generated source. That said, my guess would be that most people will not customize anything.

dmolsen commented 9 years ago

Going to need this sooner rather than later based on the ideas that Glenn has regarding the WVU header.

adamjohnson commented 9 years ago

Steve says the XML response from the GSA is going to stay the same between what's live on search.wvu.edu and his updates to Search.

Here's an XML response for "woodburn" from search.wvu.edu:

http://search.wvu.edu/search?q=woodburn&btnG=Search&output=xml

Basically, it's:

<?xml version="1.0" encoding="ISO-8859-1" standalone="no"?>
<GSP VER="3.2">
    <TM>0.042303</TM>
    <Q>woodburn</Q>
    <PARAM name="q" value="woodburn" original_value="woodburn" />
    <PARAM name="btnG" value="Search" original_value="Search" />
    <PARAM name="output" value="xml" original_value="xml" />
    <PARAM name="ie" value="UTF-8" original_value="UTF-8" />
    <PARAM name="ulang" value="en" original_value="en" />
    <PARAM name="ip" value="173.44.54.42" original_value="173.44.54.42" />
    <PARAM name="access" value="p" original_value="p" />
    <PARAM name="sort" value="date:D:L:d1" original_value="date:D:L:d1" />
    <RES SN="1" EN="10">
        <M>1090</M>
        <FI />
        <NB>
            <NU>/search?q=woodburn&amp;lr=&amp;ie=UTF-8&amp;output=xml&amp;access=p&amp;sort=date:D:L:d1&amp;start=10&amp;sa=N</NU>
        </NB>
        <R N="1">
            <U>http://facilitiesplanning.wvu.edu/historical_buidlings/woodburn-hall</U>
            <UE>http://facilitiesplanning.wvu.edu/historical_buidlings/woodburn-hall</UE>
            <T>&lt;b&gt;Woodburn&lt;/b&gt; Hall | Facilities Planning and Scheduling | West &lt;b&gt;...&lt;/b&gt;</T>
            <RK>10</RK>
            <ENT_SOURCE>T3-QDZWDAT3RYWQZ</ENT_SOURCE>
            <FS NAME="date" VALUE="" />
            <S>&lt;b&gt;...&lt;/b&gt; &lt;b&gt;Woodburn&lt;/b&gt; Hall. &lt;b&gt;Woodburn&lt;/b&gt; Hall was completed in 1876 and is the centerpiece&lt;br&gt; of &lt;b&gt;Woodburn&lt;/b&gt; Circle, the oldest part of the WVU campus. &lt;b&gt;...&lt;/b&gt;  </S>
            <LANG>en</LANG>
            <HAS>
                <L />
                <C SZ="14k" CID="rGWnPBOS1A8J" ENC="ISO-8859-1" />
            </HAS>
        </R>
        <R N="2" L="2">
            <U>http://facilitiesplanning.wvu.edu/historical_buidlings/woodburn-circle-landscaping</U>
            <UE>http://facilitiesplanning.wvu.edu/historical_buidlings/woodburn-circle-landscaping</UE>
            <T>&lt;b&gt;Woodburn&lt;/b&gt; Circle Landscaping | Facilities Planning and &lt;b&gt;...&lt;/b&gt;</T>
            <RK>10</RK>
            <ENT_SOURCE>T3-QDZWDAT3RYWQZ</ENT_SOURCE>
            <FS NAME="date" VALUE="" />
            <S>&lt;b&gt;...&lt;/b&gt; Marina Towers 4th Floor 48 Donley Street PO Box 6555 Morgantown,&lt;br&gt; WV 26506. &lt;b&gt;Woodburn&lt;/b&gt; Circle Landscaping. &lt;b&gt;Woodburn&lt;/b&gt; &lt;b&gt;...&lt;/b&gt;  </S>
            <LANG>en</LANG>
            <HAS>
                <L />
                <C SZ="15k" CID="CpW6qeYnVEEJ" ENC="ISO-8859-1" />
            </HAS>
            <HN U="facilitiesplanning.wvu.edu/historical_buidlings">facilitiesplanning.wvu.edu/historical_buidlings</HN>
        </R>

        ...repeating results...

    </RES>
</GSP>
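As a rough idea of what consuming that response could look like, here is a minimal Python sketch using the standard library's ElementTree. The element names (Q, RES, M, NB/NU, R, U, T, S) come from the sample above; the trimmed `SAMPLE` string, the `parse_results` name, and the shape of the returned dict are assumptions for illustration. It also assumes the RES element is missing when there are no hits.

```python
import xml.etree.ElementTree as ET

# Trimmed-down copy of the response above, for demonstration only.
SAMPLE = """<GSP VER="3.2">
  <Q>woodburn</Q>
  <RES SN="1" EN="10">
    <M>1090</M>
    <NB><NU>/search?q=woodburn&amp;start=10</NU></NB>
    <R N="1">
      <U>http://facilitiesplanning.wvu.edu/historical_buidlings/woodburn-hall</U>
      <T>&lt;b&gt;Woodburn&lt;/b&gt; Hall</T>
      <S>&lt;b&gt;Woodburn&lt;/b&gt; Hall was completed in 1876.</S>
    </R>
  </RES>
</GSP>"""


def parse_results(xml_text):
    """Pull the query, counts, next-page link, and result rows out of a GSP document."""
    root = ET.fromstring(xml_text)
    parsed = {"query": root.findtext("Q", ""), "total": 0,
              "start": 0, "end": 0, "next_url": None, "results": []}
    res = root.find("RES")
    if res is None:  # assumption: no RES element means no results
        return parsed
    parsed["total"] = int(res.findtext("M", "0"))
    parsed["start"] = int(res.get("SN", "0"))
    parsed["end"] = int(res.get("EN", "0"))
    parsed["next_url"] = res.findtext("NB/NU")  # None on the last page
    for r in res.findall("R"):
        parsed["results"].append({
            "n": int(r.get("N", "0")),
            "url": r.findtext("U", ""),
            # T and S arrive as escaped HTML; the parser unescapes them to markup.
            "title_html": r.findtext("T", ""),
            "snippet_html": r.findtext("S", ""),
        })
    return parsed


print(parse_results(SAMPLE)["results"][0]["title_html"])  # <b>Woodburn</b> Hall
```

Note that the title and snippet come back containing `<b>` and `<br>` markup from the appliance, so whatever renders them has to decide how much of that to trust.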

I can look into providing HTML and CSS for the results that get output by CleanSlate. @nreckart are you interested in that?

nreckart commented 9 years ago

Sure

adamjohnson commented 9 years ago

Here are HTML and (S)CSS documents with my proposed markup for in-site search with the GSA:

http://cl.ly/ax0C

The CSS styles are basic, barebones styles to make it feel slightly organized. With Slate, we included these styles in the global stylesheet. I suppose we can decide if we want to include these styles by default (guessing no?).

Let me know how I can help more with this issue. It'd be really sweet to add this as a feature.

dmolsen commented 9 years ago

Why are all of the results pointing at example.wvu.edu/yo-dog? That doesn't seem very realistic. Just sayin'.

Why guess no? Just curious if I'm overlooking something obvious.

adamjohnson commented 9 years ago

It may be that the GSA inserts some basic styles of its own, not sure.

In any event, we leave styling of things like this up to the theme developer rather than include them in CleanSlate. Examples are things like the FlexSlider Snippet. CSS and JS for FlexSlider need to be included by the theme developer.

Also, these styles are likely to get globbed into the global stylesheet and/or brand-patterns. One way or another, they get included—or intentionally excluded. If they don't get included, the theme developer will notice and take action.

dmolsen commented 9 years ago

Ah, k. I was just thinking it would be good if there was a way to get this roughly working "out of the box" styling-wise. Just provide a default example somewhere that someone can use in their theme?

adamjohnson commented 9 years ago

:+1: absolutely.

dmolsen commented 9 years ago

Is this still an outstanding issue? Thought this was done.

adamjohnson commented 9 years ago

Hasn't been done to my knowledge. Yes, you can embed a search form onto a page; however, the results are then listed at search.wvu.edu instead of the current site's domain.

dmolsen commented 9 years ago

Thanks for the update, @adamjohnson

adamjohnson commented 9 years ago

This is what this issue is about, except having this feature available for CleanSlate:

http://slatecms.wvu.edu/s/?q=slate&btnG=Search&sort=date%3AD%3AL%3Ad1&output=xml_no_dtd&ie=UTF-8&oe=UTF-8&client=default_frontend&site=default_collection

Essentially having search results wrapped in the theme of the site you're searching on.
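To sketch the "wrapped in the theme" part: once the results have been pulled out of the GSA's XML, the CMS would turn them into an HTML fragment and drop it into the site's layout. The Python below is only an illustration. The class names and the `render_results` and `safe_highlight` helpers are placeholders (not the proposed markup linked earlier), and the input dicts are an assumed shape. It escapes everything and then re-allows only the `<b>` and `<br>` tags the appliance puts in titles and snippets.

```python
from html import escape

# Only these appliance-generated tags are allowed back through after escaping.
ALLOWED_TAGS = {"&lt;b&gt;": "<b>", "&lt;/b&gt;": "</b>", "&lt;br&gt;": "<br>"}


def safe_highlight(fragment):
    """Escape a title/snippet, then restore the highlight and line-break tags."""
    out = escape(fragment, quote=False)
    for escaped, tag in ALLOWED_TAGS.items():
        out = out.replace(escaped, tag)
    return out


def render_results(results):
    """Render result dicts (url, title_html, snippet_html) as an HTML list."""
    items = []
    for r in results:
        items.append(
            '<li class="search-result">'
            f'<a href="{escape(r["url"], quote=True)}">{safe_highlight(r["title_html"])}</a>'
            f'<p>{safe_highlight(r["snippet_html"])}</p>'
            "</li>"
        )
    return '<ol class="search-results">\n' + "\n".join(items) + "\n</ol>"


example = [{
    "url": "http://facilitiesplanning.wvu.edu/historical_buidlings/woodburn-hall",
    "title_html": "<b>Woodburn</b> Hall",
    "snippet_html": "<b>Woodburn</b> Hall was completed in 1876.",
}]
print(render_results(example))
```

The fragment would then be placed inside the theme's page template, which is the piece the external search.wvu.edu results page can't provide.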