visualizingthefuture / examples-repository

Repository for https://visualizingthefuture.github.io/examples-repository

set search behavior to grab metadata + fields directly from .csv (and avoid redundant index) #32

Closed · amzoss closed this issue 4 years ago

cassws commented 4 years ago

Search is based on elasticlunr, as per search/index.json; the indexed data is stored in _includes/data_total.html.

Search behavior is defined in _includes/search_box.html. In particular, it looks like the fields are pulled from collection[1].fields in lines 5-11:

{%- capture fields -%}
  {%- for collection in config.collections -%}
    {%- for field in collection[1].fields -%}
      {{ field }}{% unless forloop.last %}|||{% endunless %}
    {%- endfor -%}
  {%- endfor -%}
{%- endcapture -%}

and the data source is defined as index in lines 23-26

<script type='text/javascript'>
  var fields    = {{ fields | jsonify }};
  var indexFile = "{{ index }}";
  var url       = "{{ '' | absolute_url }}";

If we use a single CSV as the source of truth, it seems we could use Jekyll helper functions like jsonify to render the index directly from that CSV and grab the fields from it as well. If we use two CSVs (one for datasets, one for data viz examples), we may still want to keep a search index and write a helper function (or script) to build it whenever the site is updated, since that script would have to pull from both CSVs.
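As a rough sketch of the single-CSV idea (the file name _data/examples.csv is just a placeholder), a page like search/index.json could dump the rows straight from the data file with jsonify, since Jekyll already parses CSV data files into an array of row hashes:

---
---
{%- comment -%}
  Sketch only: emit every row of a hypothetical _data/examples.csv as JSON
  for the client-side search, instead of maintaining a separately raked index.
{%- endcomment -%}
{{ site.data.examples | jsonify }}

search_box.html would then just need its index variable pointed at that file.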

To do:

cassws commented 4 years ago

For future reference, here's an example of grabbing header names directly from the CSV (to avoid specifying fields manually, if we'd prefer that to happen automatically).
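The idea, sketched here with a hypothetical _data/examples.csv: Jekyll parses CSV data files into an array of row hashes, so the header names are just the keys of the first row, and the fields capture could look something like

{%- comment -%}
  Sketch only: derive the field list from the first row of a hypothetical
  _data/examples.csv rather than hard-coding it.
{%- endcomment -%}
{%- capture fields -%}
  {%- for pair in site.data.examples[0] -%}
    {{ pair[0] }}{% unless forloop.last %}|||{% endunless %}
  {%- endfor -%}
{%- endcapture -%}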

cassws commented 4 years ago

Pausing until we implement the single .csv; we'll finalize this change then!

cassws commented 4 years ago

ALTERNATIVELY, we can use a bundle exec rake command to generate the index data, as per _config.yml. It sounds like we would just need to run it after whatever steps we take to generate the collections from the CSV. We may still have to set the fields manually, however, or pull them automatically from the CSV header itself, as per the example linked above. This may be the easiest way to do things:

From _config.yml:

# --------------------------------------------------------------
# SEARCH INDEX SETTINGS
# --------------------------------------------------------------
# You can create multiple search indexes below (though only one is
# recommended!) by specifying
# an `index` file to write it to and some `collections` for it to index
# and running `$ bundle exec rake wax:search` .

search:
  main:
    index: '/search/index.json' # file the index will get written to
    collections:
      datavis:
        content: false # whether or not to index page content
        fields: # the metadata fields to index
          - description
          - language
          - object_type
          - topic
          - purpose
          - audience_level
          - audience_composition
          - additional_properties
          - pid
          - layout
          - label
      datasets:
        content: false # whether or not to index page content
        fields: # the metadata fields to index
          - description
          - language
          - object_type
          - topic
          - purpose
          - audience_level
          - audience_composition
          - additional_properties
          - pid
          - layout
          - label

cassws commented 4 years ago

OK, feel free to disregard the above! I identified a small typo in _config.yml that was interfering with the index raking, which I presume is why we were referencing data_total in search. I fixed it and tested raking, and search now appears to correctly see both datasets and data visualizations. I also tested the fields behavior, and it seems to correctly grab all fields based on _config.yml.

Pushing the update now. Note that in the future we will need to run bundle exec rake wax:search main after any other raking that creates the collections, so that the search index is rebuilt from the collections (that is, this method doesn't see the .csv, only the collection metadata).