sul-dlss / exhibits

Stanford University Libraries online exhibits showcase
https://exhibits.stanford.edu

Investigation: is there a way to streamline or more clearly display the metadata configuration #2602

Closed corylown closed 2 weeks ago

corylown commented 1 month ago

This issue is in reference to this page: https://exhibits.stanford.edu/{EXHIBIT_SLUG}/metadata_configuration/edit

Screenshot 2024-10-09 at 4 27 33 PM

From https://github.com/sul-dlss/exhibits/issues/2356

The page is not currently user-friendly for exhibit creators. Is it true that some metadata fields are custom and available only to selected exhibits, while others are universally available?

hudajkhan commented 1 month ago

Happy to start on this, and happy to collaborate as well so please feel free to add yourself as an assignee

hudajkhan commented 1 month ago

Inspecting the HTML on the exhibits page, the Solr field behind each field name appears either in a hidden input or in the id of the input. Based on the pattern I can see in the HTML, here is the mapping between the input field names and the Solr fields they represent.

hudajkhan commented 1 month ago

I am working out how to query the Solr index for the indexed fields. Of the fields above, 35 are indexed; the rest are stored but not indexed, which means we can see their values but cannot query for documents where they exist.
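As a rough illustration of the stored versus indexed distinction, here is a minimal sketch that reads those properties off a field name. It assumes the Solr schema follows the common dynamic field suffix convention (type code, then "s" for stored, "i" for indexed, "m" for multivalued); the actual schema should be checked before relying on it. The helper name `decode_suffix` and the small type table are hypothetical.

```ruby
# Assumption: only a few type codes are covered here; anything else returns nil.
TYPE_CODES = { 'te' => 'text', 's' => 'string', 'i' => 'integer' }.freeze

# Hypothetical helper: decode the trailing dynamic field suffix of a Solr field name.
def decode_suffix(field_name)
  match = field_name.match(/_(te|s|i)(s?)(i?)(m?)\z/)
  return nil unless match

  {
    type: TYPE_CODES[match[1]],
    stored: !match[2].empty?,
    indexed: !match[3].empty?,
    multivalued: !match[4].empty?
  }
end

%w[book_title_ssim spotlight_upload_date_tesim pub_year_isi date_sort].each do |name|
  puts "#{name}: #{decode_suffix(name).inspect}"
end
```

Under this convention, a field whose suffix lacks the "i" flag can be displayed but not used in a query.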

Taking a quick look at the code (specifically app/views/spotlight/metadata_configurations), it appears the fields we see on the metadata configuration page are added to/present in the Blacklight configuration. Although the form URL does seem to take the exhibit into account, the fields at the top of the page seem to be global (shared across all exhibits and attached to blacklight_configuration), while the fields at the bottom of the page under "exhibit specific fields" do seem to take the actual exhibit into account (i.e. they are attached to a specific exhibit).

hudajkhan commented 1 month ago

In querying Solr, I am looking specifically at which exhibits a particular field is associated with and how many documents in that exhibit are related. In the process, I have found there are documents which refer to exhibits that no longer exist. @caaster has given me a list of currently published exhibits, so I will be comparing results to that list.
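The comparison against the list of current exhibits amounts to a set membership check. A minimal sketch, assuming the exhibit identifiers found in Solr and the identifiers on the provided list are both available as arrays of slugs (the slugs and the helper name `partition_exhibit_slugs` below are made up for illustration):

```ruby
require 'set'

# Hypothetical helper: split slugs seen in Solr into those on the current
# exhibit list and those that refer to exhibits no longer present.
def partition_exhibit_slugs(solr_slugs, current_slugs)
  current = current_slugs.to_set
  solr_slugs.uniq.partition { |slug| current.include?(slug) }
end

# Illustrative data only, not real query results.
solr_slugs = %w[parker example-a retired-exhibit parker]
current_slugs = %w[parker example-a example-b]

live, stale = partition_exhibit_slugs(solr_slugs, current_slugs)
puts "live:  #{live.inspect}"
puts "stale: #{stale.inspect}"
```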

hudajkhan commented 4 weeks ago

fieldnames.xlsx

hudajkhan commented 4 weeks ago

Attached above are the results of running various queries in Solr in spreadsheet format.

Summary: I took the list of fields from the metadata configuration page of a single exhibit. This list includes both the global configuration and some exhibit-specific fields. I queried to see how many exhibits each field appears in, to determine whether any fields never show up or only show up for one exhibit. I could only query for the fields that were indexed. There were no fields that never showed up, but there were a few that were only present in the Parker exhibit.
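For reference, a per-field query of this kind could look roughly like the sketch below. It is not the exact query that was run. The Solr URL and the exhibit facet field name (`exhibit_slug_ssim`) are placeholders and would need to be replaced with the real values; the `field:[* TO *]` existence query and the facet parameters are standard Solr syntax.

```ruby
require 'uri'

# Placeholder endpoint, not the real exhibits Solr URL.
SOLR_SELECT_URL = 'http://localhost:8983/solr/exhibits/select'

# Hypothetical helper: build a query that counts, per exhibit, the documents
# that have any value in the given indexed field.
def field_presence_query(field, exhibit_facet_field:)
  params = {
    'q' => "#{field}:[* TO *]",          # matches documents where the field has a value
    'rows' => '0',                       # only the counts are needed, not the documents
    'facet' => 'true',
    'facet.field' => exhibit_facet_field,
    'facet.mincount' => '1',             # skip exhibits with no matching documents
    'facet.limit' => '-1'                # return every exhibit, not just the top ones
  }
  "#{SOLR_SELECT_URL}?#{URI.encode_www_form(params)}"
end

puts field_presence_query('book_title_ssim', exhibit_facet_field: 'exhibit_slug_ssim')
```

A field that only returns a single facet value here would be one that is present in just one exhibit.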

The fields I queried for were:

Screenshot 2024-10-23 at 12 07 12 PM

hudajkhan commented 4 weeks ago

The fields that only showed up in the Parker exhibit were: book_title_ssim, editor_ssim, university_ssim, range_labels_ssim, and related_document_id_ssim.

hudajkhan commented 4 weeks ago

The attached spreadsheet also includes which fields are represented in which exhibits as well as the solr document count for each field in that exhibit. The spreadsheet includes more information about what the different tabs mean.

hudajkhan commented 4 weeks ago

Looking at the code: setting up a new Spotlight app and commenting out all the "add_index_field" and "add_show_field" lines in the catalog controller yields the following:

Screenshot 2024-10-23 at 3 18 05 PM

The Spotlight initializer sets up specific fields for Spotlight uploads (description, attribution, date), which appear to be reflected above. Exhibit tags are picked up neither from the Blacklight configuration/Solr nor from the Spotlight initializer, so they appear for a different reason.

hudajkhan commented 3 weeks ago

Reviewing the code and trying out Spotlight using different catalog controller configurations, we have confirmed the following:

Overview

For the top part of the page, the code retrieves:

As an example, the Parker exhibit has both fields that have been configured through the app as well as custom fields.

Screenshot 2024-10-28 at 12 32 24 PM

In the top portion of the page (above), all the fields are displayed, i.e. configured as well as custom. This section of the page allows curators to specify the overall ordering of the fields and whether each field is displayed in any of the given views (e.g. item, slideshow).

Configured fields can also be edited through the interface. For example, we can edit the label of a configured field. In this case, the object representing that field in the app holds on to both the original configuration as well as the edited version.

In the bottom portion of the page, curators can add or edit custom fields.

Screenshot 2024-10-28 at 12 32 31 PM

Screenshot 2024-10-28 at 12 32 36 PM

As for what can be captured for any of these fields, each field has its associated Solr field name and label saved. In addition, custom fields can have a short description. The app tracks custom fields by exhibit, whereas the configured fields are available for configuration in all exhibits.

As for the mapping to Solr, when a custom field is added, the field name starts with "exhibit_metadata_", followed by the title string, followed by a dynamic field suffix representing the type of field. The type of field is determined by the options selected on the custom field editing page, e.g. if the curator picks multivalued and free text, the field will end with "_tesim" to indicate a text, stored, indexed, multivalued field.
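The naming pattern can be sketched as follows. This is an illustration of the pattern described above and seen in the debug output ("exhibit_metadata_testcustom_tesim"), not Spotlight's actual implementation. Only the "_tesim" suffix is confirmed by this investigation; the "_ssim" suffix for non-free-text fields and the way multi-word titles are normalized are assumptions, and `custom_solr_field_name` is a hypothetical helper.

```ruby
# Prefix as it appears in the debug output for the custom field.
CUSTOM_FIELD_PREFIX = 'exhibit_metadata_'

# Hypothetical helper: derive the Solr field name for a custom field.
def custom_solr_field_name(title, free_text: true)
  # Assumption: titles are lowercased and non-alphanumeric runs become underscores.
  slug = title.downcase.gsub(/[^a-z0-9]+/, '_').gsub(/\A_+|_+\z/, '')
  # "_tesim" is confirmed for multivalued free text; "_ssim" is an assumption.
  suffix = free_text ? '_tesim' : '_ssim'
  "#{CUSTOM_FIELD_PREFIX}#{slug}#{suffix}"
end

puts custom_solr_field_name('testcustom')
```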

The case of the two date fields

Exhibits currently shows two date fields. Although both are labeled "Date", one corresponds to the upload field configuration (saved in Solr as "spotlight_upload_date_tesim") and the other corresponds to a field configured through the catalog controller (saved in Solr as "date_ssim", see https://github.com/sul-dlss/exhibits/blob/main/app/controllers/catalog_controller.rb#L251). Functionally, this means the upload date field (which is for uploaded resources) is stored as a text field while the catalog controller field is stored as a string field. In the latter case, a search would have to match the string in its entirety to produce a hit.
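To make the text versus string difference concrete, here is a simplified simulation in plain Ruby. It does not call Solr, and the tokenization is a stand-in: real Solr text analysis depends on the analyzer chain configured in the schema. The sample value and both helper names are made up for illustration.

```ruby
# Stand-in for a string field: the query must equal the whole stored value.
def string_field_match?(stored_value, query)
  stored_value == query
end

# Stand-in for a text field: the value is split into tokens and every
# query term must match one of them.
def text_field_match?(stored_value, query)
  tokens = stored_value.downcase.scan(/[a-z0-9]+/)
  query.downcase.scan(/[a-z0-9]+/).all? { |term| tokens.include?(term) }
end

value = 'circa 1850, revised 1862' # illustrative date value

puts string_field_match?(value, '1850') # partial value against a string field
puts text_field_match?(value, '1850')   # same term against a text field
puts string_field_match?(value, value)  # whole value against a string field
```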

As far as the number of published and unpublished exhibits containing these fields, "spotlight_upload_date_tesim" is used in 43 exhibits and "date_ssim" is used in 171 exhibits (Note: These numbers are based on comparing with a list of published and unpublished exhibits provided by @caaster).

Based on the configuration for spotlight_upload_date_tesim, the same date values are also stored in the Solr field "date_sort". The Solr fields "pub_year_w_approx_isi", "pub_year_tisim", and "pub_year_isi" also get values stored based on some MODS/XML calculations. Also see https://github.com/projectblacklight/spotlight/blob/main/lib/spotlight/upload_field_config.rb#L35 for how upload field config is stored in Solr.

Code specifics

The blacklight config method in https://github.com/projectblacklight/spotlight/blob/main/app/models/spotlight/blacklight_configuration.rb#L82 is responsible for returning the configuration that will be displayed on the metadata page. This code retrieves any default configuration (see https://github.com/projectblacklight/spotlight/blob/main/app/models/spotlight/blacklight_configuration.rb#L289). The code also kicks off the addition of any upload config fields (specified as "config.uploads") to the index fields and of exhibit tags to the show fields (see https://github.com/projectblacklight/spotlight/blob/main/app/models/spotlight/blacklight_configuration.rb#L328). For upload config fields, although the method to retrieve the list of upload fields sits at the exhibit level, the list is actually retrieved from the engine directly (i.e. it is a globally set list): see https://github.com/projectblacklight/spotlight/blob/main/app/models/spotlight/exhibit.rb#L129.

After this, the code also merges any custom fields into the list of index fields. The code also sets the exhibit for the configuration.

The order of the fields is saved using weights, which probably correspond either to whatever sort order is set on the page or, if no explicit weight property is set, to the position of the field's key in the configuration object. (See https://github.com/projectblacklight/spotlight/blob/main/app/models/spotlight/blacklight_configuration.rb#L416)
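A minimal sketch of that ordering rule, under the reading described above: an explicit weight wins, and otherwise the field falls back to its position in the configuration. This is not the Spotlight code itself, `ordered_field_keys` is a hypothetical helper, and the weights below are illustrative. Weights are converted with `to_i` because the debug output shows them stored as strings (e.g. weight="0").

```ruby
# Hypothetical helper: order field keys by explicit weight, falling back to
# the position of the key in the configuration hash.
def ordered_field_keys(fields)
  sorted = fields.each_with_index.sort_by do |(_key, config), position|
    weight = config[:weight]
    [weight ? weight.to_i : position, position]
  end
  sorted.map { |(key, _config), _position| key }
end

# Illustrative configuration: two fields have saved weights, one does not.
fields = {
  'spotlight_upload_description_tesim' => { weight: '2' },
  'spotlight_upload_attribution_tesim' => {},
  'spotlight_upload_date_tesim' => { weight: '0' }
}

puts ordered_field_keys(fields)
```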

The models storing field information also let us see whether a given object is a custom field or not.
For example, when looking at the Blacklight configuration with no fields specified from the catalog controller, but with one custom field, "index_fields" for blacklight_configuration shows us this:


index_fields={ "spotlight_upload_description_tesim"=>#<Blacklight::Configuration::IndexField label="Description", key="spotlight_upload_description_tesim", field="spotlight_upload_description_tesim", if=:field_enabled?, unless=false, presenter=Blacklight::FieldPresenter, original=#<Blacklight::Configuration::IndexField label="Description", key="spotlight_upload_description_tesim", field="spotlight_upload_description_tesim", if=true, unless=false, presenter=Blacklight::FieldPresenter>, list=true, atom=true, rss=true, gallery=true, masonry=true, slideshow=true, show=true, enabled=true, immutable=#>,

"spotlight_upload_attribution_tesim"=>#<Blacklight::Configuration::IndexField label="Attribution", key="spotlight_upload_attribution_tesim", field="spotlight_upload_attribution_tesim", if=:field_enabled?, unless=false, presenter=Blacklight::FieldPresenter, original=#<Blacklight::Configuration::IndexField label="Attribution", key="spotlight_upload_attribution_tesim", field="spotlight_upload_attribution_tesim", if=true, unless=false, presenter=Blacklight::FieldPresenter>, list=true, atom=true, rss=true, gallery=true, masonry=true, slideshow=true, show=true, enabled=true, immutable=#>,

"spotlight_upload_date_tesim"=>#<Blacklight::Configuration::IndexField label="Date", key="spotlight_upload_date_tesim", field="spotlight_upload_date_tesim", if=:field_enabled?, unless=false, presenter=Blacklight::FieldPresenter, original=#<Blacklight::Configuration::IndexField label="Date", key="spotlight_upload_date_tesim", field="spotlight_upload_date_tesim", if=true, unless=false, presenter=Blacklight::FieldPresenter>, list=true, atom=true, rss=true, gallery=true, masonry=true, slideshow=true, show=true, enabled=true, immutable=#>,

"testcustom_tesim"=>#<Blacklight::Configuration::IndexField label="testcustom", short_description="Test custom description", key="testcustom_tesim", field="exhibit_metadata_testcustom_tesim", custom_field=true, original=#<Blacklight::Configuration::IndexField label="testcustom", short_description="Test custom description", key="testcustom_tesim", field="exhibit_metadata_testcustom_tesim", custom_field=true>, show=true, enabled=true, immutable=#, if=:field_enabled?, unless=false, presenter=Blacklight::FieldPresenter>,

"exhibit_tags"=>#<Blacklight::Configuration::ShowField field="exhibit_metadata_tags_ssim", link_to_facet=true, separator_options={:words_connector=>nil, :two_words_connector=>nil, :last_word_connector=>nil}, key="exhibit_tags", label="Exhibit Tags", if=:field_enabled?, unless=false, presenter=Blacklight::FieldPresenter, original=#<Blacklight::Configuration::ShowField field="exhibit_metadata_tags_ssim", link_to_facet=true, separator_options={:words_connector=>nil, :two_words_connector=>nil, :last_word_connector=>nil}, key="exhibit_tags", label="Exhibit Tags", if=true, unless=false, presenter=Blacklight::FieldPresenter>, list=false, atom=false, rss=false, gallery=false, masonry=false, slideshow=false, enabled=true, show=true, immutable=#>}


In the above, "field" refers to the Solr field name and "label" stores the field label. The custom field "testcustom_tesim" also has a "short_description" field and the property "custom_field" set to true.

We can also see both the edited values for a configured field as well as its original values. For example, when we update the label for "spotlight_upload_description_tesim", debugging shows us the following object:


"spotlight_upload_description_tesim"=>#<Blacklight::Configuration::IndexField label="Description Override", key="spotlight_upload_description_tesim", field="spotlight_upload_description_tesim", if=:field_enabled?, unless=false, presenter=Blacklight::FieldPresenter,

original =#<Blacklight::Configuration::IndexField label="Description", key="spotlight_upload_description_tesim", field="spotlight_upload_description_tesim", if=true, unless=false, presenter=Blacklight::FieldPresenter>, weight="0", list=true, gallery=true, masonry=true, slideshow=true, show=true, enabled=true, immutable=#>

Here, we can see that the field's new label is "Description Override" but the "original" key in the object stores the original configuration that shows the label "Description".
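The two checks described above (is a field custom, and has its label been edited) can be sketched against stand-in objects. The `FieldStub` struct below only mimics the properties visible in the debug output ("label", "field", "custom_field", "original"); it is not the Blacklight::Configuration field class, and both helper names are hypothetical.

```ruby
# Stand-in for the field objects shown in the debug output.
FieldStub = Struct.new(:label, :field, :custom_field, :original, keyword_init: true)

# A field is custom when its "custom_field" property is set to true.
def custom_field?(field)
  field.custom_field == true
end

# A label was edited when it differs from the label kept under "original".
def label_overridden?(field)
  !field.original.nil? && field.original.label != field.label
end

fields = {
  'spotlight_upload_description_tesim' => FieldStub.new(
    label: 'Description Override',
    field: 'spotlight_upload_description_tesim',
    original: FieldStub.new(label: 'Description')
  ),
  'testcustom_tesim' => FieldStub.new(
    label: 'testcustom',
    field: 'exhibit_metadata_testcustom_tesim',
    custom_field: true,
    original: FieldStub.new(label: 'testcustom', custom_field: true)
  )
}

fields.each do |key, field|
  kind = custom_field?(field) ? 'custom' : 'configured'
  note = label_overridden?(field) ? " (label changed from \"#{field.original.label}\")" : ''
  puts "#{key}: #{kind}, Solr field #{field.field}#{note}"
end
```

The same two properties would be enough to show a "custom" marker and the Solr field name next to each row on the metadata configuration page.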

Thoughts and Suggestions

With the current configuration and code, we know the following about any given field:

Currently, we cannot see within the top portion of the metadata configuration page whether any of the fields listed are coming from the application configuration or whether they are custom fields. We also can't see the Solr field name directly in the display (although we can tell by inspecting the HTML). The top section also connects the order to the order of display of those fields across the exhibit, as well as to whether or not a given field is visible in a given view.

The facet configuration page shows the Solr document counts for a given facet field. Of the fields displayed on the metadata configuration page, not all are searchable through Solr even if their values are displayed, and retrieving values for each may take some time (depending on whether or not we can add them to a facet query).

If we consider what kinds of context may be useful for curators to understand what a given field indicates, we may be able to include whether or not the field is custom. It is possible we may want to consider incorporating the short description for custom fields where it has been added, but we are already dealing with limited screen space and a lot of information. Another possibility might be the Solr field name, but that may not mean much to curators. On the other hand, the information would be useful to distinguish between the two date fields that are available for all exhibits.

Also, to give a sense of how many exhibits have any custom fields, I took the list of exhibits (published and unpublished) provided by @caaster and went through the site to see if any custom fields were listed. Out of 203, there were 47 exhibits that had custom fields. I am attaching the Excel sheet with the list.

customfieldlist.xlsx