Closed sethaj closed 7 years ago
There seems to be a difference between .to_solr and what's actually in the solr index: to_solr has the single value, but a query produces a multi-value.
pry(main)> FileSet.find('jq085m87j').to_solr['creator_full_name_tesim']
=> "Stanislavsky, Konstantin"
pry(main)> ActiveFedora::SolrService.query("{!terms f=id}jq085m87j")[0]['creator_full_name_tesim']
=> ["Stanislavsky, Konstantin"]
Is this an input-output discrepancy? Everything that ends in 'm' is meant to be multi-valued. https://github.com/projecthydra/hydra-head/wiki/Solr-Schema
In general we (or CC?) use multi for all metadata. You can still push whatever you want onto the document though (most fields ending in 'm' show [] values in your average to_solr results, but not all), but maybe Solr packages the result according to that 'm' when spitting things out?
The short answer:
The fields are stored in fedora as single-value, but you can index them in solr any way you like. In this case, we chose 'stored_searchable', which makes them multi-value (in solr).
The reason we chose 'stored_searchable' is because that's what I always choose for fields that should be searchable. I probably just didn't think about the fact that it would be more technically accurate to store the field as *_tesi instead of *_tesim, because most fields are multi-value.
You might consider it sloppy to index the field in solr as multi-value when we know that it's always single-value, but it should be harmless to index it that way. I don't think we need to change it unless it's causing a problem.
The long answer:
The properties for the first and last name are defined as multiple: false, so they'll be stored in fedora as single-value fields. Those fields are defined here:
And they are being set as single-value here: https://github.com/mlibrary/heliotrope/blob/master/app/models/concerns/stores_creator_name_separately.rb#L21
So that's why it's single-value when you call monograph.to_solr.
But when solr saves the value, it doesn't matter that you set it as a string instead of an array; Solr will respect the config of that dynamic field and store it as an array. 'stored_searchable' translates to *_tesim, which you can see in the rails console:
[19] pry(main)> Solrizer.solr_name('foo', :stored_searchable)
=> "foo_tesim"
And, in our solr config, we defined *_tesim:
https://github.com/mlibrary/heliotrope/blob/master/solr/config/schema.xml#L222
where the meaning of *_tesim is this:
Sky* and it will match on "Skywalker".
There's a little more info about the dynamic fields here: https://wiki.apache.org/solr/SchemaXml#Common_field_options
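As a rough illustration of how a suffix like *_tesim breaks down, here is a small sketch in plain Ruby. It assumes the naming convention described on the Solr-Schema wiki page linked earlier in this thread (a type code, then optional 's' for stored, 'i' for indexed, 'm' for multi-valued). describe_dynamic_field and the two constants are hypothetical names, not part of Solrizer, and only three type codes are covered:

```ruby
# Hypothetical helper: decodes a dynamic-field suffix under the assumed
# convention <type code><s = stored><i = indexed><m = multi-valued>.
# Only a few type codes are listed; a real schema.xml defines more.
TYPE_CODES = { "te" => "text_en", "t" => "text", "s" => "string" }.freeze
SUFFIX_PATTERN = /_(te|t|s)(s?)(i?)(m?)\z/

def describe_dynamic_field(field_name)
  match = SUFFIX_PATTERN.match(field_name)
  return nil unless match # no recognised suffix, e.g. "id"

  {
    type: TYPE_CODES.fetch(match[1]),
    stored: !match[2].empty?,
    indexed: !match[3].empty?,
    multi_valued: !match[4].empty?
  }
end

p describe_dynamic_field("creator_full_name_tesim")
p describe_dynamic_field("creator_full_name_tesi")
```

Under these assumptions the only difference between the two results is the multi_valued flag, which is the single-vs-array difference seen in the console output at the top of the thread.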
Great, ok thanks that makes sense. The problem I was running into was how I was building, say, a presenter in a test, and that its fields sometimes didn't match what I saw in the app (single vs. multi). So I think I just need to be more intentional/explicit in my tests when it comes to representing solr docs so that my tests match what's actually happening in the app.
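One way to make that explicit in a spec, sketched in plain Ruby: pass the hash from to_solr through a small helper that wraps values the way the query results above come back for fields ending in 'm'. as_solr_response is a hypothetical helper, not an ActiveFedora or Solrizer method, and the suffix check is a crude heuristic based on the "ends in 'm'" rule mentioned earlier, not a reading of the real schema:

```ruby
# Hypothetical test helper: mimics the shape of a document read back from
# Solr by wrapping values of fields whose dynamic suffix ends in 'm'.
# The regex is a heuristic; it does not consult schema.xml.
MULTI_VALUED_SUFFIX = /_[a-z]*m\z/

def as_solr_response(doc)
  doc.each_with_object({}) do |(field, value), result|
    result[field] =
      if field.match?(MULTI_VALUED_SUFFIX) && !value.nil?
        Array(value) # "x" becomes ["x"]; an existing array is left unchanged
      else
        value
      end
  end
end

to_solr_doc = {
  "id" => "jq085m87j",
  "creator_full_name_tesim" => "Stanislavsky, Konstantin"
}

p as_solr_response(to_solr_doc)["creator_full_name_tesim"]
# => ["Stanislavsky, Konstantin"]
```

Building the presenter in a test from the wrapped hash rather than the raw to_solr output should keep the test closer to what the app sees from the index.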
Another option would be to change the indexed fields from *_tesim to *_tesi, but then aside from changing the code, you'll have to re-index the data that already exists in your production app.
In solr I'm seeing creator_full_name_tesim (monograph or file_set) coming through as a multi-value field like here:
it looks like it's specifically not supposed to be multi.
https://github.com/mlibrary/heliotrope/blob/master/spec/models/stores_creator_name_separately_spec.rb#L24
How/where is this happening? Is it supposed to be multi or single?