sul-dlss / dlme

Digital Library of the Middle East web application, based on Spotlight
https://dlmenetwork.org/

Statistics page displays wrong data for Item contributors #1003

Closed by jacobthill 4 years ago

jacobthill commented 4 years ago

Item contributor should pull from agg_data_provider. This was probably addressed by https://github.com/sul-dlss/dlme/issues/936, but the statistics page is currently displaying the wrong country values for two organizations:

[Screenshot, 2020-05-21 9:46:41 AM]

Both institutions should have France for Data contributor and Egypt for Item contributor. The IR has the correct data. Do these records need to be reloaded or reindexed for the changes to take effect, or was the change in https://github.com/sul-dlss/dlme/issues/936 not implemented correctly?

jacobthill commented 4 years ago

It seems the statistics page just has stale data. The production site has some type values in the statistics page that don't show up in the facets on the home page:

[Screenshot, 2020-05-21 9:58:21 AM]

cbeer commented 4 years ago

@jacobthill - can you say more about in what way the data is wrong?

For the items table, it seems like there may be some differences between the fields we're using. Are you asking us to change the statistics page to only pull from the cho_type_facet?

jacobthill commented 4 years ago

Sure. I have a meeting in a few minutes, but from a quick look, NOT FOUND and Oral narrative show up on the statistics page but not in the facets on the home page.

[Screenshot, 2020-05-21 10:47:55 AM]

I'm not sure if these are the only errors, but the statistics page should reflect the same data as the facets. I'm not sure what cho_type_facet refers to. The facet displayed on the home page is just called Type in the Spotlight UI, and in the IR it comes from cho_edm_type and cho_has_type.

cbeer commented 4 years ago

It looks like cho_type_facet is calculated over in dlme-transform using the same fields, but the two can clearly get out of sync. Perhaps there's some old data in -prod that needs to be reindexed?

Looking at some of these records: https://dlmenetwork.org/library/catalog?f%5Bcho_edm_type.en_ssim%5D%5B%5D=NOT+FOUND

They don't seem to have the cho_type_facet field, or they would show up in the facets.
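One way to check for this kind of drift is to compare the values present in the field the statistics page reads with the values present in the facet field. The following is only a rough sketch in plain Ruby over in-memory hashes: the record shape, the field keys, and the helper names `distinct_values` and `values_missing_from_facet` are all assumptions for illustration, not the application's actual code or Solr field names.

```ruby
# Collect the distinct values stored under `field` across all records.
# Records are plain hashes here (hypothetical shape, not real Solr documents).
def distinct_values(records, field)
  records.flat_map { |record| Array(record[field]) }.uniq
end

# Values that appear in the statistics source field but never in the facet field.
def values_missing_from_facet(records, stats_field:, facet_field:)
  distinct_values(records, stats_field) - distinct_values(records, facet_field)
end

# Made-up sample data: the second record has no facet field at all.
sample_records = [
  { 'id' => 'a', 'cho_edm_type' => ['Text'], 'cho_type_facet' => ['Text'] },
  { 'id' => 'b', 'cho_edm_type' => ['NOT FOUND'] }
]

missing = values_missing_from_facet(
  sample_records,
  stats_field: 'cho_edm_type',
  facet_field: 'cho_type_facet'
)
puts missing.inspect
```

Any value this returns is one that would be counted on the statistics page without ever appearing in the Type facet.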

jacobthill commented 4 years ago

Where are you looking to see that they don't have the cho_type_facet field? NOT_FOUND is used to ensure that the normalized fields (language and type) always have a value. It should only show up in the IR when the source value in the incoming metadata isn't found in any of the translation maps. That tells me a value is missing and I need to add it to the correct translation map. If this ever shows up in the facets, I re-map and re-load the records. So somehow I re-mapped and re-loaded these records, but the statistics page didn't update. Is there a way to reindex everything for the statistics page?
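To illustrate the fallback described above, here is a minimal sketch in plain Ruby. The map contents, the constant names, and the helpers `normalize_type` and `unmapped_values` are hypothetical; the real translation maps and lookup code live in dlme-transform and are not reproduced here.

```ruby
# Hypothetical translation map; the entries are made up for illustration.
SAMPLE_TYPE_MAP = {
  'manuscript' => 'Text',
  'photograph' => 'Image'
}.freeze

# Sentinel written when a source value has no entry in the map.
NOT_FOUND_SENTINEL = 'NOT FOUND'

# Normalize a source value, falling back to the sentinel when it is unmapped.
def normalize_type(source_value, map = SAMPLE_TYPE_MAP)
  map.fetch(source_value.to_s.strip.downcase, NOT_FOUND_SENTINEL)
end

# List the distinct source values that still need a translation map entry.
def unmapped_values(source_values, map = SAMPLE_TYPE_MAP)
  source_values.select { |value| normalize_type(value, map) == NOT_FOUND_SENTINEL }.uniq
end

puts unmapped_values(['Photograph', 'oral narrative', 'oral narrative']).inspect
```

Running something like `unmapped_values` over the incoming metadata before loading would surface the values to add to the map, rather than discovering them later in the facets.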

cbeer commented 4 years ago

For those 3 records, are they using https://github.com/sul-dlss/dlme-transform/blob/master/traject_configs/texas_tech_config.rb?

It seems like it's missing the line each_record add_cho_type_facet, which other transforms seem to have?
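A quick way to find other configs with the same gap would be to scan them for that line. This is only a sketch: the `*_config.rb` glob is an assumption based on the texas_tech_config.rb file name, and the helper names `missing_facet_call?` and `configs_missing_facet_call` are made up here.

```ruby
# Matches an uncommented `each_record add_cho_type_facet` line.
FACET_CALL_PATTERN = /^\s*each_record\s+add_cho_type_facet\b/

# True when the given config source never calls add_cho_type_facet.
def missing_facet_call?(config_source)
  config_source.each_line.none? { |line| line.match?(FACET_CALL_PATTERN) }
end

# Paths of config files under config_dir that lack the call.
# The glob pattern is an assumption about how the configs are named.
def configs_missing_facet_call(config_dir)
  Dir.glob(File.join(config_dir, '*_config.rb')).sort.select do |path|
    missing_facet_call?(File.read(path))
  end
end
```

For example, `configs_missing_facet_call('traject_configs')` run from a dlme-transform checkout would list the files to review; a commented-out call is treated as missing.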

jacobthill commented 4 years ago

Yes, that's probably it. The rest of the collection doesn't have anything in the facet field: https://dlmenetwork.org/library/catalog?f%5Bagg_data_provider_en%5D%5B%5D=Texas+Tech+University+Libraries

Let me try re-mapping and see if that fixes it.