sul-dlss / dlme

Digital Library of the Middle East web application, based on Spotlight
https://dlmenetwork.org/

Update how collections are counted in DLME #1327

Closed jacobthill closed 2 years ago

jacobthill commented 3 years ago

The current process for counting collections in DLME uses the file paths in dlme metadata, which are passed to the agg_data_provider_collection field, but it isn't always accurate. In some cases (e.g. Sakip Sabanci) a single collection is partially available from different sources and is spread over two folders, so it is counted twice. Other collections have similar problems (e.g. QNL, AUC). The way the values are mapped has been updated and should now be accurate. However, every value will now be available in both English and Arabic. The count appears to be based on all unique values in the agg_data_provider_collection field, when it should be based only on the unique English-language values in that field. In other words, it should not count a collection name in English and its Arabic translation as two separate collections. So either we need to filter by language before counting, or we could update the counter to use the agg_data_provider_collection_id field.
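To make the difference concrete, here is a minimal Python sketch of the three counting strategies described above. The record layout (a language-keyed dictionary with `en` and `ar-Arab` keys), the sample names, and the sample IDs are assumptions made up for illustration; they are not taken from the DLME index or from the application's actual counting code.

```python
# Hypothetical records: the field layout, language keys, and values below are
# illustrative assumptions, not real DLME data.
records = [
    {
        "agg_data_provider_collection": {
            "en": ["Sample Collection A"],
            "ar-Arab": ["المجموعة أ"],
        },
        "agg_data_provider_collection_id": "sample-collection-a",
    },
    {
        "agg_data_provider_collection": {
            "en": ["Sample Collection A"],
            "ar-Arab": ["المجموعة أ"],
        },
        "agg_data_provider_collection_id": "sample-collection-a",
    },
    {
        "agg_data_provider_collection": {
            "en": ["Sample Collection B"],
            "ar-Arab": ["المجموعة ب"],
        },
        "agg_data_provider_collection_id": "sample-collection-b",
    },
]


def count_by_all_values(records):
    """Count every unique collection name, regardless of language."""
    names = set()
    for record in records:
        for values in record["agg_data_provider_collection"].values():
            names.update(values)
    return len(names)


def count_by_english_values(records):
    """Count only the unique English-language collection names."""
    names = set()
    for record in records:
        names.update(record["agg_data_provider_collection"].get("en", []))
    return len(names)


def count_by_collection_id(records):
    """Count unique collection IDs, which do not depend on language."""
    return len({record["agg_data_provider_collection_id"] for record in records})


print(count_by_all_values(records))      # 4: each collection is counted twice
print(count_by_english_values(records))  # 2
print(count_by_collection_id(records))   # 2
```

Counting by ID avoids any language handling in the counter, but it only works once every record carries an ID.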

At the moment it's not clear to me where the collection counts are coming from. For example, looking at American University in Cairo on the contributors page I see 23 collections, but if I click on the link to the data provider, there is no data in the Collection ID or the Collection facet. Maybe this is a caching issue?

mwerla commented 2 years ago

@jacobthill Replying to your question, I think it would be nice to list collections under each contributor on the Contributors page and link to their content.

jacobthill commented 2 years ago

Thanks @mwerla, I'm removing that question and breaking this out into a separate ticket.

jacobthill commented 2 years ago

@cbeer I have updated the ticket above and this is now ready for work.

corylown commented 2 years ago

At the moment it's not clear to me where the collection counts are coming from. For example, looking at American University in Cairo on the contributors page I see 23 collections, but if I click on the link to the data provider, there is no data in the Collection ID or the Collection facet. Maybe this is a caching issue?

corylown commented 2 years ago

For a little more context, there are only 7 unique values in the agg_data_provider_collection_id_ssim field:

{
  "responseHeader":{
    "zkConnected":true,
    "status":0,
    "QTime":9,
    "params":{
      "q":"*:*",
      "facet.limit":"-1",
      "facet.field":"agg_data_provider_collection_id_ssim",
      "rows":"0",
      "facet":"true"}},
  "response":{"numFound":160830,"start":0,"maxScore":1.0,"numFoundExact":true,"docs":[]
  },
  "facet_counts":{
    "facet_queries":{},
    "facet_fields":{
      "agg_data_provider_collection_id_ssim":[
        "aims",71,
        "brooklyn-museum-arts-of-the-islamic-world",1090,
        "brooklyn-museum-egyptian",7298,
        "cambridge-genizah",22232,
        "harvard-scw",14104,
        "ifpo-photographs",13869,
        "penn-near-east",11066]},
    "facet_ranges":{},
    "facet_intervals":{},
    "facet_heatmaps":{}}}
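
For anyone reading the response above by hand: Solr returns each facet field as a flat list that alternates value and count. A small Python sketch like the following (the helper name `facet_counts` is made up for illustration) pairs them up and counts the unique IDs, using the values copied from the response.

```python
# Flat facet list copied from the Solr response above:
# value, count, value, count, ...
facet_field = [
    "aims", 71,
    "brooklyn-museum-arts-of-the-islamic-world", 1090,
    "brooklyn-museum-egyptian", 7298,
    "cambridge-genizah", 22232,
    "harvard-scw", 14104,
    "ifpo-photographs", 13869,
    "penn-near-east", 11066,
]


def facet_counts(flat):
    """Pair up a flat [value, count, ...] facet list into a dict."""
    return dict(zip(flat[0::2], flat[1::2]))


counts = facet_counts(facet_field)
unique_ids = len(counts)             # number of distinct collection IDs
facet_hits = sum(counts.values())    # total documents counted across all IDs

print(unique_ids)  # 7
print(facet_hits)  # 69730
```

That is 7 unique IDs with 69,730 facet hits against a numFound of 160,830. If each document carries at most one ID (an assumption, since the field is multivalued), roughly 91,100 documents currently have no collection ID.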
jacobthill commented 2 years ago

@corylown ok thanks for clarifying. The data is in various states of completion. If you need records with complete data, Archives ouvertes de l’Institut français du Proche-Orient should work. The rest of the data will be refreshed this week.