ukwa / ukwa-ui

A new user interface for the UK Web Archive

Some collections not visible in category view #353

Open anjackson opened 2 years ago

anjackson commented 2 years ago

In response to https://github.com/ukwa/w3act/issues/676

Looking at:

https://github.com/ukwa/ukwa-ui/blob/465e94c7e4e1f12f2467c5f5cab4c9f98a579df6/src/main/java/com/marsspiders/ukwa/solr/SolrCommunicator.java#L157

This seems to map to this query, where the UK General Election 2015 collection can be seen: http://prod1.n45.wa.bl.uk:9021/solr/collections/select?indent=on&facet=true&facet.pivot=collectionAreaId,id,description,name&q=*:*&facet.limit=-1&facet.pivot.mincount=1&wt=json&rows=1
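For reference, the parameters of that query can be assembled programmatically. This is only a minimal Python sketch, not the project's Java code in `SolrCommunicator`: the host, core and parameter values are copied from the URL above, while the function name `build_pivot_query` is made up for illustration.

```python
from urllib.parse import urlencode

# Host and core are taken from the query URL quoted in this thread.
SOLR_SELECT = "http://prod1.n45.wa.bl.uk:9021/solr/collections/select"


def build_pivot_query(q="*:*", rows=1):
    """Return the select URL for the collectionAreaId/id/description/name pivot.

    build_pivot_query is a hypothetical helper name, not part of ukwa-ui.
    """
    params = {
        "indent": "on",
        "facet": "true",
        # Nested facet: collection area, then id, description and name.
        "facet.pivot": "collectionAreaId,id,description,name",
        "q": q,
        "facet.limit": -1,            # no cap on facet values
        "facet.pivot.mincount": 1,    # drop empty pivot buckets
        "wt": "json",
        "rows": rows,
    }
    # Keep '*', ':' and ',' readable instead of percent-encoding them.
    return SOLR_SELECT + "?" + urlencode(params, safe="*:,")


if __name__ == "__main__":
    print(build_pivot_query())
```

Passing a different `q` (for example a filter on `name` and `description`) reuses the same pivot parameters.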

The code that consumes this list is not obvious to me, so I'm not sure what's happening there:

https://github.com/ukwa/ukwa-ui/blob/465e94c7e4e1f12f2467c5f5cab4c9f98a579df6/src/main/java/com/marsspiders/ukwa/controllers/CategoryController.java#L81

The only thing I spotted is that the UK General Election 2015 collection is preceded by an entry that has no description:

            {
              "field":"id",
              "value":"4148",
              "count":1},
            {
              "field":"id",
              "value":"60",
              "count":1,
              "pivot":[{
                  "field":"description",
                  "value":"Collection of websites, curated by staff at the Legal Deposit Libraries, focussing on the 2015 UK General Election which was held on 7 May 2015 to elect 650 members to the House of Commons. It was the first general election at the end of a fixed-term Parliament. \n",
                  "count":1,
                  "pivot":[{
                      "field":"name",
                      "value":"UK General Election 2015",
                      "count":1}]}]},

Is it possible that that's breaking things? You could try this by filtering out items with no description or title (name:[* TO *] AND description:[* TO *]), e.g.

http://prod1.n45.wa.bl.uk:9021/solr/collections/select?indent=on&facet=true&facet.pivot=collectionAreaId,id,description,name&q=name:[*%20TO%20%20*]%20AND%20description:[*%20TO%20*]&facet.limit=-1&facet.pivot.mincount=1&wt=json&rows=0
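To make the suspicion concrete: an `id` entry without a description has no nested `pivot` key at all, so any consumer that assumes every entry carries `description` and `name` levels could fail or stop early at that point. I have not confirmed that this is what `CategoryController` does. The Python sketch below only illustrates a tolerant way to walk the pivot entries; `flatten_id_pivots` is a hypothetical name, and the sample reuses the two entries quoted above with the description text shortened.

```python
def flatten_id_pivots(id_entries):
    """Flatten id -> description -> name pivot entries into plain dicts.

    Missing levels become None instead of raising, so an entry such as
    id 4148 (no nested "pivot") does not abort the whole listing.
    """
    collections = []
    for entry in id_entries:
        description = None
        name = None
        desc_pivots = entry.get("pivot") or []
        if desc_pivots:
            description = desc_pivots[0].get("value")
            name_pivots = desc_pivots[0].get("pivot") or []
            if name_pivots:
                name = name_pivots[0].get("value")
        collections.append(
            {"id": entry["value"], "description": description, "name": name}
        )
    return collections


# The two id-level entries quoted in this comment (description shortened).
sample = [
    {"field": "id", "value": "4148", "count": 1},
    {"field": "id", "value": "60", "count": 1, "pivot": [
        {"field": "description", "value": "Collection of websites, ...",
         "count": 1, "pivot": [
             {"field": "name", "value": "UK General Election 2015",
              "count": 1}]}]},
]

if __name__ == "__main__":
    for collection in flatten_id_pivots(sample):
        print(collection)
```

With this approach collection 4148 is still returned, just with empty description and name, and collection 60 is read normally after it.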

min2ha commented 2 years ago

Actually the SOLR query related to AREAS is there: https://github.com/ukwa/ukwa-ui/blob/465e94c7e4e1f12f2467c5f5cab4c9f98a579df6/src/main/java/com/marsspiders/ukwa/solr/SolrCommunicator.java#L157

crarugal commented 2 years ago

Top-level collections (and sub-collections) that are visible, searchable, published, and/or not listed: https://docs.google.com/spreadsheets/d/1i77oxEa4sPfUk4wAdQRp3xo-KAU-cR9pIEObWGZaLUg/edit#gid=2019239782

crarugal commented 2 years ago

Collections that can't be searched for in dev:

| W3ACT link | Collection ID | Collection name | anomaly? | Topics and Themes page link | Viewable on site | ttype | id | url | created_at | Name length | Description length | publish |
| -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- |
| https://www.webarchive.org.uk/act/collections/60 | 60 | UK Gen election 2015 | Collection description was present, but still not searchable | https://www.webarchive.org.uk/en/ukwa/collection/60 | Yes | collections | 60 | act-300 | 2015-02-09 14:07:06 | 24 | 264 | TRUE |
| https://www.webarchive.org.uk/act/collections/689 | 689 | Scottish elections 2016 | Collection description was present, but still not searchable | https://www.webarchive.org.uk/en/ukwa/collection/689 | Yes | collections | 689 | act-689 | 2016-01-18 12:03:39 | 36 | 220 | TRUE |
| https://www.webarchive.org.uk/act/collections/851 | 851 | Queens birthday 2016 | Collection description was present, but still not searchable | https://www.webarchive.org.uk/en/ukwa/collection/851 | Yes | collections | 851 | act-851 | 2016-05-26 11:37:23 | 34 | 115 | TRUE |
| https://www.webarchive.org.uk/act/collections/2778 | 2778 | Unfinished business | Missing description | https://www.webarchive.org.uk/en/ukwa/collection/2778 | Yes | collections | 2778 | act-2778 | 2019-09-30 10:15:45 | 49 | 0 | TRUE |
| https://www.webarchive.org.uk/act/collections/3064 | 3064 | Startup | Collection description was present, but still not searchable | https://www.webarchive.org.uk/en/ukwa/collection/3064 | Yes | collections | 3064 | act-3064 | 2020-04-28 11:11:13 | 19 | 332 | TRUE |
| https://www.webarchive.org.uk/act/collections/3098 | 3098 | UK Retail | Collection description was present, but still not searchable | https://www.webarchive.org.uk/en/ukwa/collection/3098 | Yes | collections | 3098 | act-3098 | 2020-07-17 13:25:10 | 46 | 590 | TRUE |
| https://www.webarchive.org.uk/act/collections/3866 | 3866 | Duke of edinburgh | Collection description was present, but still not searchable | https://www.webarchive.org.uk/en/ukwa/collection/3866 | Yes | collections | 3866 | act-3866 | 2021-04-19 08:42:07 | 17 | 668 | TRUE |
| https://www.webarchive.org.uk/act/collections/4148 | 4148 | NHS Patient Surveys \| UKWA Topics and Themes | Missing Description | https://www.webarchive.org.uk/en/ukwa/collection/4148 | Yes | collections | 4148 | act-4148 | 2022-01-13 13:39:10 | 19 | 0 | TRUE |
| https://www.webarchive.org.uk/act/collections/4214 | 4214 | Ukraine 2022 | Collection description was present, but still not searchable | https://www.webarchive.org.uk/en/ukwa/collection/4214 | Yes |   | 1163 | act-4214 | Wednesday, March 02, 2022 | 12 | 739 | TRUE |
| https://www.webarchive.org.uk/act/collections/4088 | 4088 | The Queen's Platinum Jubilee 2022 | Collection description was present, but still not searchable | https://www.webarchive.org.uk/en/ukwa/collection/4088 | Yes | collections | 4088 |   |   | 33 | 569 | TRUE |

min2ha commented 2 years ago

The scope of this ticket is collection visibility in Category view only (whether a collection is searchable is out of scope).

The JSON source for Top Collections: https://www.webarchive.org.uk/act/collections/allCollectionAreasAsJson/7

Take the case of collection 3098 (UK Retail), which is listed in Working On(!). From the Top Collections JSON we know that it exists in 3 Collection Areas: Places, Society & Communities and Working On(!).

BTW, a long time ago we agreed that by default we do not expose collections listed in Working On.

The count of collections in 'Working On' is 7 in the JSON but 5 in the SOLR instance (2 fewer, probably due to the field Publish:NO): http://prod1.n45.wa.bl.uk:9021/solr/collections/select?q=collectionAreaId:2945&indent=on&wt=json&rows=100
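To find out which two collections account for the difference, the id lists from both sources can be compared. A minimal Python sketch follows, assuming the ACT ids have already been extracted into a list (the structure of the ACT JSON is not shown in this thread) and that the Solr result uses the standard `response.docs` layout with an `id` field. The function names and the sample ids are placeholders, not the real 'Working On' members.

```python
def solr_doc_ids(solr_json):
    """Collect the id field from a standard Solr select response."""
    return {str(doc["id"]) for doc in solr_json["response"]["docs"]}


def missing_from_solr(act_ids, solr_json):
    """Return ids present in the ACT list but absent from the Solr result."""
    return sorted({str(i) for i in act_ids} - solr_doc_ids(solr_json))


# Hypothetical data: placeholder ids, not the real 'Working On' members.
act_ids = ["101", "102", "103"]
solr_json = {
    "response": {"numFound": 2, "docs": [{"id": "101"}, {"id": "103"}]}
}

if __name__ == "__main__":
    print(missing_from_solr(act_ids, solr_json))
```

The ids returned by the comparison could then be checked in ACT to see whether they are in fact unpublished.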

crarugal commented 2 years ago

Thanks, Mindy, and apologies, I wasn't aware (or maybe I forgot) that Collections tagged into "Working On" would not be exposed.

Should Collections that are both published and tagged into "Working On" still be searchable? Perhaps this is more of a curatorial question, as I think all published Collections should still be searchable.

nicolabingham commented 2 years ago

Please can we remove the "Working On" collection so that it is not available to ACT users or end users? We will make sure anything currently tagged in this collection is removed from it and tagged into other collections. Thanks.

nicolabingham commented 2 years ago

I have added descriptions for the two collections that were missing them:
https://www.webarchive.org.uk/act/collections/4148 NHS Patient Surveys
https://www.webarchive.org.uk/act/collections/2778 Unfinished business

I have also untagged three collections from the 'Working On' category:
https://www.webarchive.org.uk/act/collections/3064 3064 Startup
https://www.webarchive.org.uk/act/collections/3098 3098 UK Retail
https://www.webarchive.org.uk/act/collections/3866 3866 Duke of edinburgh

I will check back tomorrow to see if they are visible.