ukwa / w3act

w3act is an annotation and curation tool for building web archive collections
Apache License 2.0

Collection Areas - the list of Collections IDs is incomplete #676

Open min2ha opened 2 years ago

min2ha commented 2 years ago

Unassigned collections exist.

Attention is needed on Collection (example):
https://www.webarchive.org.uk/act/collections/369

Data source used for T&T on UI: https://www.webarchive.org.uk/act/collections/allCollectionAreasAsJson/0

Collection Area is showing as blank on the Collection View pages!?

min2ha commented 2 years ago

We have a list of IDs for the area 'Currently Working On' as well, but we don't expose it:

{"key":2945,"title":"Currently Working On","url":"/act/taxonomy/2945","select":false,"children":null,"collections_ids":[2946,3064,3098,3188,3866,4089]}
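For reference, a minimal Python sketch of reading one such area node and pulling out its `collections_ids`. The node literal is copied from the JSON quoted in this comment; the helper name `area_collection_ids` is made up for illustration, and the idea that `children` holds nested nodes of the same shape is an assumption, not something confirmed against ACT.

```python
import json

# Area node copied from the JSON quoted in this comment.
NODE = json.loads(
    '{"key":2945,"title":"Currently Working On","url":"/act/taxonomy/2945",'
    '"select":false,"children":null,'
    '"collections_ids":[2946,3064,3098,3188,3866,4089]}'
)


def area_collection_ids(node):
    """Map each area key to its collection IDs, walking nested children.

    Assumption (hypothetical): child nodes share the parent's shape.
    """
    result = {node["key"]: list(node.get("collections_ids") or [])}
    for child in node.get("children") or []:
        result.update(area_collection_ids(child))
    return result


print(area_collection_ids(NODE))
```

Running the same helper over every node returned by the `allCollectionAreasAsJson` endpoint would give one area-to-collections map that can be compared with what the UI shows.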

nicolabingham commented 2 years ago

@jasonwebber-bl both the New Media Writing Prize Collection and Climate Change are now visible under their respective higher-level categories on the website. Did you check them into the higher level categories (Collection Areas) in ACT as I didn't do this?

crarugal commented 2 years ago

Investigating Collection Area for "Science, Technology & Medicine": https://www.webarchive.org.uk/act/collections/list?s=title

Supposedly with a count of 31

- however, only 29 collections are listed:

image These are the 29 collections displayed, when "Science, Technology & Medicine" filter is applied image

There are two missing collections in the filtered view, highlighted in yellow: image

The reason they are not shown is that they are sub-collections: https://www.webarchive.org.uk/act/collections/2946 image

https://www.webarchive.org.uk/act/collections/4168 image

It's currently unclear how they are being tagged into the "Collection Areas". When you view the ACT record for a collection, the "Collection Areas" field is blank, even though the collection is displayed when filtering by "Science, Technology & Medicine". Example, the Aging collection: https://www.webarchive.org.uk/act/collections/2367

image

Filtered view: image

taxonomy table "Aging" collection info image image

These are the Collection Areas: image image

Where does the count of 31 come from? It comes from the taxonomy_parents_all table: image image

SQL

Select
    taxonomy_parents_all.taxonomy_id,
    taxonomy_parents_all.parent_id
From
    taxonomy_parents_all
Where
    taxonomy_parents_all.taxonomy_id = 2938

This is why 31 results are returned: image
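A rough sketch of why a count taken from an "all parents" table can be higher than the number of rows in a list of direct children. The table name `taxonomy_parents_all` and its two columns come from the query above; the `taxonomy.parent_id` column, the toy IDs, and the idea that the table holds one row per (node, ancestor) pair are assumptions made for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical toy schema: 'taxonomy.parent_id' is an assumed column.
    CREATE TABLE taxonomy (id INTEGER PRIMARY KEY, parent_id INTEGER);
    CREATE TABLE taxonomy_parents_all (taxonomy_id INTEGER, parent_id INTEGER);
    -- Node 1 is the area; 10 and 11 sit directly under it; 12 is nested under 11.
    INSERT INTO taxonomy VALUES (1, NULL), (10, 1), (11, 1), (12, 11);
    -- Assumed: one row per (node, ancestor) pair, so 12 links to both 11 and 1.
    INSERT INTO taxonomy_parents_all VALUES (10, 1), (11, 1), (12, 11), (12, 1);
""")

# Count only the nodes whose immediate parent is the area.
direct = conn.execute(
    "SELECT COUNT(*) FROM taxonomy WHERE parent_id = ?", (1,)
).fetchone()[0]

# Count every node that has the area anywhere in its ancestry.
all_levels = conn.execute(
    "SELECT COUNT(*) FROM taxonomy_parents_all WHERE parent_id = ?", (1,)
).fetchone()[0]

print(direct, all_levels)
```

In this toy data the nested node is counted in the second query but not the first, which is the same kind of gap as 31 versus 29.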

crarugal commented 2 years ago

In the Topics and Themes page, when filtering by "Science, Technology & Medicine", only 15 collections are being presented, when there should be 31: ![image](https://user-images.githubusercontent.com/18530934/172342411-db477b53-1d67-4e7c-a552-c36237541ba8.png)

The highlighted collections are the 15 being presented. The remaining 16 are the ones that are missing, according to ACT: image

The 31 collections tagged into "Science, Technology & Medicine" image

crarugal commented 2 years ago

https://www.webarchive.org.uk/act/collections/allCollectionAreasAsJson/7

If we look at the list that's being pulled image

And compare that to the 31 collections in ACT (green highlight = JSON list, yellow highlight = 15 presented), we can see that both lists are the same: image

jasonwebber-bl commented 2 years ago

Collections that are live but not viewable (missing) on T&T:

910 - Brexit
2456 - Credit crunch
629 - District councils
469 - Easter rising
689 - Scottish elections 2016
3866 - Duke of edinburgh
331 - Family history
370 - Forth bridge
990 - FTSE 100
9 - Health and social
520 - IT Collection
65 - Scottish Ind
3064 - Startup
851 - Queens birthday 2016
60 - UK Gen election 2015
283 - UK response, typhoon
3098 - UK Retail
2778 - Unfinished business
471 - VE day
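One quick cross-check, sketched in Python: intersect these missing IDs with the `collections_ids` of the unexposed 'Currently Working On' area (key 2945) quoted earlier in this thread. Both ID sets are copied from the thread; the variable names are illustrative. The overlap only covers part of the list, so it would not by itself explain the remaining missing collections.

```python
# IDs from the list in this comment (live in ACT but missing on T&T).
missing_on_tt = {
    910, 2456, 629, 469, 689, 3866, 331, 370, 990, 9,
    520, 65, 3064, 851, 60, 283, 3098, 2778, 471,
}

# collections_ids of the unexposed "Currently Working On" area (key 2945),
# copied from the JSON quoted earlier in this thread.
currently_working_on = {2946, 3064, 3098, 3188, 3866, 4089}

# Collections that are both missing on T&T and assigned to the unexposed area.
overlap = sorted(missing_on_tt & currently_working_on)
print(overlap)  # [3064, 3098, 3866]
```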

nicolabingham commented 2 years ago

@jasonwebber-bl I tried to tag UK General Election 2015 into a higher level subject category but it is already checked as belonging to 'Politics'

nicolabingham commented 2 years ago

I have unchecked and checked the collection as belonging to 'Politics'. Let's review tomorrow to see if it appears on the UI

anjackson commented 2 years ago

Hmm, looking at the Solr database, it seems like the UK General Election 2015 collection is there and the data looks right, i.e. it looks like the other entries....

...
      {
        "id":"2798",
        "type":"collection",
        "name":"UK General Election 2019",
        "description":"A collection of websites representing the 2019 UK General Election. ",
        "collectionAreaId":[2941]},
      {
        "id":"60",
        "type":"collection",
        "name":"UK General Election 2015",
        "description":"Collection of websites, curated by staff at the Legal Deposit Libraries, focussing on the 2015 UK General Election which was held on 7 May 2015 to elect 650 members to the House of Commons. It was the first general election at the end of a fixed-term Parliament. \n",
        "collectionAreaId":[2941]},
      {
        "id":"2453",
        "type":"collection",
        "name":"UK General Election 2005",
        "description":"Collection of websites, curated by staff at the Legal Deposit Libraries, archived during and immediately after the UK general election campaign of 2005. The collection comprises a sample of candidate’s campaign sites and weblogs, local and national party sites, opinion polls, news and commentary, and the manifestos of a range of interest groups.",
        "collectionAreaId":[2941]},
      {
        "id":"1233",
        "type":"collection",
        "name":"UK General Election 2017",
        "description":"Collection of websites, curated by staff at the Legal Deposit Libraries, focussing on the United Kingdom general election of 2017 which took place on Thursday 8 June. Under the Fixed-term Parliaments Act 2011 an election had not been due until 7 May 2020, but a call by Prime Minister Theresa May for a snap election was ratified by the necessary supermajority in a 522-13 vote in the House of Commons on 19 April 2017.",
        "collectionAreaId":[2941]}
...

So it seems maybe this is an issue with how Solr/UKWA-UI is generating/processing the collection hierarchy.
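To make that check repeatable, here is a small Python sketch that groups Solr collection documents by `collectionAreaId`. The documents are abbreviated from the output above (descriptions dropped); the function name `collections_by_area` is made up for illustration and the code does not query Solr itself.

```python
from collections import defaultdict

# Documents abbreviated from the Solr output above (descriptions omitted).
docs = [
    {"id": "2798", "type": "collection",
     "name": "UK General Election 2019", "collectionAreaId": [2941]},
    {"id": "60", "type": "collection",
     "name": "UK General Election 2015", "collectionAreaId": [2941]},
    {"id": "2453", "type": "collection",
     "name": "UK General Election 2005", "collectionAreaId": [2941]},
    {"id": "1233", "type": "collection",
     "name": "UK General Election 2017", "collectionAreaId": [2941]},
]


def collections_by_area(documents):
    """Group collection document IDs under each area ID they reference."""
    by_area = defaultdict(list)
    for doc in documents:
        if doc.get("type") != "collection":
            continue
        for area_id in doc.get("collectionAreaId", []):
            by_area[area_id].append(doc["id"])
    return dict(by_area)


grouped = collections_by_area(docs)
print(grouped)
```

With these four documents, collection 60 lands under area 2941 alongside the other election collections, which matches the observation that the Solr data looks consistent.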

min2ha commented 2 years ago

so we have a data flow chain: ACT DB -> middleware -> SOLR -> UI

It's about time to check the middleware, then.

(Reminder: Collections' data organisation in the ACT DB is hierarchical, i.e. a tree structure: Collection in Collection etc. Areas point to lists of top collections only. A top collection may be assigned to more than one area.)
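A sketch of how the chain could be walked for a single collection, in Python. The stage names follow the data flow above; the ID sets are hypothetical placeholders (not real snapshots of any system), and `first_stage_missing` is an illustrative helper name.

```python
def first_stage_missing(collection_id, stages):
    """Return the first stage whose ID set lacks collection_id, else None."""
    for name, ids in stages:
        if collection_id not in ids:
            return name
    return None


# HYPOTHETICAL snapshots of collection IDs seen at each stage of the chain.
# Real values would have to be exported from each system.
stages = [
    ("ACT DB", {60, 910, 2798}),
    ("middleware", {60, 910, 2798}),
    ("SOLR", {60, 910, 2798}),
    ("UI", {2798}),
]

print(first_stage_missing(60, stages))    # UI
print(first_stage_missing(2798, stages))  # None
```

Checking stage by stage in this order narrows down where a collection drops out without having to inspect every component at once.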

jasonwebber-bl commented 2 years ago

Just to confirm: UK Gen Election 2015 (the collection that Nicola unticked and ticked again yesterday) didn't appear today.

anjackson commented 2 years ago

@min2ha as per https://github.com/ukwa/w3act/issues/676#issuecomment-1149929700 as far as I can tell, the data in Solr looks right. I think the issue is in UKWA-UI.

min2ha commented 2 years ago

> @min2ha as per #676 (comment) as far as I can tell, the data in Solr looks right. I think the issue is in UKWA-UI.

Thanks Andy! I'll check against full SOLR Collection data then, not data for testing only.

anjackson commented 2 years ago

No worries @min2ha - I wrote it up as https://github.com/ukwa/ukwa-ui/issues/353