christinklez closed this issue.
I recreated 3 UCB EDA objects, all of which are complex objects that are not displaying in the intended Nuxeo order. (Stage has a different order.)
Initial upload sequence: uploaded in the order that matches how it appears on -stage.
"Fixed" (to match original Nuxeo record) sequence: Reordered to match the Nuxeo record.
Note: my hunch is that UCB EDA created these objects by uploading via the File Uploader client. I'm pretty certain of this, given that the component objects' file paths all contain the original filenames (set by the initial document title).
Initial upload sequence: This is how the File Uploader uploaded.
Renamed the component objects (still in uploaded sequence):
"Fixed" (to match original Nuxeo record) sequence:
@christinklez I tried fetching the metadata for your recreated test Eckbo object, and the children are in order!!
@amywieliczka noticed, by the way, that for problematic objects the order of the child components in the left sidebar in Nuxeo doesn't match the order on the main page for the parent object. As we know, there are issues with ordering in Nuxeo... I'm not sure which method in particular is causing the problem.
It turns out that the @search Nuxeo API endpoint sometimes returns component objects in the wrong order. The search/lang/NXQL/execute endpoint described here returns them in the correct order: https://doc.nuxeo.com/nxdoc/search-endpoints/#searching-by-query
This PR updates the API endpoint for the query that gets the child component objects: https://github.com/ucldc/rikolti/pull/994
Once it's deployed, we'll need to reharvest Nuxeo collections with the component ordering issue. @christinklez I don't suppose you have an exhaustive list of which collections are affected? I'm sure it's really hard to figure out just by looking at the website. I could write a script to compare the ordering for all collections...
OpenSearch query for Nuxeo collections with complex objects:
GET rikolti-stg/_search
{
  "query": {
    "nested": {
      "path": "children",
      "query": {
        "match_all": {}
      }
    }
  },
  "aggs": {
    "collection_ids": {
      "terms": {
        "field": "collection_url",
        "size": 10000
      }
    }
  },
  "size": 0
}
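One way to turn the aggregation response into the list of collections to reharvest is sketched below in Python (the language of the rikolti fetcher). It assumes the standard OpenSearch terms-aggregation response shape (aggregations, then collection_ids, then buckets carrying key and doc_count); the function name is hypothetical, and sending the request itself is left out.

```python
from typing import Any


def collections_with_complex_objects(response: dict[str, Any]) -> dict[str, int]:
    """Map each collection_url bucket key to its count of complex objects.

    `response` is the parsed JSON body of the GET rikolti-stg/_search
    request above. The terms aggregation sits at the top level, so each
    doc_count counts parent (complex) records in that collection.
    """
    agg = response["aggregations"]["collection_ids"]
    # A non-zero sum_other_doc_count means some collections fell outside
    # the "size": 10000 limit and were not returned as buckets.
    if agg.get("sum_other_doc_count", 0):
        raise ValueError("terms aggregation size too small; raise 'size'")
    return {bucket["key"]: bucket["doc_count"] for bucket in agg["buckets"]}
```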
28197 reharvest results: The ordering is significantly better! But there were two objects that were not in the correct sequence.
Items 7 & 8 ordering (on -stage) is swapped: https://calisphere-stage.cdlib.org/item/06e49f86-fd49-43d6-82b9-14f07094eada/?order=6
I went into the Nuxeo record and clicked "Edit" and then "Save" (without making any changes). I also clicked the "refresh" icon in the upper right of the component object listing. The display order of the component objects then updated, and it now matches what is coming through on -stage (as well as the vernacular fetched metadata).
28199 reharvest results:
Cole, Fraser record sequence matches Nuxeo: https://calisphere-stage.cdlib.org/item/cf56d5e9-b688-42f7-981e-32c551783183/
Eckbo, Garrett (ALCOA Forecast) record sequence matches Nuxeo: https://calisphere-stage.cdlib.org/item/17eef353-3469-45a3-a64d-973e982dda87/
Just Zoomed with @barbarahui about this: we are seeing different component object sequencing in the Nuxeo UI depending on which Nuxeo record URL is used...
(PS: My "edit" and "save" attempt may be a red herring. Perhaps how you navigate to a record affects the display order?)
Nuxeo JSF permalink: https://nuxeo.cdlib.org/nuxeo/nxdoc/default/06e49f86-fd49-43d6-82b9-14f07094eada/view_documents
Nuxeo JSF not permalink: https://nuxeo.cdlib.org/nuxeo/nxpath/default/asset-library/UCB/UCB%20EDA/Blake%20Estate%20Collection/8665060857627642113@view_documents?tabIds=%3A&conversationId=0NXMAIN4
Web UI not permalink: https://nuxeo.cdlib.org/nuxeo/ui/#!/browse/asset-library/UCB/UCB%20EDA/Blake%20Estate%20Collection/8665060857627642113
Web UI permalink: https://nuxeo.cdlib.org/nuxeo/ui/#!/doc/06e49f86-fd49-43d6-82b9-14f07094eada
I sample QA'd three more UCB EDA collections:
Each of these collections has at least one record whose component objects do not match the Nuxeo sequence. I reharvested all three collections (with the updated Nuxeo endpoint), and the -stage sequence now matches the Nuxeo sequence.
These are screen captures from before the reharvest (i.e., before the Nuxeo endpoint update), which show that the Nuxeo sequence did not match the -stage sequence:
Farrand: https://calisphere-stage.cdlib.org/item/56402aab-45a6-4f64-8a48-56c4cb6d6a3c/
Royston: https://calisphere-stage.cdlib.org/item/ec8f5fbd-b518-493a-b7e7-89c7ea732db6/
Church: https://calisphere-stage.cdlib.org/item/2f1664a4-d05b-4f66-bb9d-8e575206bab9/
I filed a Nuxeo JIRA ticket for this: https://jira.nuxeo.com/browse/SUPNXP-51504
JSON report containing information on the 187 items (across 25 collections) where we think the order of the complex object components on calisphere-stage doesn't match what is in the Nuxeo prod UI: https://drive.google.com/drive/folders/1IMknt9UxHxZRduDJ6ILrIJUCh2H5fKwz
Collection ID   Complex objects with ordering problem
27809           1
27124           13
26883           45
26466           2
26887           1
26771           5
65              5
26864           1
27569           16
26713           54
26147           1
27598           16
28203           2
26677           1
26945           5
1324            1
26895           4
27594           2
28040           3
15586           3
18409           1
25889           2
28197           1
472             1
4908            1
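A comparison script like the one suggested earlier in this thread could produce per-collection counts like these. A minimal sketch, assuming the Nuxeo order and the calisphere-stage order have already been collected as lists of child UIDs keyed by (collection ID, parent UID); those shapes and the function name are hypothetical and are not the format of the JSON report linked above.

```python
from collections import Counter


def count_misordered(
    nuxeo_order: dict[tuple[str, str], list[str]],
    stage_order: dict[tuple[str, str], list[str]],
) -> Counter:
    """Count, per collection ID, parents whose child order differs.

    Keys are (collection_id, parent_uid); values are child UIDs in display
    order. Parents missing from the stage data are skipped.
    """
    counts: Counter = Counter()
    for key, nuxeo_children in nuxeo_order.items():
        stage_children = stage_order.get(key)
        if stage_children is None:
            continue  # not indexed on stage, nothing to compare
        if nuxeo_children != stage_children:
            collection_id, _parent_uid = key
            counts[collection_id] += 1
    return counts
```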
I did some more digging, and not all of the problematic objects are missing pos (position) values in the database. Here are a couple of different examples I found when spot checking:
Nuxeo: https://nuxeo.cdlib.org/nuxeo/nxdoc/default/8610b0c8-f3f7-4978-b264-f10b5dddd112/view_documents
Calisphere-stage: https://calisphere-stage.cdlib.org/item/8610b0c8-f3f7-4978-b264-f10b5dddd112
The issue with this object is that there are grandchildren nested inside a couple of the children. Child 1987_0222_UCI_Choir_1of2 has an object nested inside it, and so does child 1987_0222_UCI_Choir_2of2. The old fetcher was picking up these nested grandchildren, so they are displayed on Calisphere mixed in with the children. Since I modified the fetcher to fix https://github.com/ucldc/rikolti/issues/1000, it no longer fetches grandchildren, only direct children of the parent object. Is this correct?
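The grandchild behavior comes down to which NXQL filter selects the children. An ecm:path STARTSWITH filter (as in the older query later in this thread) matches every descendant under the parent's path, while an ecm:parentId filter (as in the current query) matches only direct children. A minimal in-memory illustration; the Doc class and sample paths are hypothetical stand-ins for Nuxeo documents, not Nuxeo's API.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Hypothetical stand-in for a Nuxeo document."""
    uid: str
    parent_uid: str
    path: str


def startswith_path(docs: list[Doc], parent_path: str) -> list[Doc]:
    """Mimics ecm:path STARTSWITH: every descendant, grandchildren included.

    The parent itself is excluded by requiring a trailing slash.
    """
    prefix = parent_path.rstrip("/") + "/"
    return [doc for doc in docs if doc.path.startswith(prefix)]


def with_parent_id(docs: list[Doc], parent_uid: str) -> list[Doc]:
    """Mimics ecm:parentId = parent_uid: direct children only."""
    return [doc for doc in docs if doc.parent_uid == parent_uid]
```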
Nuxeo: https://nuxeo.cdlib.org/nuxeo/nxdoc/default/097c0984-4e1f-43b5-b39b-101558ae3921/view_documents
Calisphere-stage: https://calisphere-stage.cdlib.org/item/ark:/21198/n1j60r/
Pages 240 and 241 are reversed in Nuxeo:
They are not reversed on calisphere-stage. These objects have pos values in the database, so I don't understand why the fetcher is getting them in a different order from what's in Nuxeo.
[UPDATE] It looks like this particular query and endpoint (which the fetcher was using before we made a couple of changes) return page 240 and then 241:
Query:
Select * from document where ecm:path startswith '/asset-library/UCLA/clark/mss/09/uclaclark_ms1976010-2_Tiffs' AND ecm:isVersion = 0 AND ecm:mixinType != 'HiddenInNavigation' AND ecm:isTrashed = 0 ORDER BY ecm:pos ASC
Endpoint:
https://nuxeo.cdlib.org/Nuxeo/site/api/v1/path/@search
However, the query and endpoint we are currently using return page 241 and then 240 (matching what is in Nuxeo):
Query:
Select * from document where ecm:parentId = '097c0984-4e1f-43b5-b39b-101558ae3921' AND ecm:isVersion = 0 AND ecm:mixinType != 'HiddenInNavigation' AND ecm:isTrashed = 0 ORDER BY ecm:pos ASC
Endpoint:
https://nuxeo.cdlib.org/Nuxeo/site/api/v1/search/lang/NXQL/execute
My head is going to explode 🤯
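For reference, the current query can be assembled programmatically roughly as follows. A minimal sketch: the endpoint and NXQL are copied from this thread, while the pageSize and currentPageIndex paging parameters are assumptions based on the Nuxeo search-endpoints documentation linked earlier, and the helper names are hypothetical. The actual rikolti fetcher may build its requests differently.

```python
import uuid
from urllib.parse import urlencode

# Endpoint as given in this thread.
NXQL_EXECUTE = "https://nuxeo.cdlib.org/Nuxeo/site/api/v1/search/lang/NXQL/execute"


def direct_children_query(parent_uid: str) -> str:
    """NXQL for the direct children of one parent, ordered by position."""
    # The UID is interpolated into the query string, so require a real UUID
    # (uuid.UUID raises ValueError otherwise) instead of escaping quotes.
    uid = str(uuid.UUID(parent_uid))
    return (
        "SELECT * FROM document"
        f" WHERE ecm:parentId = '{uid}'"
        " AND ecm:isVersion = 0"
        " AND ecm:mixinType != 'HiddenInNavigation'"
        " AND ecm:isTrashed = 0"
        " ORDER BY ecm:pos ASC"
    )


def children_page_url(parent_uid: str, page_index: int = 0, page_size: int = 100) -> str:
    """URL for one page of children (paging parameter names are assumed)."""
    params = {
        "query": direct_children_query(parent_uid),
        "pageSize": page_size,
        "currentPageIndex": page_index,
    }
    return f"{NXQL_EXECUTE}?{urlencode(params)}"
```

As in SQL, ORDER BY on a column with ties leaves the relative order of the tied rows unspecified, so two siblings sharing the same pos value could plausibly come back in either order from either endpoint.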
[MORE UPDATE] The 2 children that are coming back in an inconsistent order have the same pos value of 430 in both the database and Elasticsearch. This definitely seems like a bug. I'll file it with Nuxeo.
nuxeo-> WHERE parentid = '097c0984-4e1f-43b5-b39b-101558ae3921' and pos = 430;
id | parentid | pos | name | isproperty | primarytype | istrashed
--------------------------------------+--------------------------------------+-----+--------------------------------+------------+---------------------+-----------
7b0233ea-37c5-44f5-8df0-294ed633f7e3 | 097c0984-4e1f-43b5-b39b-101558ae3921 | 430 | uclaclark_ms1976010-2_0246.tif | f | SampleCustomPicture |
320d3025-c899-4ea2-83af-7023d0c62f2e | 097c0984-4e1f-43b5-b39b-101558ae3921 | 430 | uclaclark_ms1976010-2_0247.tif | f | SampleCustomPicture |
(2 rows)
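A check along these lines could find every parent with this problem once the id, parentid and pos columns shown above have been exported for all complex object children. A minimal Python sketch; the input shape and function name are hypothetical, and null positions are skipped because they are the separate problem discussed elsewhere in this thread.

```python
from collections import defaultdict
from typing import Optional

Row = tuple[str, str, Optional[int]]  # (id, parentid, pos)


def duplicate_positions(rows: list[Row]) -> dict[str, dict[int, list[str]]]:
    """Return {parentid: {pos: [child ids]}} for positions shared by 2+ siblings."""
    by_parent: defaultdict = defaultdict(lambda: defaultdict(list))
    for child_id, parent_id, pos in rows:
        if pos is None:
            continue  # missing pos is a separate problem, tracked elsewhere
        by_parent[parent_id][pos].append(child_id)
    result: dict[str, dict[int, list[str]]] = {}
    for parent_id, positions in by_parent.items():
        shared = {pos: ids for pos, ids in positions.items() if len(ids) > 1}
        if shared:
            result[parent_id] = shared
    return result
```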
I filed this as a separate JIRA issue: https://jira.nuxeo.com/browse/SUPNXP-51574
I did some testing to see if I could fill in pos values for objects missing them via the UI. I was able to get the values populated, but not in the order that I would have expected. It might be easier to demonstrate on Zoom, but here are the steps I followed:
pos filled in. The fetcher now retrieves the objects in this order consistently. HOWEVER, the order is nothing like what was displayed in the UI before this. So I think that the best we can do to fix the objects with missing pos is to programmatically order them alphabetically by title or by filename. Then users will have to manually reorder them if this is incorrect.
This would just be a fix for existing records that are missing pos. We still need to replicate the workflow that results in complex object components being created without pos, and fix it so that we don't get new objects with this problem going forward.
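The fallback proposed above (order children that lack pos alphabetically by title or filename, then let users reorder by hand) could look roughly like this. A minimal sketch, assuming each child is a dict with hypothetical "uid", "title" and "filename" keys; it only computes new positions and does not write anything back to Nuxeo.

```python
def fallback_positions(children: list[dict], key: str = "filename") -> dict[str, int]:
    """Return {uid: pos}, numbering children 0..n-1 alphabetically by `key`.

    Each child is a dict with a "uid" and (hypothetically) "filename" and/or
    "title" entries. Sorting is case-insensitive; uid breaks ties so the
    result is deterministic even when two children share a name.
    """
    ordered = sorted(
        children,
        key=lambda child: ((child.get(key) or "").casefold(), child["uid"]),
    )
    return {child["uid"]: pos for pos, child in enumerate(ordered)}
```

Plain string sorting places name_10 before name_2 unless the numbers are zero-padded, as the uclaclark filenames above are (e.g. uclaclark_ms1976010-2_0246.tif), so a natural-sort key may be worth considering for collections whose filenames are not padded.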
The secondary issue we discovered, where more than one child object has the same pos value, is thankfully not that widespread. Here's the info on the 4 complex objects with the problem:
A system of ethicks. /By the Reverend Mr. Henry Grove. [Vol. 2] (Collection 26887): 329 components total; 2 components have pos 430
  Calisphere-stage: https://calisphere-stage.cdlib.org/item/ark:/21198/n1j60r/
  Nuxeo: https://nuxeo.cdlib.org/nuxeo/nxdoc/default/097c0984-4e1f-43b5-b39b-101558ae3921/view_documents
Collection 26147: 2 components total; both components have pos = 0
  Nuxeo: https://nuxeo.cdlib.org/nuxeo/nxdoc/default/63edf544-4746-4211-adc7-1bf05edec202/view_documents
  Calisphere-stage: https://calisphere-stage.cdlib.org/item/ark:/87280/t0np22c5/
Not on Calisphere: 160 components total; 2 components have pos 54 and 2 components have pos 98
  Nuxeo: https://nuxeo.cdlib.org/nuxeo/nxdoc/default/f026c534-b844-46a3-ab70-e3637cf71e12/view_documents
Not on Calisphere: 706 components total; pos values 1-305 are each doubled (305 pairs)
  Nuxeo: https://nuxeo.cdlib.org/nuxeo/nxdoc/default/0e45044b-c08a-45ef-bf38-76f065878dd5/view_documents
A list of the 1227 complex objects (by path) in Nuxeo whose children have no pos in the database:
complex_obj_null_pos_paths.txt
This includes objects that aren't published to Calisphere.
A file with more complete info (uid, path, title) on the 1227 complex objects whose children have no pos in the db:
@barbarahui sample object #13, with components that have metadata from nuxeo_spreadsheet import (following UCB EDA's method):
/asset-library/UCOP/aturner/orderingtest/Example 12.5962533403185838 https://nuxeo.cdlib.org/nuxeo/nxdoc/default/db065b59-8ef8-44e1-a902-1f1e58e988ad/view_documents
Summary of next steps:
Updated list of paths for complex objects that have the null pos problem. Objects with only one component have been filtered out.
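The filtering described above can be sketched as below, assuming (parent path, pos) pairs have been exported for every complex object child; the function name and input shape are hypothetical. This sketch lists a parent if any of its children lacks a pos; whether partially positioned parents belonged in the report is an assumption, not something stated in this thread.

```python
from collections import defaultdict
from typing import Optional


def null_pos_parents(rows: list[tuple[str, Optional[int]]]) -> list[str]:
    """Sorted parent paths that have 2+ children, at least one with pos None."""
    child_count: defaultdict = defaultdict(int)
    has_null: set[str] = set()
    for parent_path, pos in rows:
        child_count[parent_path] += 1
        if pos is None:
            has_null.add(parent_path)
    # Drop single-component parents, as in the updated list above.
    return sorted(path for path in has_null if child_count[path] > 1)
```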
Tracking emails to campuses:
Spreadsheet with list of parent documents: https://docs.google.com/spreadsheets/d/1Pej1YCP6tB8nERkZx3vX1CbbpzaADivV2GQKE7hBvyw/edit?gid=0#gid=0
@barbarahui @aturner -- an update that we've received confirmation from all campus units that it's okay to run the positioning script for all Nuxeo complex objects currently missing position numbers.
@aturner -- the spreadsheet also includes indicators on which collections need to be reharvested. We can touch base on that later, after the positioning script has been run.
Thanks!!
@barbarahui -- before running the positioning script, would you be able to do one more run to identify documents that have component objects with missing positions?
Jason at UCB EDA went ahead and touched all of the objects, is happy with the current position order of their collections, and has approved publishing these to Calisphere. I let him know that we'll double-check these collections in case he missed any. Since he's happy with the current component object ordering, he doesn't want them arranged by filename (if they don't have positions) and would prefer to use the "move" functions to trigger the numbering.
Thank you!
cc: @aturner
https://help.oac.cdlib.org/a/tickets/142259 (reminder to self, to send updates to Jason through this thread)
@christinklez I'm attaching a list of all of the documents with missing positions. They're ordered alphabetically, so UCB's are at the top.
New tab in this spreadsheet: https://docs.google.com/spreadsheets/d/1Pej1YCP6tB8nERkZx3vX1CbbpzaADivV2GQKE7hBvyw/edit?gid=1496331430#gid=1496331430 -- for UCB EDA documents only.
There were 12 objects (from their most recently harvested/published collections) that would be impacted by the positioning script. I've messaged Jason about those 12 objects: https://help.oac.cdlib.org/a/tickets/142259
Got the okay from UCB EDA to go ahead and run the position number script on their objects as well! Please feel free to run the position numbering script. Thank you!
I ran the script to assign an order value to all of the complex object components in Nuxeo that were missing them.
Summary: Updated 36459 children of 1196 objects
Note: the number of objects is higher than what was in complex_obj_no_order_paths_2024-09-16T18:21:24.PDT.txt because that report only lists parents with more than one component.
PR: https://github.com/ucldc/nuxeo-component-ordering/pull/1
I'm attaching a json file containing data on the updates, just in case we need to refer back to it at any point: null_order_fix_report_2024-09-27T16_21_02.PDT.json
Expect to see the Nuxeo component object sort order being retained in the Calisphere item viewer.
Nuxeo fetcher code: https://github.com/ucldc/rikolti/blob/main/metadata_fetcher/fetchers/nuxeo_fetcher.py#L91
Nuxeo
Nuxeo object: https://nuxeo.cdlib.org/nuxeo/nxdoc/default/06e49f86-fd49-43d6-82b9-14f07094eada/view_documents
Expected ordering of component objects:
Calisphere-stage
Calisphere-stage object: https://calisphere-stage.cdlib.org/item/06e49f86-fd49-43d6-82b9-14f07094eada/?order=0
Order of records in the vernacular metadata: