sciencehistory / scihist_digicoll

Science History Institute Digital Collections

Why don't "recently updated" sorts match? #463

Closed jrochkind closed 5 years ago

jrochkind commented 5 years ago

When not logged in, running an unrestricted search sorted by 'updated_at', we get different results in chf_sufia and the new app.

https://kithe.sciencehistory.org/catalog?q=&range%5Byear_facet_isim%5D%5Bbegin%5D=&range%5Byear_facet_isim%5D%5Bend%5D=&search_field=all_fields&sort=recently_added

https://digital.sciencehistory.org/catalog?q=&range%5Byear_facet_isim%5D%5Bbegin%5D=&range%5Byear_facet_isim%5D%5Bend%5D=&search_field=all_fields&sort=system_create_dtsi+desc

This may not mean that the created_at values don't match; it may instead be a bug of some other kind (in either app!).

But if it did mean the created_at didn't match, that might postpone production launch. Just a bug in sorting (on either app) probably would not.

We are going to look into it and try to see what's going on!

jrochkind commented 5 years ago

"Recently added" is intended to sort by created_at, descending.

Config on kithe: field.sort = "date_created_dtsi desc"

Config on chf_sufia: config.add_sort_field "#{uploaded_field} desc", label: "recently added"

This may actually explain the whole thing -- the sufia app has a created date and an uploaded date. We weren't sure what the difference was and didn't see any reason the distinction mattered, so we only copied over "created".

But chf_sufia is sorting by the uploaded date which may be slightly different than created! While kithe is sorting by the created date that was copied over. So they may be slightly different.
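To illustrate how sorting by two different date fields can reorder the same records, here is a minimal plain-Ruby sketch. The `Record` struct, the `sort_desc` helper, and the `example-2` record are hypothetical and exist only for illustration; the dates for bk128c180 are the ones quoted elsewhere in this thread. It does not use Solr or either app's actual code.

```ruby
# Hypothetical stand-in for a work; only the two date fields matter here.
Record = Struct.new(:id, :date_uploaded, :create_date)

RECORDS = [
  # Dates for bk128c180 are the values quoted in this thread.
  Record.new("bk128c180",
             Time.utc(2018, 11, 28, 18, 57, 51),
             Time.utc(2019, 10, 29, 14, 24, 2)),
  # Invented record whose two dates happen to agree.
  Record.new("example-2", Time.utc(2019, 6, 1), Time.utc(2019, 6, 1))
].freeze

# Newest first by the given date field, similar to a "<field> desc" sort.
def sort_desc(records, field)
  records.sort_by { |r| r.public_send(field) }.reverse.map(&:id)
end

puts sort_desc(RECORDS, :date_uploaded).inspect
puts sort_desc(RECORDS, :create_date).inspect
```

With these two records the order flips depending on the field, which is the kind of mismatch seen between the two catalog URLs.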

eddierubeiz commented 5 years ago

created_at in New Thing comes from date_uploaded in Sufia, not from create_date.

Take for instance https://digital.sciencehistory.org/works/bk128c180, which has different values for date_uploaded and create_date.

GenericWork.find('bk128c180').date_uploaded.utc
=> 2018-11-28 18:57:51 UTC
GenericWork.find('bk128c180').create_date.utc
=> 2019-10-29 14:24:02 UTC

In the new app, create_date is discarded and date_uploaded becomes created_at.

Work.find_by_friendlier_id('bk128c180').created_at.utc
=> 2018-11-28 18:57:51 UTC
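
The mapping described above can be sketched in plain Ruby. `SufiaDates` and `migrated_created_at` are hypothetical names used only for this illustration, not the actual migration code:

```ruby
# Hypothetical, simplified view of the Sufia-side dates for one work.
SufiaDates = Struct.new(:date_uploaded, :create_date)

# Sketch of the mapping: date_uploaded becomes created_at,
# and create_date is not carried over at all.
def migrated_created_at(sufia)
  sufia.date_uploaded
end

BK128C180 = SufiaDates.new(
  Time.utc(2018, 11, 28, 18, 57, 51),  # date_uploaded, as quoted above
  Time.utc(2019, 10, 29, 14, 24, 2)    # create_date, as quoted above
)

puts migrated_created_at(BK128C180)
```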
jrochkind commented 5 years ago

OK, here's what appears to be going on, based on an examination of the top three items on chf_sufia recently_added sort.

Unclear why the date_uploaded can be earlier, and so much earlier, in sufia, than date_created. Unclear if we made the right choice in what to migrate -- although migrating the earlier date makes sense.

It may be that chf_sufia was just weird and unexplainable here, and we have done our best to migrate based on that, and it's good enough.

Top 3 items on old production chf_sufia, with relevant lifecycle dates from both systems:

jrochkind commented 5 years ago

In at least some cases in chf_sufia, date_created appears to be empty! Eg qj72p712h and pn89d6745.

That may be why we chose to migrate date_uploaded to created_at.

It may be that the sort is actually broken or hard to predict in chf_sufia.

It also may be that sufia switched from using date_created to date_uploaded at some point, or switched its logic around this stuff?

jrochkind commented 5 years ago

It may be that for child works in sufia, date_uploaded is copied over from the parent. That would explain why, for child works, date_uploaded can be so much earlier than create_date. It would also mean date_uploaded is not really accurate for when that record was created; it's kind of wrong, since the child work was not uploaded then.

This may not be the way it always worked historically in the sufia app, though; it may have changed at different times.
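
A minimal sketch of this hypothesis, in plain Ruby. This is not Sufia's actual code: `SketchWork` and `promote_to_child_work` are invented names, and the parent's dates are assumed to equal the child's reported date_uploaded.

```ruby
# Hypothetical model of the guessed behaviour; not Sufia's implementation.
SketchWork = Struct.new(:id, :date_uploaded, :create_date)

# Under the hypothesis, promotion creates a new record "now" but
# carries the parent's date_uploaded over to the child.
def promote_to_child_work(parent, child_id, now)
  SketchWork.new(child_id, parent.date_uploaded, now)
end

# Assumed parent dates (both set to the child's reported date_uploaded).
uploaded = Time.utc(2018, 11, 28, 18, 57, 51)
parent = SketchWork.new("parent", uploaded, uploaded)

child = promote_to_child_work(parent, "child", Time.utc(2019, 10, 29, 14, 24, 2))

# Whole days between the inherited date_uploaded and the new create_date.
gap_days = ((child.create_date - child.date_uploaded) / 86_400).floor
puts "date_uploaded precedes create_date by #{gap_days} days"
```

If promotion really behaves this way, a child work promoted long after its parent was uploaded would show a gap like this one.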

Still not sure of the difference between create_date and date_uploaded in sufia, or why create_date is blank for some records. It's pretty hard to investigate sufia on these things, with no easy way to query and confusing code.

jrochkind commented 5 years ago

OK, some info from slack.

Still don't really have an answer for why date_uploaded is 11 months earlier than create_date in some cases. May have to look at sufia/curation_concerns source code.

jrochkind commented 5 years ago

@yonyitz and @HKativa Do you have a sense of when you really first created this record: https://digital.sciencehistory.org/works/bk128c180? ("Diagrams of microscope and assorted optical instruments and devices", Part of "Micrographia...")

Do you have any sense of whether it was really first created 29 Oct 2019 (last week?), or 28 Nov 2018 (last year)?

jrochkind commented 5 years ago

@yonyitz and @HKativa another question -- do you think you used 'promote to child work' on that one recently, perhaps on 29 Oct 2019?

HKativa commented 5 years ago

@jrochkind yes, that is exactly what happened here. I finished cataloging the 18 remaining plates from the Micrographia last Monday and Tuesday (10/28 and 10/29). Those are the 18 records you see above "Oral history interview with Carlyle B. Storm" when sorting by "recently added" in chf_sufia: https://digital.sciencehistory.org/catalog?utf8=%E2%9C%93&search_field=all_fields&q=&range%5Byear_facet_isim%5D%5Bbegin%5D=&range%5Byear_facet_isim%5D%5Bend%5D=

jrochkind commented 5 years ago

Aha, thanks.

OK, here is my current hypothesis:

So the sufia sort may itself have been buggy. Separately, there may be things we want to tweak in how the dates are imported.

And there are some decisions involved about what is "right" for the "date created" for something converted from asset to work or vice versa -- should its "date created" stay the same or reset? In terms of administrative metadata for historical/preservation/administrative purposes, as well as UI.

(In terms of UI, this is hacky currently and can change. Sorting by "recently added", at least for non-logged-in users, should probably really sort by the date an item was published; we don't currently track that, but could in the future.)

while create_date is the date it was promoted to child work. I think this was not entirely on purpose. I am not sure if it should be considered ‘right’ or ‘wrong’. I am not sure if new app behavior matches (or if that matters. :slightly_smiling_face: ).

jrochkind commented 5 years ago

We've decided we understand the situation and have no need to rescue the old date variation from migrated data; the way it migrated is fine.

eddierubeiz commented 5 years ago

Note: This cluster of problems is also discussed in https://github.com/sciencehistory/scihist_digicoll/issues/446 . This ticket is the more helpful of the two.