Closed jrochkind closed 5 years ago
"Recently added" is intended to sort by created_at, descending.
Config on kithe: field.sort = "date_created_dtsi desc"
Config on chf_sufia: config.add_sort_field "#{uploaded_field} desc", label: "recently added"
uploaded_field is system_create_dtsi
This may actually explain the whole thing -- the sufia app has a created date and an uploaded date. We weren't sure what the difference was and didn't see any reason the distinction mattered, so we only copied over "created".
But chf_sufia is sorting by the uploaded date, which may be slightly different than created! While kithe is sorting by the created date that was copied over. So they may be slightly different.
created_at in New Thing comes from date_uploaded in Sufia, not from create_date.
Take for instance https://digital.sciencehistory.org/works/bk128c180, which has different values for date_uploaded and create_date.
GenericWork.find('bk128c180').date_uploaded.utc
=> 2018-11-28 18:57:51 UTC
GenericWork.find('bk128c180').create_date.utc
=> 2019-10-29 14:24:02 UTC
In the new app, create_date is discarded and date_uploaded becomes created_at.
Work.find_by_friendlier_id('bk128c180').created_at.utc
=> 2018-11-28 18:57:51 UTC
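The mapping described above can be sketched in plain Ruby. This is a minimal illustration, not the actual migration code: migrate_lifecycle_dates and the hash layout are hypothetical names, and only the two timestamps for bk128c180 come from the console output above.

```ruby
# Hypothetical sketch of the date mapping described above: date_uploaded
# becomes created_at, and create_date is deliberately not carried over.
# The method name and hash keys are assumptions for illustration only.
def migrate_lifecycle_dates(sufia_record)
  { created_at: sufia_record.fetch(:date_uploaded) }
end

# Values taken from the console output for bk128c180.
sufia_record = {
  date_uploaded: Time.utc(2018, 11, 28, 18, 57, 51),
  create_date:   Time.utc(2019, 10, 29, 14, 24, 2)
}

migrated = migrate_lifecycle_dates(sufia_record)
puts migrated[:created_at]
```

The point of the sketch is only that create_date has no counterpart on the migrated record, so nothing in the new app can sort by it.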
OK, here's what appears to be going on, based on an examination of the top three items on chf_sufia recently_added sort.
The items have three lifecycle dates in chf_sufia: date_uploaded, created_date, modified_date.
date_uploaded is 11 months earlier than created_date.
chf_sufia sorts "recently added" by created_date, not date_uploaded.
The migration takes date_uploaded and copies it to created_at; it ignores (and does not migrate) the created_date. That is, it migrates date_uploaded to created_at, and does not migrate created_date at all. So scihist_digicoll for "recently added" sorts by created_at, which corresponds to date_uploaded in chf_sufia -- the two apps are sorting by different values. Unclear why the date_uploaded can be earlier, and so much earlier, in sufia, than date_created. Unclear if we made the right choice in what to migrate -- although migrating the earlier date makes sense.
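A small sketch of why the two apps disagree: the same records sorted newest-first by two different date fields. Only the two timestamps for bk128c180 come from this thread; the other two records ("older-work", "newer-work"), the Record struct, and the recently_added helper are made up for illustration.

```ruby
# Hypothetical records: only bk128c180's timestamps are real, the rest are invented.
Record = Struct.new(:id, :date_uploaded, :create_date)

records = [
  Record.new("bk128c180",  Time.utc(2018, 11, 28, 18, 57, 51), Time.utc(2019, 10, 29, 14, 24, 2)),
  Record.new("older-work", Time.utc(2019, 1, 15),              Time.utc(2019, 1, 15)),
  Record.new("newer-work", Time.utc(2019, 10, 1),              Time.utc(2019, 10, 1))
]

# Newest first by the given field, like a "<field> desc" sort.
def recently_added(records, field)
  records.sort_by { |r| r.public_send(field) }.reverse.map(&:id)
end

by_create_date   = recently_added(records, :create_date)   # what chf_sufia appears to sort by
by_date_uploaded = recently_added(records, :date_uploaded) # what created_at corresponds to

puts by_create_date.inspect
puts by_date_uploaded.inspect
```

With these sample values, bk128c180 comes first when sorting by create_date and last when sorting by date_uploaded.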
It may be that chf_sufia was just weird and unexplainable here, and we have done our best to migrate based on that, and it's good enough.
Top 3 items on old production chf_sufia, with relevant lifecycle dates from both systems:
bk128c180
2b88qd40r
2b88qd390
In at least some cases in chf_sufia, date_created appears to be empty! Eg qj72p712h and pn89d6745.
That may be why we chose to migrate date_uploaded to created_at.
It may be that the sort is actually broken or hard to predict in chf_sufia.
It also may be that sufia switched from using date_created to date_uploaded at some point, or switched its logic around this stuff?
It may be that for child works in sufia, the date_uploaded is copied over from parent, which is why for child works date_uploaded can be so much earlier than create_date -- and not really accurate for when that record was created, it's kind of wrong, that child work was not uploaded then.
This may not be the way it worked always historically in the sufia app though, it may have changed at different times.
Still not sure the difference between create_date and date_uploaded in sufia, and why create_date is blank for some records. It's pretty hard to investigate sufia on these things, with no easy way to query and confusing code.
OK, some info from slack.
create_date and modified_date are managed by fedora. (I think I was wrong that create_date is sometimes nil). Fedora will set them automatically on new record creation or record update.
date_uploaded is managed by sufia(/hyrax), and is intended to be the "date record was created". On an import, for example, fedora would set create_date to time of import, but uploaded_date should be the real/original creation date.
date_created is app-managed not fedora-managed, and actually is not a lifecycle date, but descriptive metadata. We don't use it (it's the one that's blank in maybe ALL of our records), we use date_of_work instead (to be less confusing, hooray).
date_modified is an app-managed equivalent to fedora-managed modified_date. It should be within seconds or less of modified_date -- and in the samples I've tested, it is, so can be ignored as a separate thing I think.
Still don't really have an answer for why date_uploaded is 11 months earlier than create_date in some cases. May have to look at sufia/curation_concerns source code.
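The "within seconds" check between the app-managed and fedora-managed modified dates could look something like this. It is only a sketch: within_seconds? and its default tolerance are hypothetical, and the sample timestamps are invented, not taken from any real record.

```ruby
# Hypothetical helper: true when two timestamps differ by at most `tolerance` seconds.
def within_seconds?(a, b, tolerance = 5)
  (a - b).abs <= tolerance
end

# Invented sample values standing in for date_modified and modified_date.
date_modified = Time.utc(2019, 10, 29, 14, 24, 2)
modified_date = Time.utc(2019, 10, 29, 14, 24, 3)

close_enough = within_seconds?(date_modified, modified_date)
far_apart    = within_seconds?(date_modified, Time.utc(2018, 11, 28, 18, 57, 51))

puts close_enough
puts far_apart
```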
@yonyitz and @HKativa Do you have a sense of when you really first created this record: https://digital.sciencehistory.org/works/bk128c180? ("Diagrams of microscope and assorted optical instruments and devices", Part of "Micrographia...")
Do you have any sense of if it was really first created 29 Oct 2019 (last week?), or 28 Nov 2018 (last year)?
@yonyitz and @HKativa another question -- do you think you used 'promote to child work' on that one recently, perhaps on 29 Oct 2019?
@jrochkind yes, that is exactly what happened here. I finished cataloging the 18 remaining plates from the Micrographia last Monday and Tuesday (10/28 and 10/29). Those are the 18 records you see above "Oral history interview with Carlyle B. Storm" when sorting by "recently added" in chf_sufia: https://digital.sciencehistory.org/catalog?utf8=%E2%9C%93&search_field=all_fields&q=&range%5Byear_facet_isim%5D%5Bbegin%5D=&range%5Byear_facet_isim%5D%5Bend%5D=
Aha, thanks.
OK, here is my current hypothesis:
Using “promote to child work” (a custom function in our chf_sufia app) ends up copying over the uploaded_date. Its uploaded_date (which is the one we consider "date of creation" in the new app) stays the same, despite promotion.
chf_sufia may have been wrong about what field it chose to use for sorting by "recently added". It probably should have been sorting by uploaded_date, but was not. It may have inherited this mistake from sufia itself, because all these dates are really confusing to lots of people involved.
So the sufia sort may itself have been buggy. Separately, there may be things we want to tweak in how the dates are imported.
And there are some decisions involved about what is "right" for the "date created" for something converted from asset to work or vice versa -- should its "date created" stay the same or reset? In terms of administrative metadata for historical/preservation/administrative purposes, as well as UI.
(In terms of UI, this is hacky currently and can change -- sorting by "recently added", at least for non-logged-in users, should probably really sort by the date it was published. We don't currently track that, but could in the future.)
So uploaded_date stays the same as the original, while create_date is the date it was promoted to child work. I think this was not entirely on purpose. I am not sure if it should be considered ‘right’ or ‘wrong’. I am not sure if new app behavior matches (or if that matters. :slightly_smiling_face: ).
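The hypothesis can be modelled in a few lines of Ruby. This is not chf_sufia's actual code: LifecycleDates and promote_to_child_work are hypothetical names, the "asset-1" source record and its create_date are invented, and the behavior shown is the assumed one (promotion creates a new record with a fresh create_date but carries date_uploaded over unchanged).

```ruby
# Hypothetical model of the hypothesis above, not chf_sufia's actual code.
LifecycleDates = Struct.new(:id, :date_uploaded, :create_date)

# Assumed behavior: the promoted record is newly created (so it gets a new
# create_date), but date_uploaded is copied from the source record as-is.
def promote_to_child_work(source, new_id:, now:)
  LifecycleDates.new(new_id, source.date_uploaded, now)
end

# "asset-1" and its create_date are invented; the other timestamps are the
# ones reported for bk128c180 in this thread.
original = LifecycleDates.new(
  "asset-1",
  Time.utc(2018, 11, 28, 18, 57, 51),
  Time.utc(2018, 11, 28, 18, 57, 51)
)

child = promote_to_child_work(
  original,
  new_id: "bk128c180",
  now: Time.utc(2019, 10, 29, 14, 24, 2)
)

gap_in_days = ((child.create_date - child.date_uploaded) / 86_400).floor
puts gap_in_days
```

Under these assumptions the promoted record ends up with a create_date roughly 11 months later than its date_uploaded, which matches what was observed for bk128c180.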
We've decided we understand the situation, and have no need to rescue the old date variation from migrated data; the way it migrated is fine.
Note: This cluster of problems is also discussed in https://github.com/sciencehistory/scihist_digicoll/issues/446 . This ticket is the more helpful of the two.
When not logged in, an unrestricted search sorted by 'updated_at' gives different results in chf_sufia and the new app.
https://kithe.sciencehistory.org/catalog?q=&range%5Byear_facet_isim%5D%5Bbegin%5D=&range%5Byear_facet_isim%5D%5Bend%5D=&search_field=all_fields&sort=recently_added
https://digital.sciencehistory.org/catalog?q=&range%5Byear_facet_isim%5D%5Bbegin%5D=&range%5Byear_facet_isim%5D%5Bend%5D=&search_field=all_fields&sort=system_create_dtsi+desc
This may not mean that the created_at values don't match, rather it may be a bug of some other kind (in either app!). But if it did mean the created_at didn't match, that might postpone production launch. Just a bug in sorting (on either app) probably would not. We are going to look into it and try to see what's going on!