Closed astrochun closed 7 months ago
For reference: there were 17 works published this year (2023) and we only show 5 in the recent works page. The sort is done by year so there is no easy way to determine which one is more recent of those 17 works.
Is there a published metadata date that we can use?
We can use either created_at or updated_at, which exist on the database record but not in the Datacite record. For the referenced data set, these are:
created_at: Fri, 17 Nov 2023 14:10:06.509405000 EST -05:00,
updated_at: Fri, 08 Dec 2023 11:41:34.649789000 EST -05:00,
I'm going to index them both so we can try out which of these works better in the UI.
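As a rough illustration of what indexing both values could involve, here is a minimal sketch that converts the two database timestamps quoted above into UTC strings of the same shape as the dates listed for the front page. The helper name `to_utc_iso` and the `doc` layout are hypothetical, not taken from the PDC Discovery indexer.

```python
from datetime import datetime, timezone

def to_utc_iso(db_timestamp: str) -> str:
    """Convert 'Fri, 17 Nov 2023 14:10:06.509405000 EST -05:00' to '2023-11-17T19:10:06Z'."""
    # Tokens: weekday, day, month, year, clock time, zone abbreviation, UTC offset.
    _weekday, day, month, year, clock, _zone, offset = db_timestamp.split()
    hms = clock.split(".")[0]  # drop fractional seconds
    local = datetime.strptime(
        f"{day} {month} {year} {hms} {offset}", "%d %b %Y %H:%M:%S %z"
    )
    return local.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

# Hypothetical index document carrying both candidate sort fields.
doc = {
    "created_at": to_utc_iso("Fri, 17 Nov 2023 14:10:06.509405000 EST -05:00"),
    "updated_at": to_utc_iso("Fri, 08 Dec 2023 11:41:34.649789000 EST -05:00"),
}
```

With both fields present, the UI can be pointed at either one while we compare results.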
Great. I suspect created date makes the most sense
Done
@astrochun asked us to look into why the dataset he mentioned isn't on the front page.
The dataset in question is this one: https://datacommons.princeton.edu/describe/works/201
Its created_at date is 17 Nov 2023.
On 18 Jan 2024, the list of Recently Added datasets on the front page of PDC Discovery production shows these dates for when they were added to PDC Describe:
["2024-01-08T11:51:38Z",
"2023-12-22T14:00:38Z",
"2023-12-22T13:47:37Z",
"2023-12-22T13:43:37Z",
"2023-12-22T13:35:55Z",
"2023-12-22T13:26:33Z",
"2023-12-22T13:20:44Z",
"2023-12-22T12:37:29Z",
"2023-12-22T12:24:52Z",
"2023-12-22T12:13:31Z"]
All of which are more recent than 17 Nov 2023. Do we instead need some combination of "publication date" and "date added"?
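The comparison can be checked with a short sketch like the one below. It uses the ten dates listed above; the UTC time of day assumed for the 17 Nov 2023 record is derived from the created_at value quoted earlier in the thread, and the variable names are illustrative only.

```python
from datetime import datetime, timezone

recently_added = [
    "2024-01-08T11:51:38Z", "2023-12-22T14:00:38Z", "2023-12-22T13:47:37Z",
    "2023-12-22T13:43:37Z", "2023-12-22T13:35:55Z", "2023-12-22T13:26:33Z",
    "2023-12-22T13:20:44Z", "2023-12-22T12:37:29Z", "2023-12-22T12:24:52Z",
    "2023-12-22T12:13:31Z",
]

def parse_utc(ts: str) -> datetime:
    # The trailing 'Z' marks UTC, so attach the UTC zone explicitly.
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)

# created_at of the dataset in question (14:10:06 at -05:00, expressed in UTC).
created = datetime(2023, 11, 17, 19, 10, 6, tzinfo=timezone.utc)

oldest_shown = min(parse_utc(ts) for ts in recently_added)
newer_count = sum(parse_utc(ts) > created for ts in recently_added)
pushed_off_list = created < oldest_shown
```

Because every listed entry is newer, a sort on created_at alone cannot place the 17 Nov 2023 dataset in the top ten.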
@bess perhaps. The issue is that we did not migrate the data in chronological order and had to publish new datasets. A lot of those aren't recent datasets but from a few years back. I know this is a challenging one to fix since the metadata is a bit limited. If we can filter out those that have a publication date on or before 2022, that should capture more of the recent datasets. I think once we have more datasets, this will resolve itself.
Another idea: Maybe we exclude anything that was migrated?
Here's my two cents: "recently published" should sort in reverse-chronological order by the date of first issue (not update/edit, and not migration); and once we get past the migration phase, I don't expect much confusion about what recently went into PDC vs. what was recently published for the first time.
I talked to @matthewjchandler on Slack, and after discussion he now agrees we should sort by the PDC created_at timestamp (since date of issue is not granular enough to do meaningful sorting) but exclude migrated works.
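A minimal sketch of that rule, assuming each work carries a `created_at` datetime and a boolean `migrated` flag. Both field names, the sample records, and the default limit of 5 are assumptions made for illustration, not the actual PDC Discovery schema.

```python
from datetime import datetime, timezone

def recently_published(works, limit=5):
    """Return the newest works by created_at, skipping migrated ones."""
    native = [w for w in works if not w["migrated"]]
    return sorted(native, key=lambda w: w["created_at"], reverse=True)[:limit]

utc = timezone.utc
works = [  # made-up sample records
    {"id": "a", "created_at": datetime(2023, 12, 22, 14, 0, 38, tzinfo=utc), "migrated": True},
    {"id": "b", "created_at": datetime(2023, 11, 17, 19, 10, 6, tzinfo=utc), "migrated": False},
    {"id": "c", "created_at": datetime(2024, 1, 8, 11, 51, 38, tzinfo=utc), "migrated": False},
]

ids = [w["id"] for w in recently_published(works)]
```

Here the migrated record "a" is dropped even though its created_at is newer than that of "b".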
UPDATE: Sounds like we don't yet have quite the right definition of what should be on the Recently Published feed. Working with @astrochun and @matthewjchandler to figure out what that should be.
On Friday, December 8, we published a new dataset in PDC:
However, on the main Discovery page this does not show up under "Recently published".
@hectorcorrea points out that the sorting is perhaps done by year, so it does not capture the proper order among datasets published in the same year.
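To show why a year-only sort key falls short, here is a small sketch with made-up records (the titles, `year`, and `created_at` values are hypothetical). Works that share a year compare as equal, so their relative order depends only on the order they happened to arrive in.

```python
works = [  # made-up records; "B" is the most recently created
    {"title": "A", "year": 2023, "created_at": "2023-03-01T10:00:00Z"},
    {"title": "B", "year": 2023, "created_at": "2023-12-08T16:41:34Z"},
    {"title": "C", "year": 2022, "created_at": "2022-06-15T09:30:00Z"},
]

# Sorting by year: A and B tie, and Python's stable sort keeps their input order.
by_year = [w["title"] for w in sorted(works, key=lambda w: w["year"], reverse=True)]

# Sorting by a full timestamp: same-format ISO 8601 strings order chronologically.
by_created = [w["title"] for w in sorted(works, key=lambda w: w["created_at"], reverse=True)]
```

In this sample the year sort lists "A" before "B", while the timestamp sort puts "B" first.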
Acceptance criteria
created_at timestamp from PDC Describe