brendan-oconnell opened this issue 2 months ago
RFC @brendan-oconnell @rhigman

Reduce the list of `work_status` values to `ACTIVE` and `INACTIVE` (both for post-publication), and `FORTHCOMING`, `CANCELLED` and `POSTPONED_INDEFINITELY` (for pre-publication).
Set the following constraints on dates:

- `publication_date`: `ACTIVE`, `INACTIVE`, `FORTHCOMING`, `POSTPONED_INDEFINITELY`, `CANCELLED`
- `withdrawn_date`: `INACTIVE`, `ACTIVE`, `POSTPONED_INDEFINITELY`, `CANCELLED`, `FORTHCOMING`
Get rid of the following codes completely, as they're either redundant or not relevant. Where they are currently set, replace them with `INACTIVE`:

- `UNSPECIFIED`
- `OUT_OF_STOCK_INDEFINITELY`
- `OUT_OF_PRINT`
- `WITHDRAWN_FROM_SALE`
- `NO_LONGER_OUR_PRODUCT`
- `UNKNOWN`
- `REMAINDERED`
- `RECALLED`
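As a rough illustration of the remapping proposed above, here is a minimal Rust sketch. The `WorkStatus` enum and its `migrate` helper are hypothetical (the variant names simply mirror the codes listed in this RFC, not necessarily Thoth's actual types); the point is only that every deprecated code collapses to `INACTIVE` while the five retained statuses pass through unchanged.

```rust
/// Hypothetical status enum: the five statuses kept by the proposal plus
/// the legacy codes proposed for removal.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum WorkStatus {
    // Statuses kept by the proposal
    Active,
    Inactive,
    Forthcoming,
    Cancelled,
    PostponedIndefinitely,
    // Legacy codes proposed for removal
    Unspecified,
    OutOfStockIndefinitely,
    OutOfPrint,
    WithdrawnFromSale,
    NoLongerOurProduct,
    Unknown,
    Remaindered,
    Recalled,
}

impl WorkStatus {
    /// True for the codes this RFC proposes to drop.
    pub fn is_deprecated(self) -> bool {
        matches!(
            self,
            WorkStatus::Unspecified
                | WorkStatus::OutOfStockIndefinitely
                | WorkStatus::OutOfPrint
                | WorkStatus::WithdrawnFromSale
                | WorkStatus::NoLongerOurProduct
                | WorkStatus::Unknown
                | WorkStatus::Remaindered
                | WorkStatus::Recalled
        )
    }

    /// Post-migration value: deprecated codes become Inactive,
    /// retained statuses are returned unchanged.
    pub fn migrate(self) -> WorkStatus {
        if self.is_deprecated() {
            WorkStatus::Inactive
        } else {
            self
        }
    }
}
```

In practice the migration would more likely be a single database update; the Rust version just makes the mapping explicit and easy to test.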
@ja573 I agree with deprecating the codes you mention. The impact would be minimal: currently in Thoth there are 2 Works that are `WITHDRAWN_FROM_SALE`, 13 that are `OUT_OF_PRINT`, and no Works with any of the other codes.
I still wonder about requiring `publication_date` for `FORTHCOMING` works... I really don't know enough about how publishers use Thoth to know whether this would pose a problem for them. I suppose it would always be possible to set a dummy `publication_date`, and then hopefully the publisher updates it if necessary when they change the `work_status` to `ACTIVE`.
Might interact with #585.

Not sure about `publication_date` being mandatory for `FORTHCOMING`. From the point of view of Thoth as a metadata management system, i.e. somewhere users can "draft" records for potential book projects from an early stage, `FORTHCOMING` is the most sensible status for such drafts. Forcing users to set a publication date as soon as they create a record would introduce friction and add to the issues we already see with "fake" dates causing confusion.

(ETA @brendan-oconnell haha, snap - I spent too long composing my draft!)
One more piece of data: there are currently 178 `FORTHCOMING` Works in Thoth with no `publication_date`, vs. 12 that have a `publication_date`.
Good point - setting the default value in the migration would be a can of worms.
Yeah, I'm also divided about `FORTHCOMING`... The reason for proposing it was that, technically, a date is required by ONIX, which is understood as the expected publication date. But yes – we should avoid dummy dates.
Yes, and it would definitely become an issue if we did start regularly disseminating ONIX records prior to publication. (Although, in that case, the individual exports should be preventing the creation of ONIX files where the platform requires a publication date and the record doesn't have it.)
Do we currently disseminate ONIX records prior to publication, and if not, is that something that's important to Thoth users?
> (ETA @brendan-oconnell haha, snap - I spent too long composing my draft!)
early May? I'm out next week, but will continue working on this when I get back.
No, at least for now we'll only distribute post-publication. But (a) we might in the future (e.g. if we integrate platforms like Lightning Source) and (b) we don't know how people might use the records we output (we may not be pushing them, but people might be harvesting them).
I think at present the only ONIX output you can generate pre-publication is Thoth's – and since it's meant to be the full implementation of ONIX, we should enforce having a `publication_date` set for forthcoming books.
> > (ETA @brendan-oconnell haha, snap - I spent too long composing my draft!)
>
> early May? I'm out next week, but will continue working on this when I get back.
Sorry, I don't follow...
It seems like it's me who didn't follow what ETA meant in this context... I thought you meant "estimated time of arrival" :)
Ah, my fault! I was using it as "edited to add" - just to acknowledge that you made a very similar point in the time it took me to post mine :smile: - should have avoided that ambiguity!
> I think at present the only ONIX output you can generate pre-publication is Thoth's – and since it's meant to be the full implementation of ONIX, we should enforce having a `publication_date` set for forthcoming books
Hmm, actually, the current implementation of `onix::thoth` is very permissive in terms of still letting you output something even if the record is incomplete. During development, I'd been thinking of it more as a way to get one's entire record "out" of Thoth in a familiar/standard format. (All of the high-level mandatory ONIX fields are already mandatory within Thoth, so that's not a concern, but identifying this kind of interaction between fields would have required a lot of close-reading.) Not that we can't change it.
In practice, I think you can output all the other ONIX flavours pre-publication except for Google Books and Overdrive - those are the only ones which explicitly mandate a publication date.
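A minimal sketch of the per-platform check described here, assuming (as stated above) that Google Books and Overdrive are the only flavours that mandate a publication date. `OnixFlavour`, its `Other` catch-all variant and `check_export` are hypothetical names for illustration, not Thoth's actual export API, and the date is kept as an optional string to stay dependency-free.

```rust
/// Hypothetical set of ONIX export flavours; `Other` stands in for every
/// flavour that does not mandate a publication date.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum OnixFlavour {
    Thoth,
    GoogleBooks,
    Overdrive,
    Other,
}

impl OnixFlavour {
    /// Per the comment above: only Google Books and Overdrive explicitly
    /// mandate a publication date.
    pub fn requires_publication_date(self) -> bool {
        matches!(self, OnixFlavour::GoogleBooks | OnixFlavour::Overdrive)
    }
}

/// Hypothetical pre-export check: refuse to build the file when the target
/// platform needs a publication date and the record has none; otherwise
/// let the (permissive) export go ahead.
pub fn check_export(flavour: OnixFlavour, publication_date: Option<&str>) -> Result<(), String> {
    if flavour.requires_publication_date() && publication_date.is_none() {
        return Err(format!("{:?} output requires a publication_date", flavour));
    }
    Ok(())
}
```

This keeps the record itself permissive (a `FORTHCOMING` work may lack a date) and moves the strictness to the individual outputs that actually need it.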
> One more piece of data: Currently 178 `FORTHCOMING` Works in Thoth with no `publication_date` vs. 12 that have a `publication_date`
Odd - I make it 79 vs 21.
And 14 of those 21 dates are in the past!
My fault... I looked at my (out-of-date) development data dump instead of the production database! The proportion of `FORTHCOMING` works with a `publication_date` vs. without one from my figures still somewhat holds, though.

I also noticed a lot of Forthcoming dates in the past...
OK, so to sum up, this kind of gets to a tension between Thoth-as-metadata-management system vs. -dissemination system.
As @rhigman notes above, publishers using Thoth as a metadata management system seem to want a kind of 'draft' record state, and they also seem to be currently using `FORTHCOMING` for this, as indicated by the relatively large number of `FORTHCOMING` works with no `publication_date`. If we create a catch-all `INACTIVE` status as @ja573 has proposed, they could use this for 'drafts' of any kind, although the term "Inactive" has a different definition in ONIX Codes for Publishing Status: "The product was active, but is now permanently or indefinitely inactive in the sense that the publisher will not accept orders for it, though stock may still be available elsewhere in the supply chain." I'm not sure how important/well-known those ONIX codes are to publishers.
So this would seem to be an argument for making `publication_date` optional for `FORTHCOMING`.
On the other hand, we want Thoth as a dissemination system to be able to disseminate successfully as much as possible, and not requiring `FORTHCOMING` works to have a `publication_date` would prevent some ONIX outputs, some of the time.
What's the best way to proceed with this decision? I have the least experience and domain-specific knowledge of anyone on this project, so I don't want to make the decision myself :) Do we need to discuss at a future Thoth meeting? In any case, it seems like this question of how publishers are creating 'drafts' in Thoth is worth digging into further...
Based on those five statuses, the ideal usage would be that books start as `FORTHCOMING` and then follow:
```mermaid
graph TD;
FORTHCOMING -->|Postponed Indefinitely| POSTPONED_INDEFINITELY;
POSTPONED_INDEFINITELY -.->|Resumed| FORTHCOMING;
FORTHCOMING -->|Cancelled| CANCELLED;
FORTHCOMING -->|Published| ACTIVE;
ACTIVE -.->|Withdrawn| INACTIVE;
```
Then, if we agree on reducing the statuses to just those five, we need to look at what constraints ONIX places between these statuses and other fields, and implement them accordingly.
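The workflow in the diagram can be read as a small state machine. The sketch below is hypothetical (the `Status` enum and `is_allowed_transition` are illustrative, not existing Thoth code): it encodes the five edges exactly as drawn, treats staying in the same status as allowed (an assumption), and does not distinguish the solid edges from the dotted ones.

```rust
/// Hypothetical status enum matching the five statuses in the diagram.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Status {
    Forthcoming,
    PostponedIndefinitely,
    Cancelled,
    Active,
    Inactive,
}

/// True if `from -> to` is one of the edges in the diagram.
pub fn is_allowed_transition(from: Status, to: Status) -> bool {
    // Assumption: re-saving a record without changing its status is fine.
    if from == to {
        return true;
    }
    matches!(
        (from, to),
        // Solid and dotted edges are treated alike here.
        (Status::Forthcoming, Status::PostponedIndefinitely)
            | (Status::PostponedIndefinitely, Status::Forthcoming)
            | (Status::Forthcoming, Status::Cancelled)
            | (Status::Forthcoming, Status::Active)
            | (Status::Active, Status::Inactive)
    )
}
```

Anything not drawn (e.g. `CANCELLED` back to `ACTIVE`) is rejected, which is the strictest reading of the diagram; whether Thoth should actually enforce transitions, rather than just date constraints, is a separate decision.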
My view on this topic: since 2020, at SciELO Books, we have been working with books that are yet to be released. This means that the publication date is usually in the future, and ONIX is sent in advance to Kobo, Amazon and Google so that the book is available as a ‘pre-release’. As a result, the book is listed in the catalogues, but the files are only released on the day specified as the publication date. This also means that the entire set of metadata is prepared beforehand, but without the release date. The date is entered when it is set by the publisher, and then the metadata is exported in ONIX.
Publishers using Thoth as a metadata management tool will have statuses FORTHCOMING (publication date NOT known) and FORTHCOMING (publication date known), which are still not resolved in that flow @ja573, and so the basic issue remains! If a publication date is 'required' for FORTHCOMING, then publishers will be forced to input a made-up date, and if ONIX is then successfully distributed, 'false' data is entered into various distribution systems (as well as Thoth). In addition, it is unlikely that publishers will check whether the inputted date has passed, again causing issues if distributed. So, if publication date is enforced for FORTHCOMING, then I think we need a different name for a status where the publication date has not been determined. And if FORTHCOMING does not require a publication date, we need something which flags that the ONIX is not well formatted (as it is missing data) and/or prevents distribution of ONIX files to platforms that require a publication date. Presumably we will need a flag/hold when trying to distribute a Forthcoming work with a past publication date in any case, so I guess I would prefer to add a check for the existence of a publication date at the same point rather than create a new work status.
The original idea was to require the publication date, but after the discussion it was clear that we should not be doing that, and should just leave it to the ONIX output to complain about it not being set.
Those who choose to enter a publication date for forthcoming titles (which is already possible) would need to check that the date is to some extent accurate, as we don't currently have any mechanisms to check the veracity of data that's input. But because this date is meant to be an estimate anyway, I don't think it'll be a problem if it's not completely accurate.
At some point we could write notifications to publishers informing them of forthcoming books with dates in the past, though.
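A sketch of the kind of check such a notification could run: flag forthcoming works whose publication date has already passed. The `Work`, `Date` and `WorkStatus` types and `stale_forthcoming` are hypothetical; `Date` is a bare year/month/day struct whose derived ordering compares year, then month, then day, which avoids pulling in a date library for the example. Works with no date at all are not flagged, consistent with the decision not to require one for `FORTHCOMING`.

```rust
/// Minimal calendar date. Derived ordering compares fields in declaration
/// order (year, then month, then day), which is all a "before today?" check needs.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Date {
    pub year: i32,
    pub month: u8,
    pub day: u8,
}

/// Hypothetical, trimmed-down status enum for this check only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum WorkStatus {
    Forthcoming,
    Active,
}

/// Hypothetical work record holding just the fields this check needs.
pub struct Work {
    pub title: String,
    pub status: WorkStatus,
    pub publication_date: Option<Date>,
}

/// Titles of forthcoming works whose (estimated) publication date is before
/// `today`, i.e. candidates for a reminder to the publisher.
pub fn stale_forthcoming(works: &[Work], today: Date) -> Vec<&str> {
    works
        .iter()
        .filter(|w| w.status == WorkStatus::Forthcoming)
        .filter(|w| matches!(w.publication_date, Some(d) if d < today))
        .map(|w| w.title.as_str())
        .collect()
}
```

The same predicate could also serve as the distribution-time "flag/hold" suggested earlier in the thread, since both need to spot a forthcoming work with a past date.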
Apologies, this may be slightly adjacent to the core discussion here: if I understood things correctly, we are also considering making our metadata Crossmark-compliant (see also #582). Now, with regard to updates to Work Status, Crossmark's categorisation of 12 different changes to a given Work Status might be relevant here as well (if we were to implement those): https://www.crossref.org/documentation/crossmark/participating-in-crossmark/#00279
```mermaid
graph TD;
FORTHCOMING -->|Postponed Indefinitely| POSTPONED_INDEFINITELY;
POSTPONED_INDEFINITELY -.->|Resumed| FORTHCOMING;
FORTHCOMING -->|Cancelled| CANCELLED;
FORTHCOMING -->|Published| ACTIVE;
ACTIVE -.->|Require removal| WITHDRAWN;
ACTIVE -.->|New edition| SUPERSEDED;
```
@ja573 Do you think `publication_date` should be required for `WITHDRAWN` and `SUPERSEDED` works? On the one hand, `withdrawn_date` will be required for these `work_status` values, and we only need one date for Crossmark (the date of the `update`, whether it be a withdrawal, new edition, etc.). So for Crossmark purposes, it's not essential.

On the other hand, these are works that, according to the workflow you outline in your diagram, should have passed through an `ACTIVE` state and been published at some point, which means they would need to have had a `publication_date` when they were `ACTIVE`. This would support requiring `publication_date`, because it should (theoretically) always be present.

On the other, other hand, though, if publishers are adding back catalog titles to Thoth and want to add works that have already been withdrawn or superseded in their catalog, they might not know the publication date... which would support not requiring it, to avoid them introducing false metadata into Thoth. And I know the general philosophy has been to keep required fields to a minimum.
What do you think?
This was discussed in a team meeting, and we decided to make `publication_date` required for `WITHDRAWN` and `SUPERSEDED` works.
If a `Work` in Thoth has a `work_status` of `ACTIVE`, require `publication_date`.
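Pulling together the date rules agreed in this thread (`publication_date` required for `ACTIVE`, `WITHDRAWN` and `SUPERSEDED`; `withdrawn_date` required for `WITHDRAWN` and `SUPERSEDED`; `publication_date` optional for `FORTHCOMING`), a validation step might look roughly like this. It is a hypothetical sketch: the enum, the `validate_dates` function and the string-typed dates are illustrative assumptions, and it does not try to forbid dates on other statuses, which the thread does not settle.

```rust
/// Hypothetical status enum following the second diagram.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum WorkStatus {
    Forthcoming,
    PostponedIndefinitely,
    Cancelled,
    Active,
    Withdrawn,
    Superseded,
}

/// Hypothetical validation of the agreed date rules. Dates are plain
/// ISO-8601 strings here to keep the sketch dependency-free.
pub fn validate_dates(
    status: WorkStatus,
    publication_date: Option<&str>,
    withdrawn_date: Option<&str>,
) -> Result<(), String> {
    // Anything that has been published must carry its publication date.
    let needs_publication_date = matches!(
        status,
        WorkStatus::Active | WorkStatus::Withdrawn | WorkStatus::Superseded
    );
    // Withdrawal and supersession are dated by withdrawn_date.
    let needs_withdrawn_date = matches!(status, WorkStatus::Withdrawn | WorkStatus::Superseded);

    if needs_publication_date && publication_date.is_none() {
        return Err(format!("{:?} works require a publication_date", status));
    }
    if needs_withdrawn_date && withdrawn_date.is_none() {
        return Err(format!("{:?} works require a withdrawn_date", status));
    }
    // FORTHCOMING, POSTPONED_INDEFINITELY and CANCELLED: no date required.
    Ok(())
}
```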