inveniosoftware-contrib / invenio-record-editor

Editor for Invenio records
GNU General Public License v2.0

Validators to implement #9

Open jmartinm opened 7 years ago

jmartinm commented 7 years ago

Here we can keep track of all the complex validators (dependent on the INSPIRE data model) that can be implemented and that should issue either warnings or errors.

The validation might be implemented on the Python side, and the editor would just call a URL to get the validation warnings.
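As a rough illustration of that split, the Python side could run a list of validator functions and return their messages grouped by severity. This is only a sketch: `validate_record`, `check_titles`, `validation_response` and the shape of the JSON payload are assumptions made for the example, not an existing API of the editor or of INSPIRE.

```python
import json


def validate_record(record, validators):
    """Run every validator and group the messages by severity."""
    report = {"errors": [], "warnings": []}
    for validator in validators:
        # Each validator yields (severity, message) pairs.
        for severity, message in validator(record):
            key = "errors" if severity == "error" else "warnings"
            report[key].append(message)
    return report


def check_titles(record):
    # Hypothetical rule, used only to have something to run.
    if not record.get("titles"):
        yield "warning", "record has no title"


def validation_response(record, validators):
    """Serialize the report the way a validation URL might return it."""
    return json.dumps(validate_record(record, validators), sort_keys=True)
```

The editor would then only need to display whatever lists come back, so new rules could be added on the server without touching the front end.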

Errors

Warnings

jmartinm commented 7 years ago

@inveniosoftware-contrib/inspire-content we are ready to implement this. Can we complete a list of errors and warnings that should be triggered in Literature records?

annetteholtkamp commented 7 years ago

Warning:

Error:

michamos commented 7 years ago

Many fields in the schema are conditional on the document_type, so an error should be raised if the document_type is not present but the other fields are. Maybe it makes more sense to have the document_type as the first entry of the record editor, and have default fields that depend on this choice, or group fields by document_type and have a way to hide them (like on the submission form).
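A minimal sketch of such a check, assuming a hand-written mapping from `document_type` values to the fields that depend on them. The mapping below is hypothetical and only for illustration; the real dependencies would have to be read from the schema.

```python
# Hypothetical mapping: which fields only make sense for which
# document_type. The real list would come from the schema.
CONDITIONAL_FIELDS = {
    "thesis": {"thesis_info"},
    "book chapter": {"parent_isbn"},
}


def check_document_type(record):
    """Return error messages for conditional fields present without a document_type."""
    if record.get("document_type"):
        return []
    all_conditional = set().union(*CONDITIONAL_FIELDS.values())
    orphaned = sorted(all_conditional & set(record))
    return ["{} requires document_type to be set".format(name) for name in orphaned]
```

For example, `check_document_type({"thesis_info": {}})` reports one error, while a record that sets `document_type` passes this particular check.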

Errors

Warnings

aw-bib commented 7 years ago

Note that there are a bunch of formally invalid ISBNs around. Most likely if it's not 10 or 13 digits and invalid it should be a warning only. (BTW: do you set ISBNs for chapters in books, if any?)

One might (should?) be tempted to hook up ISBN entry with a call against a catalogue and try to import data if not there already. (@jmartinm should have a piece of code for GVK import.)

ISSN would be a valuable field that may be present for articles and book series.

michamos commented 7 years ago

> Note that there are a bunch of formally invalid ISBNs around. Most likely if it's not 10 or 13 digits and invalid it should be a warning only.

What are invalid ISBNs useful for? If they are invalid, we know that no actual book can possibly correspond to it.

> (BTW: do you set ISBNs for chapters in books, if any?)

For chapters in books, we have parent_isbn, which is a different field.

aw-bib commented 7 years ago

> What are invalid ISBNs useful for?

I'd like to refer this question to the publishing industry...

> if they are invalid, we know that no actual book can possibly correspond to it.

I fear this assumption does not hold. There exist formally wrong ISBNs (e.g. with a wrong checksum) that publishers generate and use just like valid ISBNs. Sometimes they are corrected, sometimes not. Nevertheless, they are usually printed in the book and thus exist in hardware. They are common enough that they made it into the standard: MARC 020 subfield $z.
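For reference, the formal check under discussion is the standard ISBN check digit: a weighted sum modulo 11 for ISBN-10 and modulo 10 for ISBN-13. A minimal sketch follows; the function name `isbn_checksum_ok` is made up for this example, and whether a failed check becomes a warning or an error is left open, as in the discussion above.

```python
import re


def isbn_checksum_ok(isbn):
    """Return True/False for the check digit, or None if the length is wrong."""
    digits = re.sub(r"[\s-]", "", isbn).upper()
    if re.fullmatch(r"\d{9}[\dX]", digits):
        # ISBN-10: weights 10..1, "X" stands for 10, sum must be 0 mod 11.
        total = sum(
            (10 - i) * (10 if c == "X" else int(c))
            for i, c in enumerate(digits)
        )
        return total % 11 == 0
    if re.fullmatch(r"\d{13}", digits):
        # ISBN-13: alternating weights 1 and 3, sum must be 0 mod 10.
        total = sum(int(c) * (3 if i % 2 else 1) for i, c in enumerate(digits))
        return total % 10 == 0
    return None  # neither 10 nor 13 characters after stripping separators
```

Returning `None` for a wrong length keeps "not an ISBN at all" apart from "ISBN with a wrong checksum", so the two cases can be given different severities.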

Side note: quite a few multi-volume books share the same ISBN, i.e. you cannot assume that the ISBN is unique either. Even if you merge subsequent editions of the same book into one record (effectively getting rid of a number of these issues), this may be relevant if you have individual entries for the volumes. Additionally, if you strip the hyphens you may end up with two "identical" numbers that are not the same. (Chances are small for INSPIRE, however. Usually those dupes result from very small publishers. It's quite a common problem for poetry...)

kaplun commented 7 years ago

Yeah, INSPIRE has very few books, and we can do a run to ensure that the current 2K ISBNs we know are valid. Then the chances of having an important HEP book with a broken ISBN become really small.

michamos commented 7 years ago

Turns out we have 36 records with invalid ISBNs on INSPIRE (when stripping some extra crap in the $$a field that does not belong there). I created an Asana task for this.

kaplun commented 7 years ago

Actually, I think that going field by field we can come up with tons of warnings. E.g.:

aw-bib commented 7 years ago

@kaplun if you have a working or even good checker for "valid LaTeX", could you drop me a note by PM? TIA.

jmartinm commented 7 years ago

List updated

michamos commented 7 years ago

Warning

Error