Closed: cverluise closed this issue 2 years ago.
@cverluise thanks for reporting this issue. I now ignore the field journal_issn_l when unmarshalling the JSON file: commit 768283990d02ff7ba355dc2a1ab4f41f8de97080.
I didn't test it. Could you please check whether the load works?
There might be other fields that need to be changed. If you want to fix it on the fly, you can just add the missing field to the @JsonIgnoreProperties annotation on top of the class where the failure occurs; in the case of Unpaywall, that would be UnpayWallMetadata. Alternatively, just reply here 😉
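The following is a minimal sketch of that on-the-fly fix. It assumes jackson-databind (with jackson-annotations) is on the classpath. The class UnpaywallRecord and its two fields are hypothetical stand-ins for UnpayWallMetadata, and the sample record is made up. Keys listed in @JsonIgnoreProperties are skipped, while any other unmapped key still raises UnrecognizedPropertyException.

```java
import com.fasterxml.jackson.annotation.JsonIgnoreProperties;
import com.fasterxml.jackson.annotation.JsonProperty;
import com.fasterxml.jackson.databind.ObjectMapper;

public class IgnoreFieldSketch {

    // Hypothetical, trimmed-down stand-in for UnpayWallMetadata (only two fields mapped).
    // "journal_issn_l" is skipped; any other unmapped key still throws
    // UnrecognizedPropertyException, because that is Jackson's default behaviour.
    @JsonIgnoreProperties({"journal_issn_l"})
    public static class UnpaywallRecord {
        public String doi;

        @JsonProperty("is_oa")
        public boolean isOa;
    }

    private static final ObjectMapper MAPPER = new ObjectMapper();

    public static UnpaywallRecord parse(String jsonLine) throws Exception {
        return MAPPER.readValue(jsonLine, UnpaywallRecord.class);
    }

    public static void main(String[] args) throws Exception {
        // Made-up record for illustration only.
        String line = "{\"doi\": \"10.1234/example\", \"is_oa\": true, \"journal_issn_l\": \"1234-5678\"}";
        UnpaywallRecord r = parse(line);
        System.out.println(r.doi + " is_oa=" + r.isOa);
    }
}
```

Because each new key has to be listed explicitly, this approach breaks again whenever a dump adds a field. That is the trade-off the rest of this thread runs into.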
Hello @lfoppiano,
thanks for the quick answer.
I just pulled 7682839
However, the same exception is still raised.
Q: Should I recompile something so that the new property is properly taken into account?
Thanks!
@cverluise yes, you need to rebuild it:
cd lookup
./gradlew clean build
Hello Cyril!
Just wondering: afaik, no new snapshot from August has been announced on the Unpaywall discussion list (the latest for me is April). Did you get it via another channel?
Hello,
thanks!
I recompiled and it worked... until new unrecognized fields appeared, namely "has_repository_copy" and "repository_institution" (so far).
This is what my @JsonIgnoreProperties looks like at the moment:
@JsonIgnoreProperties({"z_authors", "x_reported_noncompliant_copies", "x_error", "journal_issn_l", "has_repository_copy", "repository_institution"})
What is strange is that adding "repository_institution" and recompiling (./gradlew clean build from lookup/) did not solve the issue. Note that during the build I get (see full message below):
> Task :compileJava
Note: Some input files use unchecked or unsafe operations.
I still get: ! com.fasterxml.jackson.databind.exc.UnrecognizedPropertyException: Unrecognized field "repository_institution" ...
Any idea?
Thanks!
Hello Patrice!
I just filled in the form on their website (here) and downloaded the file from the AWS S3 address sent back by Unpaywall. Is that non-standard?
Thanks!
Yes, it's standard! Updates were usually announced on the mailing list with the new S3 link; maybe they will do it in the next few days. Having the new dataset would help us update the parser in biblio-glutton since, as we can see, there might be several other changes.
Dear @cverluise, the error also tells you the class. In the last example the class is different, OALocation:
com.fasterxml.jackson.databind.exc.UnrecognizedPropertyException: Unrecognized field "repository_institution" (class com.scienceminer.lookup.data.OALocation)
The principle is the same, but the class is different ;-)
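The exception message names both the field and the class to patch, so a small helper can extract them from a log line. This is a sketch that uses only java.util.regex. It assumes the message contains the fragment Unrecognized field "NAME" (class FQCN) as shown above. The class and method names are hypothetical.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnknownFieldLocator {

    // Matches the fragment seen in this thread, e.g.
    // Unrecognized field "repository_institution" (class com.scienceminer.lookup.data.OALocation)
    private static final Pattern UNRECOGNIZED =
            Pattern.compile("Unrecognized field \"([^\"]+)\" \\(class ([^)]+)\\)");

    /** The offending JSON key and the fully qualified class that rejected it. */
    public static final class Hit {
        public final String field;
        public final String className;

        Hit(String field, String className) {
            this.field = field;
            this.className = className;
        }

        /** Simple name of the class whose @JsonIgnoreProperties list needs the field. */
        public String simpleClassName() {
            return className.substring(className.lastIndexOf('.') + 1);
        }
    }

    public static Optional<Hit> locate(String message) {
        Matcher m = UNRECOGNIZED.matcher(message);
        return m.find() ? Optional.of(new Hit(m.group(1), m.group(2))) : Optional.empty();
    }

    public static void main(String[] args) {
        String msg = "com.fasterxml.jackson.databind.exc.UnrecognizedPropertyException: "
                + "Unrecognized field \"repository_institution\" "
                + "(class com.scienceminer.lookup.data.OALocation)";
        locate(msg).ifPresent(h -> System.out.println(h.field + " -> " + h.simpleClassName()));
    }
}
```

If you catch the exception in code rather than reading a log, UnrecognizedPropertyException already exposes getPropertyName() and getReferringClass(), so no string parsing is needed.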
@cverluise I quickly pushed a fix in 3e0943953faa3590829f35c0106fec83ecc1f96f
Have a look. I didn't have time to test it, sorry.
Apparently the documentation of the data schema has not been updated for the new snapshot (see http://unpaywall.org/data-format), so we would need one or two example records to see what the new fields are and decide what to do with them. Ignoring them might not always be the right way to cope with them!
Some examples:
issn_l (new error ;) com.fasterxml.jackson.databind.exc.UnrecognizedPropertyException)
journal_issn_l
repository_institution
Working lookup configuration with the August Unpaywall snapshot (note: will be edited as errors occur):
...
@JsonIgnoreProperties({"endpoint_id", "repository_institution"})
...
...
@JsonIgnoreProperties({"z_authors", "x_reported_noncompliant_copies", "x_error", "journal_issn_l", "has_repository_copy", "issn_l"})
...
I was planning to leave this issue open until the new fields were integrated (if they rename something, we would lose information). I have reverted the change.
@kermitt2 I suggest that we make a stable release and use master for development (though I'm also fine with developing on a separate branch).
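Silently ignoring fields risks losing information when a field is renamed. One possible middle ground is Jackson's @JsonAnySetter. This is a sketch, not what biblio-glutton does. The annotated method receives every key that does not match a mapped property instead of throwing, so new or renamed fields can be logged for each dump. UnpaywallRow and its methods are hypothetical, and jackson-databind is assumed on the classpath.

```java
import com.fasterxml.jackson.annotation.JsonAnySetter;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.util.LinkedHashMap;
import java.util.Map;

public class CollectUnknownSketch {

    // Hypothetical record class: "doi" binds to the field, every other key goes to
    // putUnknown() instead of raising UnrecognizedPropertyException.
    public static class UnpaywallRow {
        public String doi;

        // Private and without a getX() name, so Jackson does not treat it as a property.
        private final Map<String, Object> unknown = new LinkedHashMap<>();

        @JsonAnySetter
        public void putUnknown(String name, Object value) {
            unknown.put(name, value);
        }

        public Map<String, Object> unknownFields() {
            return unknown;
        }
    }

    private static final ObjectMapper MAPPER = new ObjectMapper();

    public static UnpaywallRow parse(String jsonLine) throws Exception {
        return MAPPER.readValue(jsonLine, UnpaywallRow.class);
    }

    public static void main(String[] args) throws Exception {
        // Made-up record containing two keys this class does not map.
        String line = "{\"doi\": \"10.1234/example\", \"journal_issn_l\": \"1234-5678\", \"has_repository_copy\": true}";
        UnpaywallRow row = parse(line);
        // In a real loader these names could be counted and reported once per dump.
        System.out.println("unmapped keys: " + row.unknownFields().keySet());
    }
}
```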
Hello @kermitt2 and @lfoppiano, I'm facing the same problem described here. I filled in the Unpaywall form to get the link to download the snapshot.
Hello @Aazhar! Which version of the Unpaywall data dump?
After filling in the form, I got this dump: unpaywall_snapshot_2019-11-22T074546
Gasp, this is a new dump! I didn't see it on the Unpaywall discussion group.
@lfoppiano I have the impression that the Jackson JSON unmarshalling is way too rigid: any unexpected or new attribute breaks the JSON parsing... while JSON is normally good precisely for being schema-less! Maybe we should simply write a stupid JSON tree parser?
@Aazhar I pushed a quick fix with 6cece257be034c8334dacc47f34bfab1386aea6b to support this dump version.
However, it will surely break again with the next dump, because we can expect new JSON attributes continuously. Let's leave this issue open until the JSON reader becomes robust.
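The tree-parser idea could look roughly like the sketch below. It uses Jackson's own JsonNode tree model (ObjectMapper.readTree) rather than a hand-written parser. It reads only the fields it needs, so extra keys are never looked at and cannot break the load. The OaInfo class and the choice of doi, is_oa and best_oa_location.url are assumptions for illustration, not biblio-glutton's actual mapping.

```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

public class TreeReadSketch {

    private static final ObjectMapper MAPPER = new ObjectMapper();

    /** Hypothetical minimal view of a record: only the fields a lookup might need. */
    public static final class OaInfo {
        public final String doi;
        public final boolean isOa;
        public final String bestOaUrl; // null when best_oa_location or its url is absent/null

        OaInfo(String doi, boolean isOa, String bestOaUrl) {
            this.doi = doi;
            this.isOa = isOa;
            this.bestOaUrl = bestOaUrl;
        }
    }

    // Returns the text of a node, or null if the node is missing or JSON null.
    private static String textOrNull(JsonNode node) {
        return (node.isMissingNode() || node.isNull()) ? null : node.asText();
    }

    public static OaInfo read(String jsonLine) throws Exception {
        JsonNode root = MAPPER.readTree(jsonLine);
        // path() never returns null: absent keys yield a MissingNode, so navigation is safe
        // and keys we never ask for (including brand-new ones) cannot break parsing.
        String doi = textOrNull(root.path("doi"));
        boolean isOa = root.path("is_oa").asBoolean(false);
        String url = textOrNull(root.path("best_oa_location").path("url"));
        return new OaInfo(doi, isOa, url);
    }

    public static void main(String[] args) throws Exception {
        // Made-up record with an extra key that no class maps.
        String line = "{\"doi\": \"10.1234/example\", \"is_oa\": true, "
                + "\"best_oa_location\": {\"url\": \"https://example.org/paper.pdf\"}, "
                + "\"some_future_field\": [1, 2]}";
        OaInfo info = read(line);
        System.out.println(info.doi + " " + info.isOa + " " + info.bestOaUrl);
    }
}
```

A renamed field still goes unnoticed here; it simply reads as missing. Counting nulls per dump is one way to catch that.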
Great, thanks @kermitt2!
In the new version, unknown new JSON fields in the Unpaywall dump are ignored by default to avoid such issues.
I am closing this issue since we now allow new unknown fields in Unpaywall records, which avoids this kind of recurring ingestion breakage.
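The thread does not say how the new version implements this. Jackson has two standard ways to ignore unknown fields. You can disable DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES on the ObjectMapper, or you can annotate a class with @JsonIgnoreProperties(ignoreUnknown = true). The sketch below shows the mapper-level option next to Jackson's strict default. The class names are hypothetical.

```java
import com.fasterxml.jackson.databind.DeserializationFeature;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.exc.UnrecognizedPropertyException;

public class LenientMapperSketch {

    // Hypothetical class with no ignore list at all.
    public static class DoiOnly {
        public String doi;
    }

    // Jackson's default: FAIL_ON_UNKNOWN_PROPERTIES is enabled, so unmapped keys throw.
    public static final ObjectMapper STRICT = new ObjectMapper();

    // Unmapped keys are skipped for every class this mapper deserializes.
    public static final ObjectMapper LENIENT = new ObjectMapper()
            .configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false);

    public static void main(String[] args) throws Exception {
        // Made-up record with a key the class does not map.
        String line = "{\"doi\": \"10.1234/example\", \"some_future_field\": 42}";
        try {
            STRICT.readValue(line, DoiOnly.class);
        } catch (UnrecognizedPropertyException e) {
            System.out.println("strict mapper rejected: " + e.getPropertyName());
        }
        System.out.println("lenient mapper read doi: " + LENIENT.readValue(line, DoiOnly.class).doi);
    }
}
```

The mapper-level setting applies to every class that mapper reads. The annotation limits the leniency to a single class and keeps strict checking everywhere else.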
Hello,
first, thanks for the truly awesome work!
Issue
I am building the embedded LMDB database and was trying to add the Unpaywall LookUp.
The program starts but keeps raising exceptions:
com.fasterxml.jackson.databind.exc.UnrecognizedPropertyException: Unrecognized field "journal_issn_l" (detailed error message below).
How to reproduce the behaviour
Note: as you can see, the Unpaywall dataset that I am using is more recent than the one used in the biblio-glutton demo.
Environment
biblio-glutton (latest commit)