martaannaj / RecommenderServer

GNU General Public License v3.0

Error while parsing dump #85

Closed: goramartin closed 7 months ago

goramartin commented 8 months ago

Hello, yesterday I downloaded a new Wikidata JSON dump, but during the first phase of build-tree I received an error. I am still running the version from before the dependency updates.

2024/02/24 23:57:50 Processed 78530000 entities
2024/02/24 23:57:51 Processed 78540000 entities
2024/02/24 23:57:51 Something went wrong while processing.. cannot decode json: calendar model: invalid value
panic: Something went wrong while processing.. cannot decode json: calendar model: invalid value

goroutine 13 [running]:
log.Panicln({0xc000b46fb0?, 0xc0000380b0?, 0xc0000ce4b0?})
        /usr/local/go/src/log/log.go:398 +0x65
RecommenderServer/transactions.WikidataDumpTransactionSource.func1.1()
        /home/gora/RecommenderServer/transactions/createTransactions.go:57 +0xd5
created by RecommenderServer/transactions.WikidataDumpTransactionSource.func1
        /home/gora/RecommenderServer/transactions/createTransactions.go:24 +0x8d

goramartin commented 8 months ago

I also tried it with the updated dependencies, just in case, and I am getting the same error.

miselico commented 8 months ago

From a parallel conversation by email:

The entity number shown above is not necessarily indicative of where the failure occurred, because reading and processing happen in parallel.
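To illustrate why a progress counter cannot pinpoint the failing entry, here is a minimal Go sketch. It assumes a hypothetical `entity` type and a hypothetical `failedIDs` helper (neither is taken from RecommenderServer) and shows workers reporting the ID of the entity they failed on, which stays meaningful no matter how the goroutines interleave.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// entity is a hypothetical stand-in for one decoded dump entry.
type entity struct {
	ID    string
	Valid bool // false simulates a decoding failure
}

// failedIDs fans the entities out to the given number of workers
// (which must be at least 1) and returns the sorted IDs of all
// entities that failed. The ID travels with the failure, so the
// result does not depend on scheduling order.
func failedIDs(entities []entity, workers int) []string {
	jobs := make(chan entity)
	var (
		mu     sync.Mutex
		failed []string
		wg     sync.WaitGroup
	)
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for e := range jobs {
				if !e.Valid {
					mu.Lock()
					failed = append(failed, e.ID)
					mu.Unlock()
				}
			}
		}()
	}
	for _, e := range entities {
		jobs <- e
	}
	close(jobs)
	wg.Wait()
	sort.Strings(failed)
	return failed
}

func main() {
	entities := []entity{{"Q1", true}, {"Q2", false}, {"Q3", true}}
	fmt.Println(failedIDs(entities, 4))
}
```

A "Processed N entities" line printed by the reader only says how far reading has advanced; the worker that hits the bad value may be handling an entity read noticeably earlier.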

miselico commented 8 months ago

Now also reported here: https://gitlab.com/tozd/go/mediawiki/-/issues/1. It appears to be an issue in the parser, which assumes one of two calendar types, while apparently another one is also used.
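As a rough sketch of the kind of failure described here, the following Go snippet decodes a calendar model from its entity URL and accepts one extra alias. All names (`calendarModel`, `knownModels`, `parseModel`) are hypothetical, and this is not the code of gitlab.com/tozd/go/mediawiki; the only assumption carried over from this thread is that Q12138 should be treated like Q1985727.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// calendarModel is a hypothetical enum of supported calendar models.
type calendarModel int

const (
	gregorian calendarModel = iota
	julian
)

const entityPrefix = "http://www.wikidata.org/entity/"

// knownModels maps Wikidata item IDs to calendar models.
// Q12138 is included as an alias; a strict parser would omit it
// and reject the value with an "invalid value" error.
var knownModels = map[string]calendarModel{
	"Q1985727": gregorian, // proleptic Gregorian calendar
	"Q1985786": julian,    // proleptic Julian calendar
	"Q12138":   gregorian, // Gregorian calendar (non-standard as a model)
}

// UnmarshalJSON decodes a JSON string holding an entity URL.
func (c *calendarModel) UnmarshalJSON(b []byte) error {
	var s string
	if err := json.Unmarshal(b, &s); err != nil {
		return fmt.Errorf("calendar model: %w", err)
	}
	m, ok := knownModels[strings.TrimPrefix(s, entityPrefix)]
	if !ok {
		return fmt.Errorf("calendar model: invalid value %q", s)
	}
	*c = m
	return nil
}

// parseModel decodes one raw JSON value into a calendarModel.
func parseModel(raw string) (calendarModel, error) {
	var m calendarModel
	err := json.Unmarshal([]byte(raw), &m)
	return m, err
}

func main() {
	m, err := parseModel(`"http://www.wikidata.org/entity/Q12138"`)
	fmt.Println(m == gregorian, err)
}
```

Including the offending value in the error message, as done in the `invalid value %q` branch, makes it possible to see which calendar type was actually encountered.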

mitar commented 8 months ago

BTW, I suggest you print out errors with "% -+#.1v" as the format string, because that will also dump out the problematic JSON and/or the value itself. Then we will be able to see which calendar type value is used.

mitar commented 8 months ago

So I found it. The value is https://www.wikidata.org/wiki/Q12138, which should not be possible, because it should be https://www.wikidata.org/wiki/Q1985727. I am not sure how that value for the model happened.

mitar commented 8 months ago

Example: https://www.wikidata.org/wiki/Q105958428

mitar commented 8 months ago

Version v0.14.1 of gitlab.com/tozd/go/mediawiki has been released, which should allow parsing those non-standard calendar models as well.

goramartin commented 7 months ago

After updating the dependency locally, it works again. Thank you very much.

miselico commented 7 months ago

Thank you both for fixing and checking this. @mitar could you elaborate on what the format string "% -+#.1v" means? I cannot find this syntax in the fmt documentation. I found that %#v gives a Go-syntax representation of the value, but not what the other parts would mean. Is it an option to just log errE.Details()?

mitar commented 5 months ago

"% -+#.1v" is syntax for formatting errors from the gitlab.com/tozd/go/errors package, and it more or less says "print out everything you have" (stack trace, recursive wrapped/joined errors, details, etc.). You can see documentation here. This package is used by gitlab.com/tozd/go/mediawiki. See also this issue.

%#v on its own would format only the additional details.
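For readers wondering how such a format string is even valid: the pieces are ordinary fmt flags (space, `-`, `+`, `#`) plus a precision (`.1`) in front of the `v` verb, and any type implementing the standard `fmt.Formatter` interface is free to give them its own meaning. The sketch below uses only the standard library; `flagProbe` and `describe` are hypothetical names, and it only reports which options the fmt package passes along, not how gitlab.com/tozd/go/errors interprets them.

```go
package main

import (
	"fmt"
	"strings"
)

// flagProbe is a hypothetical type that reports the formatting
// options it receives instead of printing a value.
type flagProbe struct{}

// Format implements fmt.Formatter. The fmt package calls it with
// the parsed flags, precision, and verb of the directive.
func (flagProbe) Format(s fmt.State, verb rune) {
	var parts []string
	for _, c := range " -+#0" {
		if s.Flag(int(c)) {
			parts = append(parts, fmt.Sprintf("flag %q", c))
		}
	}
	if p, ok := s.Precision(); ok {
		parts = append(parts, fmt.Sprintf("precision %d", p))
	}
	parts = append(parts, fmt.Sprintf("verb %q", verb))
	fmt.Fprint(s, strings.Join(parts, ", "))
}

// describe formats a flagProbe with the given format string.
func describe(format string) string {
	return fmt.Sprintf(format, flagProbe{})
}

func main() {
	fmt.Println(describe("% -+#.1v"))
	// prints: flag ' ', flag '-', flag '+', flag '#', precision 1, verb 'v'
	fmt.Println(describe("%#v"))
	// prints: flag '#', verb 'v'
}
```

An error type with a custom `Format` method can use these options to decide how much to print, which would explain why the combined string does not appear as such in the fmt documentation.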