openstate / open-raadsinformatie

Search the meeting documents of municipalities and provinces
https://openbesluitvorming.nl
MIT License

Duplicate Alkmaar, triple Provincie Zuid-Holland #410

Status: open. joepio opened this issue 3 years ago.

joepio commented 3 years ago

Alkmaar has two indexes, ori_alkmaar_20210408224218 and ori_alkmaar_20190809125035, but there is only one source (Notubiz).
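A quick way to spot such duplicates is to group the index names by organisation. The sketch below is hypothetical and not part of the project's code: it assumes every index is named ori_<organisation>_<YYYYMMDDhhmmss>, as the names in this thread suggest, and that the list of index names has already been retrieved from the search backend.

```python
import re
from collections import defaultdict

# Assumed naming scheme, inferred from the index names in this thread:
# ori_<organisation>_<YYYYMMDDhhmmss>
INDEX_RE = re.compile(r"^ori_(?P<org>[a-z0-9_]+)_(?P<ts>\d{14})$")


def find_duplicate_indexes(index_names):
    """Group index names by organisation and keep only organisations with >1 index."""
    groups = defaultdict(list)
    for name in index_names:
        match = INDEX_RE.match(name)
        if match is None:
            continue  # skip names that don't follow the assumed scheme
        groups[match["org"]].append(name)
    # Names in a group share the same prefix and a fixed-width timestamp,
    # so a plain sort puts the oldest index first.
    return {org: sorted(names) for org, names in groups.items() if len(names) > 1}
```

For example, passing ori_alkmaar_20210408224218, ori_alkmaar_20190809125035 and ori_losser_20190809125040 returns only the Alkmaar group, with the 2019 index listed first.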

joepio commented 2 years ago

Interestingly, both indexes contain identical documents with identical identifiers; for example, resource 4039868 appears in both:

https://openbesluitvorming.nl/?zoekterm=%22*%22&organisaties=%5B%22ori_alkmaar_20190809125035%22%5D&showResource=https%253A%252F%252Fid.openraadsinformatie.nl%252F4039868

https://openbesluitvorming.nl/?zoekterm=%22*%22&organisaties=%5B%22ori_alkmaar_20210408224218%22%5D&showResource=https%253A%252F%252Fid.openraadsinformatie.nl%252F4039868

The 2019 index contains far more resources than the 2021 one.
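The comparison links in this thread all share one format, so they can be generated per index. This is a sketch reconstructed from the links themselves, not the frontend's own code: zoekterm holds the quoted wildcard "*", organisaties holds a URL-encoded JSON array of index names, and showResource holds a resource URI that is URL-encoded twice (hence the %253A and %252F sequences). The function name search_url is hypothetical.

```python
import json
from urllib.parse import quote

BASE_URL = "https://openbesluitvorming.nl/"


def search_url(index_name, resource_id=None):
    """Build a search link restricted to a single index (format inferred from the links)."""
    # zoekterm: the quoted wildcard "*", with '*' itself left unescaped
    params = ["zoekterm=" + quote('"*"', safe="*")]
    # organisaties: a JSON array of index names, URL-encoded once
    params.append("organisaties=" + quote(json.dumps([index_name]), safe=""))
    if resource_id is not None:
        # showResource: the resource URI, URL-encoded twice
        params.append("showResource=" + quote(quote(resource_id, safe=""), safe=""))
    return BASE_URL + "?" + "&".join(params)
```

Calling search_url("ori_alkmaar_20190809125035", "https://id.openraadsinformatie.nl/4039868") reproduces the first Alkmaar link exactly, which makes it easy to generate the old/new pair for every duplicated organisation.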

joepio commented 2 years ago

Something similar is going on with Losser:

https://openbesluitvorming.nl/?zoekterm=%22*%22&organisaties=%5B%22ori_losser_20190809125040%22%5D

https://openbesluitvorming.nl/?zoekterm=%22*%22&organisaties=%5B%22ori_losser_20201101222316%22%5D

And Amsterdam:

https://openbesluitvorming.nl/?zoekterm=%22*%22&organisaties=%5B%22ori_amsterdam_20190809125033%22%5D

https://openbesluitvorming.nl/?zoekterm=%22*%22&organisaties=%5B%22ori_amsterdam_20210505222530%22%5D

And Arnhem:

https://openbesluitvorming.nl/?zoekterm=%22*%22&organisaties=%5B%22ori_arnhem_20190809125139%22%5D

https://openbesluitvorming.nl/?zoekterm=%22*%22&organisaties=%5B%22ori_arnhem_20201013222110%22%5D

And Goes:

https://zoek.openraadsinformatie.nl/?zoekterm=%22*%22&organisaties=%5B%22ori_goes_20201005093650%22%5D

https://zoek.openraadsinformatie.nl/?zoekterm=%22*%22&organisaties=%5B%22ori_goes_20210408224218%22%5D

And Barneveld:

https://zoek.openraadsinformatie.nl/?zoekterm=%22*%22&organisaties=%5B%22ori_barneveld_20190809125117%22%5D&type=%5B%22MediaObject%22%5D

https://zoek.openraadsinformatie.nl/?zoekterm=%22*%22&organisaties=%5B%22ori_barneveld_20210505222530%22%5D

In each case the newer index seems to contain fewer resources, and the older index seems to contain every resource that the newer one contains. In other words, I think deleting the newer index would suffice.
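Before deleting anything, the subset claim can be checked per organisation. The sketch below is hypothetical: it assumes the document identifiers (such as the https://id.openraadsinformatie.nl/... URIs) of both indexes have already been exported into Python sets, and it only decides which index is redundant; it does not delete anything.

```python
def redundant_index(old_ids, new_ids, old_name, new_name):
    """Return the name of the index whose documents all exist in the other one.

    old_ids / new_ids: sets of document identifiers, assumed to be fetched beforehand.
    Returns None when each index has documents the other lacks.
    """
    if new_ids <= old_ids:
        # Everything in the newer index is also in the older one
        # (this also covers identical sets, where the newer index is preferred for removal).
        return new_name
    if old_ids <= new_ids:
        return old_name
    return None  # neither is a superset; the indexes would need merging instead
```

Returning None rather than guessing keeps the deletion decision explicit for any organisation where the "older contains everything" pattern does not hold.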

joepio commented 2 years ago

I see a pattern in the creation dates of the newer indexes:

2021-05-05 (Amsterdam, Barneveld)

2021-04-08 (Alkmaar, Goes)

I assume these are the dates on which I ran an import script. Perhaps something went wrong during those runs and a new index was created.
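The date pattern can be checked mechanically by grouping index names on the date in their trailing timestamp. This is a hypothetical sketch assuming the timestamp format is YYYYMMDDhhmmss, as in the names above; dates that collect several organisations point to a single import run.

```python
from collections import defaultdict
from datetime import datetime


def indexes_by_date(index_names):
    """Group index names by the creation date encoded in their trailing timestamp."""
    by_date = defaultdict(list)
    for name in index_names:
        stamp = name.rsplit("_", 1)[-1]  # assumed format: YYYYMMDDhhmmss
        try:
            created = datetime.strptime(stamp, "%Y%m%d%H%M%S")
        except ValueError:
            continue  # not a timestamped index name
        by_date[created.date().isoformat()].append(name)
    # Return the dates in chronological order
    return dict(sorted(by_date.items()))
```

Applied to the newer indexes in this thread, Amsterdam and Barneveld end up under 2021-05-05 and Alkmaar and Goes under 2021-04-08.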