plazi / Biodiversity-Literature-Repository

covers the creation, maintenance and upload to the BLR

duplicates #37

Open myrmoteras opened 6 years ago

myrmoteras commented 6 years ago

@gsautter we need to remove duplicates

image

https://ocellus.punkish.org/?q=oskoron&page=1

How can we do this?

gsautter commented 6 years ago

This looks like a hard one, at least if the "rec ID"s correspond to the deposition numbers. Hard to tell how these images got uploaded twice, especially as the deposition numbers are stored in the corresponding captions right after the upload. There are three things I could imagine going wrong in this regard:

All three scenarios are fairly pathological, if not completely contrived. We might well try and catch such duplicate figures by means of a hash of the PNG data on our end. Another little improvement we need to add, I guess.
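For illustration only, the hash-based check mentioned above could look roughly like the following. This is a minimal sketch, not the actual upload code: the function names `png_digest` and `find_duplicates`, the choice of SHA-256, and the assumption that figures are available as a mapping from some figure ID to raw PNG bytes are all hypothetical.

```python
from __future__ import annotations

import hashlib
from collections import defaultdict


def png_digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of the raw PNG bytes (hypothetical helper)."""
    return hashlib.sha256(data).hexdigest()


def find_duplicates(figures: dict[str, bytes]) -> dict[str, list[str]]:
    """Group figure IDs by content hash and keep only groups with more than one ID.

    `figures` maps an assumed figure ID (e.g. a deposition number) to PNG bytes.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for figure_id, data in figures.items():
        groups[png_digest(data)].append(figure_id)
    # Only hashes shared by two or more figures indicate duplicates.
    return {digest: ids for digest, ids in groups.items() if len(ids) > 1}


if __name__ == "__main__":
    # Toy data standing in for real PNG files.
    sample = {"fig-1": b"\x89PNG-A", "fig-2": b"\x89PNG-A", "fig-3": b"\x89PNG-B"}
    for digest, ids in find_duplicates(sample).items():
        print(digest[:12], ids)
```

Note that a hash over the file bytes only catches byte-identical uploads; the same figure re-encoded as a different PNG file would get a different digest. The size of the returned mapping would also give a rough count of how many duplicate groups exist.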

Do we have an idea of how many such duplicates there are? That would vastly help assess the extent of the problem, as well as its underlying cause.