chenyehuang closed 3 months ago
Hello! Which subset of the data are you referring to? We are aware of a problem with the pdf subset where images on the same page are sometimes ordered differently from each other (and yes, there is a bug with the hash in this scenario; we will see if we can fix it). You should still be able to map from tiff to json, though, as the images appear in the order used in the texts list.
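The positional mapping described above could be sketched roughly as follows. This is only an illustration under stated assumptions: it assumes the json document carries a `texts` list in which image positions are marked with `None`, and that the tiff frames have already been counted. The helper name `pair_frames_with_slots` is hypothetical and not part of the dataset tooling.

```python
from typing import List, Optional, Tuple


def pair_frames_with_slots(
    texts: List[Optional[str]], n_frames: int
) -> List[Tuple[int, int]]:
    """Pair each tiff frame index with an image slot index in `texts`.

    Assumption (not confirmed by the dataset docs): a `None` entry in
    `texts` marks a position occupied by an image.
    """
    # Collect the positions in `texts` that are assumed to hold images.
    slots = [i for i, t in enumerate(texts) if t is None]
    # A count mismatch means positional pairing cannot be trusted at all.
    if len(slots) != n_frames:
        raise ValueError(
            f"{n_frames} tiff frames but {len(slots)} image slots in texts"
        )
    # Pair frame k with the k-th image slot, purely by position.
    return list(enumerate(slots))


# Example: two frames, image slots at positions 1 and 3.
pairs = pair_frames_with_slots(["intro", None, "caption", None], n_frames=2)
```

Note that the length check only catches a differing number of images; it cannot detect images on the same page that were stored in a different order.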
Thanks for your reply. The problem I ran into was indeed with the pdf dataset. In addition, is the sha256 hash computed from the image data alone, or is other information included? Why is the result different from the value in the json file when I compute the sha256 of the image myself?
It is computed from the images, but before conversion to tiff format, so unfortunately the hashes no longer match :(. It was a mistake on our end. Thanks for bringing this up, though! I will update the readme and maybe ping here if I get the chance to fix it.
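To illustrate why a hash taken before format conversion stops matching afterwards, here is a minimal sketch using only the Python standard library. It is not the dataset's actual pipeline: `zlib` at two compression levels stands in for two different image encodings of the same pixels, and all names are hypothetical.

```python
import hashlib
import zlib


def sha256_hex(data: bytes) -> str:
    """Return the hex-encoded SHA-256 digest of raw bytes."""
    return hashlib.sha256(data).hexdigest()


# Stand-in for decoded pixel data (assumption: any byte buffer will do).
pixels = bytes(range(256)) * 64

# Two different encodings of the same pixels. These are stand-ins for
# "original image file" and "file after conversion to tiff".
original_file = zlib.compress(pixels, 9)
converted_file = zlib.compress(pixels, 0)

# The decoded content is identical...
same_pixels = zlib.decompress(original_file) == zlib.decompress(converted_file)
# ...but the encoded bytes differ, so a hash over the file bytes differs too.
same_file_hash = sha256_hex(original_file) == sha256_hex(converted_file)

print("same pixels:", same_pixels)
print("same file hash:", same_file_hash)
```

A hash over the encoded file depends on the container and compression settings, not just on the picture, which is why hashing the tiff cannot reproduce a value computed on the pre-conversion bytes.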
Thank you for answering my questions.
I have run experiments and found that the order of the image data in the json is not one-to-one aligned with the tiff images, and I don't know what data the sha256 of the image in the json was computed from. Could you please clarify?