wisdomikezogwo / quilt1m

[NeurIPS 2023 Oral] Quilt-1M: One Million Image-Text Pairs for Histopathology.
https://quilt1m.github.io/
MIT License

Missing Images #21

Closed Rane90 closed 7 months ago

Rane90 commented 8 months ago

Hi,

Thank you for creating this wonderful repository.

I've received access to the dataset through Zenodo and downloaded all files. There seem to be missing images: the 10 packed .zip files contain only ~650K images (out of 1M).

Is this an issue or am I missing something?

Thank you again

wisdomikezogwo commented 7 months ago

Hi,

No, you are not missing anything. The paper explains this in Sections 3.2 and 3.4 (https://proceedings.neurips.cc/paper_files/paper/2023/hash/775ec578876fa6812c062644964b9870-Abstract-Datasets_and_Benchmarks.html). In brief: those are all the images, but the dataset is a many-to-many pairing, so when it is rolled out (stretched) into individual pairs, the count exceeds 1M.
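To illustrate the counting, here is a minimal sketch with made-up toy data. The file names, captions, and the `unroll_pairs` helper are hypothetical and are not taken from the actual Quilt-1M files or code. It only shows how a many-to-many pairing can yield more image-text pairs than unique images:

```python
# Hypothetical toy records: each image can carry several text captions,
# and the same caption can be attached to more than one image.
image_to_captions = {
    "img_001.jpg": ["caption A", "caption B"],
    "img_002.jpg": ["caption C"],
    "img_003.jpg": ["caption A", "caption D", "caption E"],
}


def unroll_pairs(mapping):
    """Flatten an image -> captions mapping into (image, caption) pairs."""
    return [
        (image, caption)
        for image, captions in mapping.items()
        for caption in captions
    ]


pairs = unroll_pairs(image_to_captions)
num_images = len(image_to_captions)  # unique image files
num_pairs = len(pairs)               # rolled-out image-text pairs

print(f"{num_images} unique images, {num_pairs} image-text pairs")
```

In this toy case, counting files on disk gives the smaller number, while counting rolled-out pairs gives the larger one.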

mk-runner commented 7 months ago

Amazing work! I have applied for access to the data through Zenodo, but it has not been approved yet.

wisdomikezogwo commented 7 months ago

Approved! Sorry for the delay.