wisdomikezogwo / quilt1m

[NeurIPS 2023 Oral] Quilt-1M: One Million Image-Text Pairs for Histopathology.
https://quilt1m.github.io/
MIT License

Issues with CSV files #18

Open BaluHarshavardan99 opened 4 months ago

BaluHarshavardan99 commented 4 months ago

Hi, I am trying to recreate the QUILT dataset. I have a doubt regarding some of the columns in the CSV files that you have shared in the repo. Can you please highlight how you obtained the "stable_times" column in quilt_recon.csv?

Also, were the images in the "image_path" column of quilt_data.csv extracted using the Static Video Chunk Detection Algorithm? Can you please elaborate on the generation of the quilt_data.csv file?

Thank you

wisdomikezogwo commented 4 months ago

Hi,

Thanks for bringing this up, it's somewhat of a known issue as highlighted in the readme here.

_Can you please highlight how you obtained the "stable_times" column in quilt_recon.csv?_ To preface: when we were creating QUILT, we weren't planning to provide code to re-create the dataset, so it wasn't obvious at the time that we should save the exact time or time interval of each representative frame. As a result, the reconstruction code isn't perfect.

That said, quilt_recon.csv contains the key_frames (the scene frames in the paper) and the stable regions within each chunk (given by get_histo_srt_im_recon), if any. To make it clearer: we extract images in a cascaded manner, illustrated in Figure 7 of the paper's supplementary material. For each valid chunk in a video, we look for stable regions (i.e., runs of continuous frames with little to no visual change), and for each one we find, we take the pixel-wise median frame. If a chunk has no stable frames, we take the frames within the chunk and deduplicate them. This means the latter case gives you precise timing, whereas in the former the image is a median of several frames, so you can't peg it to a specific time, only to an interval. That's why we released code to extract said frame (i.e., save_frame_chunks_recon).

All this to say: stable times are the regions where we think there are small runs of stable frames from which we can collect representative images, because the narrator stops for a bit to explain the image's features. For swaths of frames that have no stable regions, because the narrator was moving around the WSI for an extended time, we collect all the images we can and deduplicate them to represent those chunks.
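For readers trying to follow the cascade described above, here is a minimal sketch of the idea in plain Python. It is not the repository's code: the function names, the mean-absolute-difference test, and the `diff_thresh` and `min_len` values are all assumptions made for illustration, and frames are modeled as flat lists of grayscale values rather than decoded video frames. It only shows why a stable region maps to an interval while a deduplicated frame maps to a single index.

```python
from statistics import median


def mean_abs_diff(a, b):
    """Mean absolute per-pixel difference between two equally sized frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def find_stable_runs(frames, diff_thresh=2.0, min_len=3):
    """Return (start, end) index pairs, end exclusive, of runs of consecutive
    frames that barely change. diff_thresh and min_len are assumed values."""
    runs, start = [], 0
    for i in range(1, len(frames) + 1):
        at_end = i == len(frames)
        if at_end or mean_abs_diff(frames[i], frames[i - 1]) > diff_thresh:
            if i - start >= min_len:
                runs.append((start, i))
            start = i
    return runs


def median_frame(frames):
    """Pixel-wise median across a list of frames."""
    return [median(pixels) for pixels in zip(*frames)]


def representative_frames(frames, diff_thresh=2.0, min_len=3):
    """Return (image, (first_index, last_index)) pairs for one chunk.

    Stable runs yield one median image tied to an interval; if the chunk has
    no stable run, fall back to deduplicated frames tied to a single index.
    """
    runs = find_stable_runs(frames, diff_thresh, min_len)
    if runs:
        return [(median_frame(frames[s:e]), (s, e - 1)) for s, e in runs]
    kept = []
    for i, frame in enumerate(frames):
        # Keep a frame only if it differs from every frame kept so far.
        if all(mean_abs_diff(frame, frames[k]) > diff_thresh for k in kept):
            kept.append(i)
    return [(frames[i], (i, i)) for i in kept]
```

With four identical frames followed by one different frame, the sketch returns a single median image covering indices 0 to 3. With frames that keep changing, it returns the distinct frames, each with its own index. Converting indices to timestamps would depend on the video's frame rate and the sampling used, neither of which is stated in this thread.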

_Were the images in the "image_path" column of quilt_data.csv extracted using the Static Video Chunk Detection Algorithm?_ Yes. All the representative images collected by the process described above (so not just Static Video Chunk Detection, as not all chunks have stable frames) are saved to disk, and their paths are written to file. So quilt_data.csv is just an early dump of the data (text and metadata) from before we released the full and current data here.

Let me know if you need any more clarification or have more questions, thanks.

BaluHarshavardan99 commented 4 months ago

Hi, Thank you very much for the clarification. It was really helpful.

I need clarification on one more variable - How is the variable "pair_chunk_time" decided?

Thank you for your time