WesLee88524 opened this issue 1 week ago
For SA-1B, it is crucial to filter out the watermarked images. We didn't have a good detector, so we adopted a naive approach: we filtered out images whose prompts contain human-related words. There may be better ways to filter these image datasets.
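For anyone trying to approximate the prompt-based filter described above, here is a minimal sketch. It is not the authors' code: the keyword list `HUMAN_WORDS`, the helper names, and the `(image_id, prompt)` record format are all hypothetical. The thread does not give the actual list of human-related words, so results will not match the 6.9M subset exactly.

```python
import re

# Hypothetical keyword list for illustration only; the real list of
# "human-related words" used for SA-1B is not given in this thread.
HUMAN_WORDS = {"person", "people", "man", "woman", "men", "women",
               "boy", "girl", "child", "children", "face", "human"}

# Split a lowercased prompt into alphabetic tokens (whole words only).
_TOKEN = re.compile(r"[a-z]+")


def mentions_human(prompt: str, words=HUMAN_WORDS) -> bool:
    """Return True if any whole word in the prompt is in the keyword list."""
    return any(tok in words for tok in _TOKEN.findall(prompt.lower()))


def filter_prompts(records):
    """Keep (image_id, prompt) pairs whose prompt has no human-related word."""
    return [(img, p) for img, p in records if not mentions_human(p)]


if __name__ == "__main__":
    # Hypothetical records; real SA-1B captions would come from your own captioner.
    records = [
        ("sa_1.jpg", "A woman walking her dog in the park"),
        ("sa_2.jpg", "A red car parked next to a brick building"),
        ("sa_3.jpg", "Children playing soccer on a field"),
    ]
    kept = filter_prompts(records)
    print([img for img, _ in kept])  # ['sa_2.jpg']
```

Matching whole tokens rather than substrings keeps words like "manhole" from being caught by "man". The trade-off is that plurals or synonyms missing from the list slip through, which is one reason a keyword filter is only a rough stand-in for a proper detector.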
Thank you for your reply. Could you please release the exact image list if possible? Manually filtering the data would be both time-consuming and inefficient.
Hi, this is great work. However, while reproducing it, we noticed that you used only 6.9M of the 11M images in the full SA-1B dataset. Could you release the exact image list to facilitate reproduction? Similarly, only part of the conceptual-12m dataset was used. Thanks!