Closed: havanagrawal closed this issue 5 years ago
@havanagrawal I added the timestamps to the image names as well so that image names stay unique across datasets. That avoids name collisions if we ever want to create a single dataset by copying images from the different ones. Does that make sense?
Interesting. In that case, we can remove it at the lowest level, while retaining the ability to merge datasets. Going from
data/synth_data_2019_01_09_22_45_08/image_0_2019_01_09_22_45_08/train_image/image_0_2019_01_09_22_45_08.jpg
to
data/synth_data_2019_01_09_22_45_08/image_0_2019_01_09_22_45_08/train_image/image_0.jpg
should be possible, right?
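To illustrate why merging would still work: the per-image directory keeps its timestamp, so a merge step could rebuild a unique file name from it. This is only a rough sketch assuming the layout shown above; `merged_name` is a hypothetical helper, not something in the repo, and the second timestamp is made up for the example.

```python
from pathlib import PurePosixPath


def merged_name(image_path: str) -> str:
    """Build a collision-free file name from the timestamped image directory.

    Assumes the layout <dataset>/<image_k_timestamp>/train_image/<file>.
    """
    path = PurePosixPath(image_path)
    # parts[-3] is the per-image directory, which still carries the timestamp.
    image_dir = path.parts[-3]
    return image_dir + path.suffix


# Two datasets generated at different times (second timestamp is hypothetical).
a = merged_name(
    "data/synth_data_2019_01_09_22_45_08/image_0_2019_01_09_22_45_08"
    "/train_image/image_0.jpg"
)
b = merged_name(
    "data/synth_data_2019_01_10_10_00_00/image_0_2019_01_10_10_00_00"
    "/train_image/image_0.jpg"
)
print(a)
print(b)
```

Both source files are called image_0.jpg, but the names derived for the merged dataset differ because they come from the timestamped parent directory.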
I would just want the class IDs in the train_mask folder to be delimited by a triple underscore ('___').
As discussed, the timestamps are necessary especially when we generate data in parallel and then merge them at the end. Closing.
Problem
The synthesized data directory looks something like:
data/synth_data_2019_01_09_22_45_08/image_0_2019_01_09_22_45_08/train_image/image_0_2019_01_09_22_45_08.jpg
This feels incredibly noisy to me. Can we instead favor something more concise, such as:
data/synth_data_2019_01_09_22_45_08/image_0/train_image/image_0.jpg
In other words, I don't see the point of embedding the timestamp at three levels of the path.
@vivanvish Was there any particular reason (that perhaps I am completely missing) for requiring the timestamp at each level?
Solution
Change the image path format to:
data/synth_data_{timestamp}/image_{k}/train_image/image_{k}.jpg
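A minimal sketch of how such a path could be built, assuming the timestamp format seen in the example paths (year_month_day_hour_minute_second). The `image_path` function and the `TIMESTAMP_FORMAT` constant are hypothetical names for illustration, not the project's actual code.

```python
from datetime import datetime

# Assumed from the example paths: 2019_01_09_22_45_08
TIMESTAMP_FORMAT = "%Y_%m_%d_%H_%M_%S"


def image_path(timestamp: datetime, k: int, root: str = "data") -> str:
    """Return the proposed path for the k-th image of a dataset."""
    stamp = timestamp.strftime(TIMESTAMP_FORMAT)
    # The timestamp appears only once, at the dataset level.
    return f"{root}/synth_data_{stamp}/image_{k}/train_image/image_{k}.jpg"


print(image_path(datetime(2019, 1, 9, 22, 45, 8), 0))
```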
@pshivraj I'm assuming this makes no difference to your training pipeline, since AFAIR you were using os.walk?
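For reference, a rough sketch of why an os.walk-based loader would not care about the rename: it discovers files by directory and extension, not by their exact names. This is an assumption about how the pipeline might collect images, not its actual code; `find_images` is a hypothetical helper.

```python
import os
from typing import List


def find_images(root: str) -> List[str]:
    """Collect every .jpg that sits inside a train_image directory under root."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        # Only look at train_image folders; skip masks and anything else.
        if os.path.basename(dirpath) != "train_image":
            continue
        for name in filenames:
            if name.lower().endswith(".jpg"):
                found.append(os.path.join(dirpath, name))
    return sorted(found)
```

With a loader like this, image_0.jpg and image_0_2019_01_09_22_45_08.jpg are picked up the same way.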