waldo-seg / waldo

image-segmentation and text-localization
Apache License 2.0

added data_io and make changes for dsb2018 setup accordingly #42

Closed YiwenShaoStephen closed 6 years ago

YiwenShaoStephen commented 6 years ago

Added data_io.py to save each processed image_with_mask as numpy arrays. Also added a "cache=True" option that decides whether all data is read into memory at once or loaded one item at a time. The other scripts are changed accordingly for the dsb2018 setup. @aarora8 It would be great if you could try this on madcat.
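For readers following along, here is a minimal sketch of the idea described above: one .npy file per field under dir/numpy_arrays, plus a reader with a cache switch. It is not the code from this pull request. The names ArrayStore, CachedReader, read_image and demo are hypothetical, and the file naming only follows the name.suffix.npy pattern quoted later in this thread.

```python
import os
import tempfile

import numpy as np


class ArrayStore:
    """Hypothetical helper: one .npy file per field, named <name>.<field>.npy."""

    FIELDS = ("img", "mask", "object_class")

    def __init__(self, dir):
        self.array_dir = os.path.join(dir, "numpy_arrays")
        os.makedirs(self.array_dir, exist_ok=True)

    def _path(self, name, field):
        return os.path.join(self.array_dir, "{}.{}.npy".format(name, field))

    def write_image(self, name, image_with_mask):
        # Save each field of the dict as its own array file.
        for field in self.FIELDS:
            np.save(self._path(name, field), np.asarray(image_with_mask[field]))

    def read_image(self, name):
        # Rebuild the dict from the per-field files.
        return {field: np.load(self._path(name, field)) for field in self.FIELDS}


class CachedReader:
    """cache=True loads every item once up front; cache=False loads on access."""

    def __init__(self, store, names, cache=True):
        self.store = store
        self.names = list(names)
        self.cache = cache
        self._items = [store.read_image(n) for n in self.names] if cache else None

    def __len__(self):
        return len(self.names)

    def __getitem__(self, index):
        if self.cache:
            return self._items[index]
        return self.store.read_image(self.names[index])


def demo():
    # Round trip through a temporary directory with a tiny made-up sample.
    with tempfile.TemporaryDirectory() as tmp:
        store = ArrayStore(tmp)
        sample = {
            "img": np.zeros((4, 4, 3), dtype=np.uint8),
            "mask": np.ones((4, 4), dtype=np.int32),
            "object_class": [0, 1],
        }
        store.write_image("example", sample)
        eager = CachedReader(store, ["example"], cache=True)
        lazy = CachedReader(store, ["example"], cache=False)
        return eager[0]["mask"].shape, lazy[0]["img"].shape, len(eager)
```

The trade-off is the usual one: caching costs memory but avoids repeated disk reads on every epoch, while the one-by-one mode keeps memory flat for datasets that do not fit in RAM.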

aarora8 commented 6 years ago

Ok, thanks. I will test it with madcat setup.

On Tue, May 22, 2018, 23:34 Yiwen Shao notifications@github.com wrote:

added data_io.py to save processed image_with_mask as numpy arrays. Also enable a "cache=True" option to decide whether read all data in memory once or do it one by one. The changes on other scripts are made accordingly for dsb2018 setup. @aarora8 https://github.com/aarora8 It would be great if you can try this on madcat.


aarora8 commented 6 years ago

It is working successfully with MADCAT Arabic setup.

danpovey commented 6 years ago

Great!

On Wed, May 23, 2018 at 2:24 PM, Yiwen Shao notifications@github.com wrote:

@YiwenShaoStephen commented on this pull request.

In scripts/waldo/data_io.py https://github.com/waldo-seg/waldo/pull/42#discussion_r190352424:

+import numpy as np
+
+
+class DataSaver:
+    def __init__(self, dir):
+        self.dir = dir
+        if not os.path.exists(self.dir):
+            os.makedirs(self.dir)
+            os.makedirs(self.dir + '/numpy_arrays')
+    def write_image(self, name, image_with_mask):
+        """ This function accepts a image_with_mask object and its name, and saves
+            its img, mask and object_class as a numpy array under the given directory (
+            i.e. dir/numpy_arrays/name.suffix.npy)
+        """
+        img = image_with_mask['img']

OK, I will handle all of it.


YiwenShaoStephen commented 6 years ago

The shared dataset class WaldoDataset is now added, so we don't need dataset.py anymore. image_ids.txt is now written only after all the data has been read. I also changed run.sh to fit @hhadian's newest update. @aarora8 please check whether this pipeline works well with the madcat dataset.
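A rough sketch of the "write image_ids.txt after all the data is read" ordering, for illustration only. The function names (write_image_ids, read_image_ids, process_all) and the one-id-per-line layout are assumptions, not taken from this pull request.

```python
import os


def write_image_ids(dir, image_ids):
    """Write one id per line to dir/image_ids.txt (layout is an assumption)."""
    path = os.path.join(dir, "image_ids.txt")
    with open(path, "w") as f:
        for image_id in image_ids:
            f.write(image_id + "\n")
    return path


def read_image_ids(dir):
    """Read the ids back, skipping blank lines."""
    with open(os.path.join(dir, "image_ids.txt")) as f:
        return [line.strip() for line in f if line.strip()]


def process_all(dir, raw_items, process):
    """Process every (id, item) pair first, then write the id list in one go."""
    done = []
    for image_id, item in raw_items:
        process(image_id, item)  # hypothetical callback, e.g. saving arrays to disk
        done.append(image_id)
    write_image_ids(dir, done)
    return done
```

Writing the list last means that if processing fails partway through, no image_ids.txt is left behind naming items that were never saved.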

danpovey commented 6 years ago

great progress! merging.