waldo-seg / waldo

image-segmentation and text-localization
Apache License 2.0

[scripts] add some types (unfinished) #10

Closed danpovey closed 6 years ago

danpovey commented 6 years ago

This is my proposal for the types to use for data. @YiwenShaoStephen, can you please look over this ASAP? This code is mostly just comments without implementation, but let me know whether you think it's workable and whether there is a better way to do this.

YiwenShaoStephen commented 6 years ago

Are these functions for data preparation purposes?

aarora8 commented 6 years ago

I guess this task is similar to the mask generation task done for the MADCAT Arabic images. There, to fill all points inside a polygon (a rectangle in that case), the code looped over all points inside each bounding box. Since the MADCAT images are high-resolution (about 400k points in a bounding box), this made the code slow, as Python is not very efficient with loops. For 42k MADCAT images, after speedup and parallelization, it can take around 18 hrs. It might not be necessary in this case, as in the MADCAT case we had the overlapping-bounding-box constraint. But if possible, should we do this task in C++?
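For axis-aligned rectangles like the MADCAT boxes, the per-pixel Python loop can usually be replaced by a NumPy slice assignment, which fills a whole box in one vectorized operation. This is a minimal sketch, not the MADCAT code: it assumes boxes given as hypothetical `(object_id, x0, y0, x1, y1)` tuples with half-open bounds, and a mask indexed as `mask[y, x]`.

```python
import numpy as np


def fill_boxes(height, width, boxes):
    """Write each box's object id into a mask using NumPy slice assignment.

    boxes: hypothetical list of (object_id, x0, y0, x1, y1) tuples with
    half-open bounds [x0, x1) and [y0, y1). Later boxes overwrite earlier
    ones where they overlap.
    """
    mask = np.zeros((height, width), dtype=np.int32)
    for object_id, x0, y0, x1, y1 in boxes:
        # One vectorized assignment per box instead of a Python loop per pixel.
        mask[y0:y1, x0:x1] = object_id
    return mask
```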

YiwenShaoStephen commented 6 years ago

Ok, I get it. BTW, as an alternative, see https://www.kaggle.com/c/data-science-bowl-2018#evaluation. They use a different method for encoding the mask instead of the polygon idea used here. I'm not sure which one is more feasible and efficient.
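For comparison, here is a minimal sketch of the run-length encoding described on the Kaggle page linked above, as we understand it: one string per object, made of space-separated (start, length) pairs, with pixels numbered from 1, top to bottom and then left to right. The function name `rle_encode` is hypothetical; check the competition's evaluation page before relying on the exact convention.

```python
import numpy as np


def rle_encode(binary_mask):
    """Run-length encode one object's 2-D mask as 'start length start length ...'.

    Assumed convention: pixels are numbered from 1, top-to-bottom then
    left-to-right (i.e. column-major order).
    """
    flat = (np.asarray(binary_mask) > 0).flatten(order="F").astype(np.int8)
    # Pad with zeros so every run of ones has both a start and an end.
    pixels = np.concatenate([[0], flat, [0]])
    # Positions where the value changes; +1 turns them into 1-based pixel numbers.
    changes = np.nonzero(pixels[1:] != pixels[:-1])[0] + 1
    starts, ends = changes[0::2], changes[1::2]
    runs = np.stack([starts, ends - starts], axis=1).ravel()
    return " ".join(str(v) for v in runs)
```

An instance mask with several objects would be encoded as one such string per object id.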

danpovey commented 6 years ago

The data types are initially for data preparation but would likely be reusable for the nnet output.

danpovey commented 6 years ago

I assume this conversation right now is about enumerating all pixels inside a polygon.

Regarding efficiency: for the mask generation, at some point it would be necessary to enumerate all pixels, because we need to create the mask array. And I want this to cover non-convex polygons such as bent text, so the Kaggle approach isn't quite general enough. Let's just get something working for now and worry more about efficiency later; we can use the simple approach for regression testing.

I suggest making the code find all pixels for now, and we can try more efficient versions later. It might be possible to use some kind of trick to do this fast. For example, draw each line in the polygon in such a way that for each height (i.e. each y value) each line has exactly one x value present, store those x values as lists indexed by y value, and then use an even/odd approach to fill in the locations between alternating x values. Be careful about corners where 2 lines are present.
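A minimal sketch of the scanline idea above, under assumptions not stated in the thread: pixel (x, y) is sampled at integer coordinates, the mask is a NumPy array indexed as `mask[y, x]`, and each edge covers the half-open row range [y_min, y_max). That half-open rule is one common way to handle corners where two edges meet: a vertex between a rising and a falling edge is counted once, and a peak or valley is counted zero or two times, so every row gets an even number of crossings. The function name `polygon_to_mask` and its parameters are hypothetical.

```python
import math

import numpy as np


def polygon_to_mask(polygon, height, width, object_id=1, mask=None):
    """Fill a (possibly non-convex) polygon into a mask with an even/odd scanline rule.

    polygon: list of (x, y) vertices; the last vertex connects back to the first.
    A pixel at integer (x, y) is filled when x lies in a half-open span
    [x_left, x_right) between alternating crossings on row y.
    """
    if mask is None:
        mask = np.zeros((height, width), dtype=np.int32)
    # For each row y, collect the x values where polygon edges cross that row.
    crossings = {}
    n = len(polygon)
    for i in range(n):
        x0, y0 = polygon[i]
        x1, y1 = polygon[(i + 1) % n]
        if y0 == y1:
            continue  # horizontal edges contribute no crossings
        # Half-open row range [y_lo, y_hi) keeps shared vertices from being double-counted.
        y_lo, y_hi = min(y0, y1), max(y0, y1)
        for y in range(max(math.ceil(y_lo), 0), min(math.ceil(y_hi), height)):
            # Exactly one x value for this edge on this row.
            x = x0 + (y - y0) * (x1 - x0) / (y1 - y0)
            crossings.setdefault(y, []).append(x)
    # Even/odd rule: fill between the 1st and 2nd crossing, the 3rd and 4th, ...
    for y, xs in crossings.items():
        xs.sort()
        for x_left, x_right in zip(xs[0::2], xs[1::2]):
            start = max(math.ceil(x_left), 0)
            end = min(math.ceil(x_right), width)
            if start < end:
                mask[y, start:end] = object_id
    return mask
```

Because the fill between crossings is a slice assignment, the Python-level work is per row and per edge rather than per pixel.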

danpovey commented 6 years ago

Actually, let's not rule out dumping the masks to disk somehow. I don't know enough about how I/O is done in PyTorch to know exactly how we should do it, but I'm sure Yiwen will have something to say about that. (And let's try to make it possible to use chars or shorts for this, if Python lets us, as they would take less disk space.)

As long as we have the functionality to turn the polygon-based representation into a mask-based representation, we can always do that in data preparation instead of during training.

danpovey commented 6 years ago

To clarify: when I talk about 'dumping the masks to disk' I am talking about dumping a numpy array of int that contains the object id, like the 'mask' member in Yiwen's nucleus-detection example. I am assuming that must be standard in object detection. We could use 'char' for compression if the libraries allow it.
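If the mask is a NumPy array of object ids, it can be stored compactly: `np.uint8` (the analogue of unsigned char) holds ids up to 255 and `np.uint16` (unsigned short) up to 65535, and `np.save` records the dtype in the .npy header so it is preserved on reload. A sketch, with `compact_mask` as a hypothetical helper name:

```python
import numpy as np


def compact_mask(mask):
    """Cast an object-id mask to the smallest unsigned integer type that holds all ids."""
    if mask.size and mask.min() < 0:
        raise ValueError("object ids must be non-negative")
    max_id = int(mask.max()) if mask.size else 0
    for dtype in (np.uint8, np.uint16, np.uint32):
        if max_id <= np.iinfo(dtype).max:
            return mask.astype(dtype)
    raise ValueError("object id %d does not fit in uint32" % max_id)
```

For example, `np.save("mask.npy", compact_mask(mask))` would write one byte per pixel whenever an image has at most 255 objects.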

YiwenShaoStephen commented 6 years ago

Sure. I will look into PyTorch I/O mechanism later to see if it is possible.

aarora8 commented 6 years ago

Ok, thank you. Got it.

danpovey commented 6 years ago

Regarding PyTorch I/O: from looking at your process_data.py in egs/dsb2018/v1/local/, it seems to like to dump the images and labels together in a tar file. One possibility for MADCAT is to have a data preprocessing stage, similar to your process_data.py but run in parallel, in which we would downsample the data slightly (e.g. by a factor of 2 or 4), compute the masks at the same time, and dump them in a tar file as PyTorch likes to do. Ashish, assume for the time being that that's what we'll do.
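A sketch of such a preprocessing step using only Python's `tarfile` module and NumPy (no PyTorch API is assumed). We don't know the exact layout process_data.py writes, so member names such as `0001.image.npy` and the helpers `downsample`, `add_array` and `read_array` are hypothetical. The image is block-averaged, while the mask keeps one pixel per block, since averaging object ids would produce meaningless ids.

```python
import io
import tarfile

import numpy as np


def downsample(image, mask, factor):
    """Shrink an image by block-averaging and its mask by nearest-neighbour subsampling."""
    # Crop so both dimensions are multiples of the factor.
    h = image.shape[0] // factor * factor
    w = image.shape[1] // factor * factor
    blocks = image[:h, :w].reshape(h // factor, factor, w // factor, factor, *image.shape[2:])
    # Mean over each factor x factor block (astype truncates any fractional part).
    small_image = blocks.mean(axis=(1, 3)).astype(image.dtype)
    # Object ids must not be averaged, so keep the top-left pixel of each block.
    small_mask = mask[:h:factor, :w:factor]
    return small_image, small_mask


def add_array(tar, name, array):
    """Serialize an array in .npy format and store it in an open tar file."""
    buf = io.BytesIO()
    np.save(buf, array)
    info = tarfile.TarInfo(name=name)
    info.size = buf.tell()
    buf.seek(0)
    tar.addfile(info, buf)


def read_array(tar, name):
    """Load an array previously stored with add_array."""
    return np.load(io.BytesIO(tar.extractfile(name).read()))
```

Nearest-neighbour subsampling can drop objects thinner than the downsampling factor; computing the mask directly at the lower resolution from scaled polygon coordinates would avoid that.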

aarora8 commented 6 years ago

Ok, thanks, will add process_data.py and an option for downsampling images. For polygon mask generation, can there be overlapping polygons?

danpovey commented 6 years ago

Yes, there can be overlapping polygons, but we'll deal with that by ordering the polygons from "bottom-most" to "top-most". I was leaving that till later; for now just ignore the issue. We'll be writing the mask in an array of the form

mask[x,y] = object_id

and the top-most polygons will get written last and override previous ones.
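The bottom-to-top ordering can be sketched with a brute-force, vectorized even/odd ray-casting test over every pixel: polygons are painted in order, so later (top-most) ones overwrite earlier ones. This assumes polygons arrive as a hypothetical list of `(object_id, vertices)` pairs already sorted bottom-most first, and it indexes the mask NumPy-style as `mask[y, x]` (row, column) rather than the `mask[x,y]` written above. The helper names are hypothetical.

```python
import numpy as np


def point_in_polygon(xs, ys, polygon):
    """Even/odd ray-casting test for arrays of pixel coordinates against one polygon."""
    inside = np.zeros(xs.shape, dtype=bool)
    n = len(polygon)
    for i in range(n):
        x0, y0 = polygon[i]
        x1, y1 = polygon[(i + 1) % n]
        if y0 == y1:
            continue  # horizontal edges never cross a horizontal ray
        # Half-open row range so a shared vertex is not counted twice.
        crosses_row = (min(y0, y1) <= ys) & (ys < max(y0, y1))
        x_at_y = x0 + (ys - y0) * (x1 - x0) / (y1 - y0)
        # Toggle for every edge crossed by a ray going right from the point.
        inside ^= crosses_row & (xs < x_at_y)
    return inside


def paint_polygons(polygons, height, width):
    """polygons: list of (object_id, vertices), ordered bottom-most first."""
    mask = np.zeros((height, width), dtype=np.int32)
    ys, xs = np.mgrid[0:height, 0:width]
    for object_id, vertices in polygons:
        # Later (top-most) polygons overwrite earlier ones.
        mask[point_in_polygon(xs, ys, vertices)] = object_id
    return mask
```

Testing every pixel against every polygon is slow for large images, but its simplicity makes it a reasonable reference for regression-testing a faster scanline fill.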

aarora8 commented 6 years ago

ok, thanks, got it.

danpovey commented 6 years ago

Merging this so it doesn't block anyone. @aarora8, please finish any TODOs if you get time.

aarora8 commented 6 years ago

ok, thanks, will do.