ubarsc / pyshepseg

Python implementation of image segmentation algorithm of Shepherd et al (2019) Operational Large-Scale Segmentation of Imagery Based on Iterative Elimination. Remote Sensing 11(6).
https://www.pyshepseg.org
MIT License

Catch any incomplete pages at end of stats processing and raise error #17

Closed gillins closed 2 years ago

gillins commented 2 years ago

In some situations the user may have a RAT that has more segments than are present in the image, for example if they have taken a subset. In this case some pages will not be written back to the file, as they are never marked as 'complete'.

I think we should deal with these situations. The attached PR ensures all incomplete pages are written at the end and a warning is printed. I'd also be happy with raising an error instead. I don't think we should let this go through undetected though.
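To make the two options concrete, here is a minimal sketch of the end-of-processing check. It is not the actual pyshepseg code: the `pages` dict, its `"complete"` flag, the `write_page` callback and the `strict` switch are all hypothetical names chosen for illustration.

```python
import warnings


class IncompletePagesError(RuntimeError):
    """Raised when pages are still incomplete after all tiles are processed."""


def flush_incomplete_pages(pages, write_page, strict=False):
    """Handle any pages that were never marked complete.

    pages: dict mapping a page's first segment ID to a dict holding a
        boolean "complete" entry (hypothetical structure).
    write_page: callable taking (start_id, page) that writes the page
        back to the file (hypothetical writer).
    strict: if True, raise an error instead of writing and warning.
    Returns the sorted list of start IDs that were incomplete.
    """
    incomplete = sorted(
        start for start, page in pages.items() if not page["complete"]
    )
    if not incomplete:
        return incomplete

    msg = "{} incomplete page(s) at end of stats processing, starting at segment IDs {}".format(
        len(incomplete), incomplete
    )
    if strict:
        raise IncompletePagesError(msg)

    # Write what we have so nothing is silently lost, then tell the user.
    for start in incomplete:
        write_page(start, pages[start])
    warnings.warn(msg)
    return incomplete
```

With `strict=False` this matches the warning behaviour described above, and `strict=True` corresponds to the raise-an-error alternative. Either way the incomplete pages are reported rather than dropped silently.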

This also fixes a problem where we were not detecting an existing Alpha column when creating a colour table.

BTW I'm working on a new and improved "compress" algorithm to properly subset imagery and the RAT. Stay tuned.

cc: @petescarth

gillins commented 2 years ago

I should have added a note that @petescarth is currently tracking down a situation where this happens on a real segmentation, without any subsetting of the inputs, which is slightly worrying.

neilflood commented 2 years ago

Was there any further clarity on whether this was happening in other circumstances? As @gillins says, that would be rather worrying. I would like to understand if that was happening before we merge a PR which may obscure such behaviour.

petescarth commented 2 years ago

I can confirm that I have a segmentation output where the histogram is zero at both 0 and 40128244, and that this was where the stats processing was failing.

On checking the segmentation, this occurred on a tile edge but I'm puzzled as to the values. The screen grab below shows 40128243, 40128244, 40128245.

Note that there is no sign of 40128244 but using the value tool in QGIS, both the 40128243 and 40128245 have the value of 40128244 on mouseover (at full zoom, so not overviews).

I'm sure I'm doing something wrong here as this is very strange, so will look into it further when back from the weekend.

image

neilflood commented 2 years ago

Thanks @petescarth

I don't really understand this. If those two segments genuinely have 40128244, then why would QGIS be colouring them as the other two values? Very mysterious.......

gillins commented 2 years ago

Forget QGIS - what does TuiView say? :smile: At least we know that TuiView uses the full resolution data for the query tool... I've seen weird things with QGIS before with thematic data - I don't totally trust it

petescarth commented 2 years ago

After a couple of false starts with Tuiview (needs a separate thread) I can confirm 40128244 has a histogram of 0.

The QGIS issue may be me getting caught by this bug although the values don't seem to be high enough to trigger it: https://github.com/qgis/QGIS/issues/44902

neilflood commented 2 years ago

OK, so that sounds like those two other segments did not actually have that value, just QGIS reporting incorrectly. Good.

So, the next question is - are there any pixels with a value of 40128244 present anywhere in the segment raster? This will tell us whether the problem is related to not saving histogram counts, or to leaving a valid segment ID unused. I suspect the latter, and hence a problem with the re-labelling of the segments when stitching tiles together. However, I am not at all sure, so open to either answer.
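The distinction between the two cases can be sketched in a few lines. This is only an illustration in plain Python, assuming the pixel values and the histogram have already been read into memory (for a raster of this size that would need to be done block by block); the function name and arguments are hypothetical, not part of pyshepseg.

```python
from collections import Counter


def classify_zero_histogram_ids(pixel_values, histogram, null_value=0):
    """Classify every segment ID whose RAT histogram count is zero.

    pixel_values: iterable of segment IDs, one per pixel, in any order.
    histogram: sequence where histogram[i] is the RAT count for segment i.
    null_value: segment ID to ignore (assumed to be the null segment).

    Returns (unsaved, unused):
        unsaved: IDs that have pixels in the raster but a zero count,
            i.e. the histogram count was not saved.
        unused: IDs with no pixels at all, i.e. a valid ID left unused.
    """
    pixel_counts = Counter(pixel_values)
    unsaved = []
    unused = []
    for seg_id, count in enumerate(histogram):
        if seg_id == null_value or count != 0:
            continue
        if pixel_counts[seg_id] > 0:
            unsaved.append(seg_id)
        else:
            unused.append(seg_id)
    return unsaved, unused


# Toy example: ID 2 has a pixel but no count, ID 3 has neither.
pixels = [0, 1, 1, 2, 4, 4, 4]
histogram = [1, 2, 0, 0, 3]
unsaved, unused = classify_zero_histogram_ids(pixels, histogram)
print(unsaved, unused)
```

An ID landing in `unsaved` would point at the histogram writing, while an ID in `unused` would point at the re-labelling.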

petescarth commented 2 years ago

To confirm, there are no pixels with a value of 40128244. And all other histogram values I checked are ok.

neilflood commented 2 years ago

Excellent, thanks. That means it is not being caused by failing to write the histogram count, but rather it is because that value is not present. My guess is that this value was re-coded out during tile stitching, but somehow not re-used for a different segment. That's a guess. Could you use Tuiview to find the segment with an adjacent value? If segments with values close to this all occur on a tile boundary, that would support the theory.

In which case, I will need to ponder deeply, and read that piece of code carefully.

petescarth commented 2 years ago

See the QGIS screengrab above - the adjacent values are on a tile boundary. In case it's important, the command used to generate was:

    tiledSegResult = tiling.doTiledShepherdSegmentation(RASTER, KEAFILE, 
            tileSize=8192, overlapSize=512, 
            minSegmentSize=1000, numClusters=1024,
            bandNumbers=None, subsamplePcnt=None,
            maxSpectralDiff='auto', spectDistPcntile=5,
            imgNullVal=0, fourConnected=True, verbose=True,
            simpleTileRecode=False, outputDriver='KEA')

Somewhat conveniently, in Tuiview when zoomed out 2x (using the overviews), the two adjacent values actually have the RAT color value (red) for the missing segment.

image

gillins commented 2 years ago

@petescarth is this resolved now, or are you still seeing this problem?

gillins commented 2 years ago

I think this one is good to go @neilflood - merge if you are happy.

neilflood commented 2 years ago

After our discussions last week, I am happy.