kvos / CoastSat.slope

Beach-face slope estimation from satellite-derived shorelines, extension of the CoastSat toolbox.
http://coastsat.wrl.unsw.edu.au/
GNU General Public License v3.0

Parameters to get shoreline time-series suitable for slope estimation #14

Closed nkhelder closed 4 months ago

nkhelder commented 2 years ago

Hi there!

I'm unable to estimate beach slopes for my region of interest (the error is 'ValueError: zero-size array to reduction operation maximum which has no identity'). When looking at the cross_distance dictionary output, many (but not all) of my transect arrays are full of NaN values, and I'm not sure why this is happening. I tried a few basic fixes, including swapping the landward/seaward coordinates of my transects and confirming that the coordinate systems match, but haven't had much luck.

Any suggestions/clarification on where I am going wrong would be great! Thanks in advance.

Update: I have been exploring my issue further but haven't made much progress. The empty arrays always occur for some transects, but not for others. I've confirmed that the transect file is properly formatted, with correct coordinates, projections, etc. I would expect an empty array if a given transect didn't cross any shorelines, but that doesn't appear to be my problem either (see the last figure below for an isolated example). What are some other potential reasons I could be getting empty arrays here?
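For reference, the error message quoted above is what NumPy raises when a maximum is taken over an empty array. The sketch below only reproduces that message. It assumes (this is not confirmed anywhere in the thread) that the slope estimation drops the NaN values of a transect before taking a maximum; the array is made up for illustration.

```python
import numpy as np

# A transect whose time-series contains no valid intersections (made-up values).
chainage = np.array([np.nan, np.nan, np.nan])

# Dropping the NaNs leaves an empty array.
valid = chainage[~np.isnan(chainage)]

try:
    np.max(valid)  # a maximum over zero elements has no identity value
    message = None
except ValueError as err:
    message = str(err)

print(message)
```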

Transect and shoreline locations

GitHubIssue (2)

Example output of cross_distance dict

GItHubIssue2 (2)

Example of a single transect that appears to cross multiple shorelines but still produces empty array

GitHubIssue3 (2)

kvos commented 2 years ago

hey @nkhelder, yes the thing is that you are crossing multiple shorelines, so you need to tell the tool how you want to calculate the intersections. It's set up here: image

Basically, the compute_intersections function here is a bit more advanced than the one in the main CoastSat repo, as for the slopes we want to avoid outliers as much as possible. There is also the reject_outliers call, which further quality-controls the shorelines and could be producing those NaNs. How many outliers does it say it's removing for each transect? My suggestion is to proceed as follows:

  1. Try to only run compute_intersections and see what's in cross_distance. A few NaNs are normal (e.g., L7 diagonal bands of no data) but not too many.
  2. It's always tricky to handle multiple intersections, especially if the intersection you are interested in is not the most seaward one (as it looks to be in your case). You can re-run your shoreline detection with a better-defined reference shoreline and a smaller buffer to make sure you are not mapping those offshore sections that are not of interest. That's done in the main CoastSat toolbox: you can reduce buffer_size and make sure that you are using a reference shoreline (otherwise all the water contours in the image are mapped). image
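
To make step 1 concrete, here is a minimal sketch for counting NaNs per transect. It assumes cross_distance is a dict mapping a transect key to a 1-D array of cross-shore distances, as described in this thread; the helper name `nan_fraction` and the example values are hypothetical.

```python
import numpy as np

def nan_fraction(cross_distance):
    """Return {transect_key: fraction of NaN entries} for a dict of 1-D arrays."""
    fractions = {}
    for key, values in cross_distance.items():
        values = np.asarray(values, dtype=float)
        # An empty array is treated as fully missing.
        fractions[key] = float(np.isnan(values).mean()) if values.size else 1.0
    return fractions

# Made-up example: one mostly valid transect and one all-NaN transect.
cross_distance = {
    "T1": np.array([12.0, np.nan, 15.5, 14.1]),
    "T2": np.array([np.nan, np.nan, np.nan, np.nan]),
}
fractions = nan_fraction(cross_distance)
for key, frac in fractions.items():
    print(f"{key}: {frac:.0%} NaN")
```

Transects with a fraction close to 1 are the ones worth plotting against the mapped shorelines first.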
nkhelder commented 2 years ago

Hi Kilian,

Thanks for your thoughtful response. A few things:

  1. With the initial outputs I shared above, I was only running compute_intersections, without the reject_outliers call. So the example cross_distance output in the initial post is the result you asked about, and it is mostly NaN values. I get an indexing error when running both calls together (see the image below). ListIndexOutOfRange

  2. I might be misunderstanding what you said, but I think the offshore lines you are referring to are inland water bodies (I am working on fairly complex Arctic coastlines; see an example below). I am also working on the best way to deal with those (maybe using an inland water mask and modifying the shore buffer), but if you have other suggestions here, I would be interested in discussing them. 2002-09-06-22-00-48_L7

kvos commented 2 years ago

ok so you may want to change the nan/max setting to 'max', which means that if there are 2 intersections between the transect and the waterlines, it goes for the one that is furthest seaward. It would clearly help in your case to have a reference shoreline with a tight buffer so you are not mapping those lagoons. Also make sure the transect origins are always inland (first point) and the second point is in the water. If you can put the origin of the transects between the lagoon and the coast, that would help too, so you don't have 2 intersections. cool to see this application in the Arctic, happy to help if you have more questions

nkhelder commented 2 years ago

ok, I reran this with your suggestions, modifying the buffer and other detection parameters to avoid mapping the lagoon areas (see output below). Unfortunately, I still have the same issue with my transects (~90 of the 109 transects are entirely NaN), and I still get the 'list index out of range' error when trying to remove outliers (including after changing the nan/max setting to 'max' per your suggestion).

i am going to keep digging into this to figure out what is going on here but would be happy to hear any other suggestions or explanations for why this might be happening. thanks again!

updated shore

IssueUpdate (2)

kvos commented 2 years ago

are your transects facing the right direction? origin inland, second point offshore?

nkhelder commented 2 years ago

Yep, they are facing the right direction.

kvos commented 2 years ago

ok. there must be something that is NaNing those intersections in the compute_intersection function, because the shorelines are there and are valid. Can you have a go with the normal CoastSat toolbox to see if the intersections are fine there? This will give some clues on what's going on, as the compute_intersection function is simpler there https://github.com/kvos/CoastSat/blob/master/coastsat/SDS_transects.py

nkhelder commented 2 years ago

The intersections from the original CoastSat compute_intersection function appear to work fine, with very few missing values (with settings['along_dist'] = 25). Spreadsheet with output here for reference.
coastsatTimeSeries.csv

I was able to get the intersections for the same transects with the more advanced function and outlier removal in CoastSat.slope only if I increased the along_dist parameter to 45 m (I had ~95% NaN values when using anything below 45 m). Even so, there are still many NaNs this way (~68%, far more than I would have anticipated and far more than observed with the earlier function). coastsat.slopeTimeSeries.csv

Continuing to increase the along_dist parameter above 45m reduced the number of NaN values in the resulting intersection calcs. I understand that this parameter computes the intersection of the median values within X m of either side of the transect, but don't fully understand what it means for me that increasing this distance reduces the NaN values I'm seeing so much in this case? Also curious how 25m was chosen as the default value here anyways, and even what would be considered an appropriate range of values to explore here.

kvos commented 2 years ago

@nkhelder, good to see that it works with the original CoastSat function compute_intersection. Basically, the original function does not do any quality control on the intersections: it just takes all the shoreline points that are within 25 m on each side of the transect (for settings['along_dist'] = 25), then projects them all perpendicularly onto the transect and computes the median. See the code from that function below; it only puts a NaN in there if there aren't any shoreline points within that 25 m buffer. image
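
The median-based intersection described above can be sketched as follows. This is a simplified illustration written for this thread, not the code from SDS_transects.py: the function name `median_intersection`, its arguments, and the example coordinates are all hypothetical, and it assumes a single shoreline given as an (N, 2) array in the same projected coordinates as the transect.

```python
import numpy as np

def median_intersection(shoreline, origin, end, along_dist=25.0):
    """Median chainage of the shoreline points lying within along_dist of a transect."""
    shoreline = np.asarray(shoreline, dtype=float).reshape(-1, 2)
    origin = np.asarray(origin, dtype=float)
    end = np.asarray(end, dtype=float)

    # Unit vector pointing from the transect origin (inland) to its end (offshore).
    unit = (end - origin) / np.linalg.norm(end - origin)
    rel = shoreline - origin

    # Chainage: projection of each point onto the transect direction.
    chainage = rel @ unit
    # Signed perpendicular distance from each point to the transect line.
    offset = rel[:, 0] * unit[1] - rel[:, 1] * unit[0]

    near = np.abs(offset) <= along_dist
    if not near.any():
        return np.nan  # no shoreline point within the buffer
    return float(np.median(chainage[near]))

# Made-up shoreline points around a transect running along the +y axis.
points = [(-30, 50), (-10, 48), (0, 52), (20, 55), (40, 60)]
value = median_intersection(points, origin=(0, 0), end=(0, 100))
missing = median_intersection([(100, 50)], origin=(0, 0), end=(0, 100))
print(value, missing)
```

In this sketch a NaN can only come from an empty buffer, which matches the behaviour described for the original function.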

On the other hand, the more advanced version that is in CoastSat.slope tries to quality control the intersections so that we don't have any outliers in the time-series as these are known to affect the slope estimation method. It also has more parameters to quality control the intersections, the code snippet is found below. There are 4 conditions that the intersection needs to meet to pass the quality control:

  1. the standard deviation of the points intersecting the transect needs to be below settings_transects['max_std']
  2. the range of the points (max - min) intersecting the transect needs to be below settings_transects['max_range']
  3. there are at least 2 shoreline points (hard-coded) within the buffer set by settings_transects['along_dist']
  4. the shoreline points intersecting the transect are above settings_transects['min_val'] (for example, -100 sets the limit at 100 m landwards of the origin; any point beyond that is ignored)

If the intersection does not meet these criteria, you have 3 options:

  • settings_transects['nan/max'] == 'nan': puts a NaN if the conditions are not met
  • settings_transects['nan/max'] == 'max': takes the most seaward intersection if the conditions are not met
  • settings_transects['nan/max'] == 'auto': this one is more complicated and was added to account for transects that are located in front of inland lagoons, for example, where there are 2 intersections but the shoreline is always the most seaward one. It checks how many times there were multiple intersections (large dispersion of the shoreline points): if it's more than 10% of the time (for settings_transects['prc_std'] = 0.1), it takes the max, while if it's less than 10% of the time it puts a NaN (assuming the double intersection at that transect is caused by an outlier rather than a fixed water body). This part is a bit complicated, but I had to add it when mapping shorelines automatically over large spatial scales.
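
The four conditions and the three nan/max options can be sketched as below. This is a reading of the description in this comment, not the CoastSat.slope source: the function names are hypothetical, "large dispersion" is assumed to mean failing the max_std or max_range test, and the max_std and max_range values are placeholders. Only the setting keys, min_val = -100 and prc_std = 0.1 come from the thread.

```python
import numpy as np

def dispersed(points, settings):
    """True if the points spread more than max_std or max_range allow."""
    return bool(np.std(points) > settings["max_std"]
                or points.max() - points.min() > settings["max_range"])

def qc_one_date(points, settings, mode):
    """Quality-controlled chainage for one image date (mode is 'nan' or 'max')."""
    points = np.asarray(points, dtype=float)
    points = points[points >= settings["min_val"]]  # condition 4
    if points.size < 2:                             # condition 3 (hard-coded 2)
        return np.nan
    if not dispersed(points, settings):             # conditions 1 and 2
        return float(np.median(points))
    return float(points.max()) if mode == "max" else np.nan

def qc_time_series(points_per_date, settings):
    """Apply the quality control to every date of one transect."""
    mode = settings["nan/max"]
    if mode == "auto":
        # Assumed reading of 'auto': use 'max' when dispersed dates are frequent.
        flags = []
        for points in points_per_date:
            points = np.asarray(points, dtype=float)
            points = points[points >= settings["min_val"]]
            if points.size >= 2:
                flags.append(dispersed(points, settings))
        frac = np.mean(flags) if flags else 0.0
        mode = "max" if frac > settings["prc_std"] else "nan"
    return np.array([qc_one_date(p, settings, mode) for p in points_per_date])

settings = {"max_std": 15, "max_range": 30, "min_val": -100,
            "nan/max": "auto", "prc_std": 0.1}  # max_std/max_range: placeholders
points_per_date = [
    np.array([50.0, 52.0, 51.0]),          # clean single intersection
    np.array([50.0, 52.0, 120.0, 122.0]),  # two intersections (e.g. lagoon + coast)
    np.array([49.0]),                      # fewer than 2 points
]
auto_result = qc_time_series(points_per_date, settings)
nan_result = qc_time_series(points_per_date, {**settings, "nan/max": "nan"})
print(auto_result, nan_result)
```

Under these assumptions, a transect that returns NaN on most dates is failing one of the four tests on most dates, so relaxing the matching setting (or widening along_dist so that at least 2 points fall in the buffer) changes the outcome.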

image

Finally, there is the reject_outliers call, which only works on the time-series and is a home-made despiking algorithm that removes patterns in the time-series that are not physically possible. For example, the beach could erode by 100 m in one timestep, but it cannot recover 100 m over the next timestep; this is not physically possible, so there is an outlier in there. The only parameter for this part is settings_transects['max_cross_change'], which is the largest change that you can expect between two consecutive images. It should scale with tidal range, as on some flat meso/macrotidal coasts you can have 100s of metres of tidal excursion between consecutive images taken respectively at high and low tide.

I hope this helps you choose the right parameters for your study; otherwise you can use the original function, which doesn't do any of these things.
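
The despiking idea can be illustrated with a minimal sketch. This is not the reject_outliers implementation: it assumes a spike is a point that jumps by more than max_cross_change relative to the previous valid point and then jumps back by more than max_cross_change in the opposite direction. The function name `despike` and the example values are hypothetical.

```python
import numpy as np

def despike(chainage, max_cross_change):
    """Set to NaN any point that jumps away and straight back by more than the limit."""
    chainage = np.asarray(chainage, dtype=float)
    cleaned = chainage.copy()
    valid = np.flatnonzero(~np.isnan(chainage))  # indices of non-NaN samples
    for k in range(1, valid.size - 1):
        jump_in = chainage[valid[k]] - chainage[valid[k - 1]]
        jump_out = chainage[valid[k + 1]] - chainage[valid[k]]
        large = abs(jump_in) > max_cross_change and abs(jump_out) > max_cross_change
        if large and jump_in * jump_out < 0:     # out and straight back: a spike
            cleaned[valid[k]] = np.nan
    return cleaned

# Made-up time-series: one spike at index 2 and one pre-existing NaN at index 5.
series = [50.0, 52.0, -60.0, 51.0, 49.0, np.nan, 50.0]
cleaned = despike(series, max_cross_change=40)

# A lasting step change (erosion that persists) is kept in this sketch.
step = despike([50.0, 52.0, -60.0, -58.0], max_cross_change=40)
print(cleaned, step)
```

With a max_cross_change that is too small for the local tidal excursion, a test like this would also flag genuine high-tide/low-tide differences, which is why the parameter should scale with tidal range.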