Grid size assumptions and findPower

mmellis / rSPACE

Power analysis for occupancy-based monitoring

7 stars 1 forks source link

Grid size assumptions and findPower #28

Open technoautotroph opened 8 years ago

technoautotroph commented 8 years ago

My field season is over and I'm back to running my simulations. Thanks for answering my previous questions. I have managed to run a simulation with 10 replicates using my customized salamander georeferenced tiff. My map: AMapS = 933.2 m x 1712 m = 0.9332 km x 0.1712 km

##Create a new RasterLayer or RasterBrick with a lower resolution (larger cells) - reduce processing
## time, otherwise I get a crash
aggregate(AMapS, fact=2)

## Converting the raster to 0 to 1 scale
AMapS <- rescaleImage(AMapS, ymin = 0, ymax = 1)

The 10 replicate analysis took about 5 days to run on a new speedy computer that was purchased for this project (2.6 GHz Intel Core i7, 16GB RAM). I have another map: AMapL = 8291 m x 10613 m = 8.291 km x 10.613 km, but the simulations take MUCH longer and the risk of crash is high even using the following maximum conversion: aggregate(AMapL, fact=5)

I was able to TestReplicates and getResults with AMapS, but the power analysis failed because:

Smoothed estimates may not be reliable. Try increasing number of data points or number of simulations

I'm re-assessing my assumptions to see if I have a proper understanding of the different lines of input and was hoping that you might be able to provide some guidance - perhaps I don't have a full understanding of the variables.

The following code was used:

Scenario9

##Output and understanding:
###The **Population simulation** variables are straight forward:
$N [1] 150;                  ## 150 individuals are contained in the scenario
$lmda [1] 0.933;          ## "following a standard exponential growth equation" - seems reasonable
$n_yrs [1] 10;               ## Assuming ten years of study
$MFratio [1] 0.6 0.4;    ## There are more females

Movement parameters are a little more complicated:

buffer = "..buffer distances between individual activity centres (often treated as the average distance between home range centres) and the minimum habitat suitability value at which activity centres can occur." (Ellis et al. 2015, p. 2) "Distance between activity centres (km) for each type of ## individual" (Ellis et al. 2015)

Here is what I know about Long-toed salamanders maximum/average range sizes (km2):

1.00/0.28 (juveniles), 0.36/0.17 (males), 0.41/0.12 (females) Therefore, I opted for a conservative value and inputted the average range size:

$buffer [1] 0.12 0.17;

This may be too conservative? In my next scenario I will increase to : 0.41 0.36. I expect most individuals will stay confined to an area that is 2 km in size, but not sure if this is a problem since my map is even smaller than this:

$moveDist [1] 0.2 0.2;

Males tend to move more in salamanders so I used the values inputted below, but perhaps I should use 0.5 since I'm using an average for moveDist?: $moveDistQ [1] 0.7 0.9;

I don't fully understand this parameter - "Truncation for long-distance movements during sampling season. Quantile/proportion of movement distribution to include"? Could you elaborate further?

$maxDistQ [1] 0.95 0.95;

My sample site is 1400 m^2, therefore:

$grid_size [1] 0.0014;

I use different habitat values, so I selected the following cutoff values that seem reasonable:

$habitat.cutoff [1] 0.75;
$sample.cutoff [1] 0.5;

I assume three visits and very low detection probability given that most salamanders are digging underground:

$n_visits [1] 3;
$detP_test [1] 0.3

Thank you!!

technoautotroph commented 8 years ago

This may also help to understand my map:

AMapS <- raster('Q:/2016/FWCP/15295-069 Peace Wetlands and Amphibians/Analysis and Database/rSPACE/AMapS.tif', package='rSPACE')

area(AMapS)

class : RasterLayer dimensions : 3456, 1884, 6511104 (nrow, ncol, ncell) resolution : 0.4953675, 0.4953675 (x, y) extent : 394893.3, 395826.5, 6274671, 6276383 (xmin, xmax, ymin, ymax) coord. ref. : +proj=utm +zone=10 +ellps=GRS80 +units=m +no_defs data source : in memory names : layer values : 0.245389, 0.245389 (min, max)

Warning message: In .local(x, ...) : This function is only useful for Raster* objects with a longitude/latitude coordinates

aggregate(AMapS, fact=2)

class : RasterLayer dimensions : 1728, 942, 1627776 (nrow, ncol, ncell) resolution : 0.990735, 0.990735 (x, y) extent : 394893.3, 395826.5, 6274671, 6276383 (xmin, xmax, ymin, ymax) coord. ref. : +proj=utm +zone=10 +ellps=GRS80 +units=m +no_defs data source : in memory names : AMapS values : 0, 255 (min, max)

area(AMapS)

class : RasterLayer dimensions : 3456, 1884, 6511104 (nrow, ncol, ncell) resolution : 0.4953675, 0.4953675 (x, y) extent : 394893.3, 395826.5, 6274671, 6276383 (xmin, xmax, ymin, ymax) coord. ref. : +proj=utm +zone=10 +ellps=GRS80 +units=m +no_defs data source : in memory names : layer values : 0.245389, 0.245389 (min, max)

Warning message: In .local(x, ...) : This function is only useful for Raster* objects with a longitude/latitude coordinates

mmellis commented 8 years ago

Hi Mark - I see a couple of things in your parameterization that are worrying:

Especially for your small map, the movement parameters that you're using seem unlikely to allow you to fit 150 individuals. (On your small landscape, it seems like you are not going to be able to fit more than a handful of individuals with these values, and maybe <5 if you bump up to the less conservative values!) Searching to place individuals on a landscape that is saturated with individuals is part of what really slows down these simulations. You may want to check the N_final.txt file in the output to see how many individual it actually can place based on your movement rules.
Based on your grid size, do you end up with ~98 cells total for your small landscape? You big map looks like it'll end up being 200x250 cells = 50K total. How much do your habitat and sampling cutoffs reduce that number of potential cells? 50K is a really large number of cells, and is likely to really slow things down as well. Is that realistic in your design?
You mention juveniles, males, and females; but you're only using 2 types of individuals. Is that what you want?
You say individuals are usually confined to a 2km area but your movement radius is 0.2km. It looks to me from your movement parameters vs the size of your landscapes, you're likely to be at the point where it only takes a few individuals until 100% of your landscape is occupied. Plus, individuals are travelling well beyond the extent of a single cell. These aspects are going to make occupancy-based analyses fairly disconnected from any changes in abundance.
The warning message from the smoother is usually due to the number of different n_grid steps that you have tested, combined with what I expect is a fairly high and constant occupancy rate. Under these conditions, there is not a lot that a smoother can do the interpolate the changes with n_grid. You may want to try increasing your sampling at the LOW end of your range (e.g. Parameters$grid_sample =c(0.01,0.05,0.1,0.15,0.2,0.25,0.35, 0.45,0.55,0.75,0.95). Presumably it's the createReplicates() step that's taking up most of your simulation time, not the testReplicates(), correct?

technoautotroph commented 8 years ago

Thank you so much for your prompt and informative response!! This is very helpful.

You may want to check the N_final.txt file in the output to see how many individual it actually can place based on your movement rules.

This is what I have: 70 65 61 57 53 49 46 43 40 37 75 70 65 61 57 53 49 46 43 40 72 67 63 59 55 51 48 45 42 39 74 69 64 60 56 52 49 46 43 40 73 68 63 59 55 51 48 45 42 39 70 65 61 57 53 49 46 43 40 37 73 68 63 59 55 51 48 45 42 39 73 68 63 59 55 51 48 45 42 39 68 63 59 55 51 48 45 42 39 36 70 65 61 57 53 49 46 43 40 37

Seems you are correct - there are fewer than 150 individuals squished in here!

Based on your grid size, do you end up with ~98 cells total for your small landscape? You big map looks like it'll end up being 200x250 cells = 50K total. How much do your habitat and sampling cutoffs reduce that number of potential cells? 50K is a really large number of cells, and is likely to really slow things down as well. Is that realistic in your design?

I'm trying to see how this is calculated and this is where my confusion likely rests. The problem is confusing as I use the aggregate function to change the dimensions, which I researched as a means to reduce the processing time (a key problem I am facing):

area(AMapS)
class       : RasterLayer 
dimensions  : 3456, 1884, 6511104  (nrow, ncol, ncell)
resolution  : 0.4953675, 0.4953675  (x, y)
extent      : 394893.3, 395826.5, 6274671, 6276383  (xmin, xmax, ymin, ymax)
coord. ref. : +proj=utm +zone=10 +ellps=GRS80 +units=m +no_defs 
data source : in memory
names       : layer 
values      : 0.245389, 0.245389  (min, max)

Versus:

> AMapS <- aggregate(AMapS, fact=2)
> area(AMapS)
class       : RasterLayer 
dimensions  : 1728, 942, 1627776  (nrow, ncol, ncell)
resolution  : 0.990735, 0.990735  (x, y)
extent      : 394893.3, 395826.5, 6274671, 6276383  (xmin, xmax, ymin, ymax)
coord. ref. : +proj=utm +zone=10 +ellps=GRS80 +units=m +no_defs 
data source : in memory
names       : layer 
values      : 0.9815559, 0.9815559  (min, max)

Warning message:
In .local(x, ...) :
  This function is only useful for Raster* objects with a longitude/latitude coordinates

In the aggregate map I have 1,627,776 cells, but I don't think that is the cells that you are referencing here. If my understanding is correct, the cells you are referencing has to do with the physical extent of the map. My small map is: 933.2 m x 1,712 m or 159.76 hectares. I set my cell size ($grid_size) to 0.0014, or 0.14 hectares. Under this assumption there are 1141.14 cells contained in the map. My big map is: 8,292 m x 10,614 m or 8801.13 hectares. This gives 62,865.21 cells within the larger map. Is this correct?

You mention juveniles, males, and females; but you're only using 2 types of individuals. Is that what you want?

Yes - I just give males and females for now.

You say individuals are usually confined to a 2km area but your movement radius is 0.2km...These aspects are going to make occupancy-based analyses fairly disconnected from any changes in abundance.

Okay. This probably has to do with my misunderstanding of the map extent. I’m assuming that salamanders move in an area that approximates 20 ha in size. My small map is 159.76 ha giving enough room for 8 independent home-range distributions assuming that they spaced evenly in a structured way. If I assume 150 salamanders, then there would have to be 18.75 with an overlapping range, which isn’t completely unreasonable – but I get your point about filling the map quickly. The larger map would give 440 independent home-range distributions and 3 salamanders with an overlapping range, which sounds more reasonable. However, this all depends on my understanding of $grid_size, which I may not have correctly interpreted here.

Correction: If we have enough room for 440 x 20 ha home range distributions in the large map and 150 salamanders in total, then this will result in 0.34 salamanders per 20 ha; assuming a random / non-clumping pattern.

Presumably it's the createReplicates() step that's taking up most of your simulation time, not the testReplicates(), correct?

That is correct.

mmellis commented 8 years ago

Hmm...in terms of the extent of the map, what I am asking about has to do more with your grid size (0.14ha). These simulations set up a sampling grid on your landscape. I am curious how many cells make up that sampling grid. There are lots of ways to do this, probably easiest to look at the encounter history files that are created by createReplicates() to see how many rows are included.

technoautotroph commented 8 years ago

This is what an encounter history looks like:

/* 1 / 010011000000010010000000000000 1; / 2 / 000000000000000000100000000011 1; / 3 / 100000000000000000000000100000 1; .... / 540 / 000000000000000000000100000010 1; / 541 / 000010100000000000000000010000 1; / 542 */ 100000100000000010000000001000 1;

All of the replicates have 30 columns and 542 rows. Is this the information you are after?

technoautotroph commented 8 years ago

I've been running the large map overnight with the following input:

$N [1] 500; $lmda [1] 0.933; $n_yrs [1] 10; $MFratio [1] 0.6 0.4; $buffer [1] 0.36 0.41; $moveDist [1] 0.2 0.2; $moveDistQ [1] 0.7 0.9; $maxDistQ [1] 0.95 0.95; $grid_size [1] 0.0014; $habitat.cutoff [1] 0.75; $sample.cutoff [1] 0.5; $n_visits [1] 3; $detP_test [1] 0.5

This will result in 1.14 salamanders per 20 ha of habitat assuming a random / non-clumping pattern. However, I've been running the encounter history overnight and I still haven't been prompted for the map. Hopefully, I'm not being unreasonable in my spatial ecology assumptions. I have set the grid size to replicate my actual sampling design in the field (i.e., plot size); unlike wolverines, a 100 km^2 sampling grid would not make sense for sampling salamanders.

My goal is to create scenarios that somewhat approximate conditions in the field. I do not know exactly how long-toed salamanders are naturally distributed, but there is sufficient evidence to suggest that they do clump. I'm using what I know about the biology of these organisms and the published evidence on their movements to set the Movement Parameters.

The part that I'm finding confusing is that my map has a smaller spatial extent, but dimensions (nrow, ncol, ncell) that are far in excess of your wolverine map. I don't know how to reduce the dimensions to match something like your sample map while retaining the georefencing. I might be wrong, but I suspect that this is what is causing my computation problems?

Your assistance is very much appreciated. This is an exciting project and I like the stated ambition in your paper: "to provide a simple tool to evaluate the amount of effort required for a survey to estimate statistical trend...Through this effort, we hope to foster careful monitoring design and encourage multijurisdictional collaboration to enable large-scale monitoring efforts." If I can figure this out, then I hope to introduce techniques for using this software in the herpetological literature so it can be used in monitoring efforts for this taxonmic group.

mmellis commented 8 years ago

Hi Mark - I'm not entirely sure what you are asking, so let me know if this helps clarify and if not maybe you can walk me through what else I'm missing! I will use sampling location or sampling site to refer to the grid produced by rSPACE, and pixels to refer to the minimum resolution cells of your map.

It looks like you have your small map is 3456x1884 pixels at 0.5m resolution or 1728x942 at 1m resolution. With a 0.14ha sampling grid, each location in your sample would get translated to approx 40mx40m grid cell. Based on your small map resolutions, this will mean having sampling sites that are around 40x40 pixels or 80x80pixels, give or take a few for rounding. Either of those should be reasonable, computationally, and you won't be running to issues with too few pixels in a sampling location.

When you move up to the big map, you're right overall the dimensions of your map are going to be HUGE and those will likely cause computational problems. I would recommend that, if you do decrease the resolution of your map, that you want to make sure you still have at least 4x4 pixels in the cells of the sampling grid (e.g. each sampling location is comprised of at least 16 pixels). So, that means, don't go any further than 10mx10m resolution.

I see now from your previous post:

I set my cell size ($grid_size) to 0.0014, or 0.14 hectares. Under this assumption there are 1141.14 cells contained in the map. My big map is: 8,292 m x 10,614 m or 8801.13 hectares. This gives 62,865.21 cells within the larger map. Is this correct?

These should be whole numbers on the actual sampling grid, but otherwise, that looks reasonable to me. You set a sampling cutoff that limits the sampling locations to only those with enough "good" pixels (e.g. $habitat.cutoff=0.75 means that a pixel has to have a habitat value >0.75 to be considered "good" habitat and $sample.cutoff=0.5 means that over 50% of the pixels in the sampling location have to be "good" for the cell to be considered available to sample). Hence:

All of the replicates have 30 columns and 542 rows.

So, 542/1141 sampling location remain after you apply the sampling cutoff on the small map.

So, it seems to me that the issue here is that your large map is too big for R to handle memory-wise...or it just takes way too long to run. At 10mx10m resolution, you would have a map that is 829x1061 pixels, should be able to run without too much headache. Ideally, you'd want to keep as good resolution as you can handle computationally.

You are going to lose spatial complexity but that may be ok given the rest of your design. Remember that this analysis is based on sampling locations. Where exactly the individual falls within that "site" is not stored, we're just recording presence/absence. The movement parameters for your individuals are measured in km, so as a really generous approximation, individuals are using areas that are 200m/10m = 20 pixels x 20 pixels = 400 pixels total at a 10m resolution. Each individual then is going to be "using" an area much, much larger than one sampling location. That's not a problem, pre se, just an important thing to keep in mind in interpretation.

So, I think where I lose you is here:

I don't know how to reduce the dimensions to match something like your sample map while retaining the georefencing.

It seems like you are used to using aggregate() in the raster package. I think you mentioned that you are also comfortable with GIS. Are you having trouble setting your large map to 10m resolution? OR do you have concerns about losing aspects of the biology at that coarser level (e.g. how to summarize the pixel information into larger pixels that retains the essence of whether that pixel represents "good" habitat)?

technoautotroph commented 8 years ago

This is super helpful and should solve most of my problems. Your feedback has helped me to better understand how this works and it is giving me more confidence in my interpretation of things. Hopefully, my learning experience here will translate into an effective manuscript that will help others to customize their scenarios.

I will spend my time learning how to properly reduce the resolution of my larger map and might just go to our GIS department to get this done more quickly.

Despite the size, the larger map produced an output for my encounter history. I notice a few things that I will need to work on, but most importantly the individual activity center locations looks like a random distribution. However, I suspect that salamanders have a clumping distribution. Question: Is there an option to set the distribution to be more clumped?

The following is a background on my methods for creating the maps, which might help you to understand why my maps are the way they are:

I didn't know what size of maps to create when I started this project. I placed a rectangle over my sampling area covering what I thought to be a reasonable home range for the population under investigation. The small map covers an area that is nestled in at the toe of a mountain along a stream valley that likely constrains movement. The larger map sits on top of the extent of the smaller map in considering the possibility of a metapopulation or panmictic exchange structure.

I used Orthophotos and GoogleEarth to fill in the missing pieces. The cut out images were imported into a raster graphics editor - I use Artweaver, which is a less expensive version of photoshop. I created layers and manually drew over the different features what I thought to be a reasonable habitat ranking scale with tone ranging from white to grey to black. Saving the image into a tiff "pixelates" the image, so I had to carefully create masks on each layer to ensure that each layer was exactly white or exactly the selected tone. I imported this image into QGIS and used specified landmarks to georeference the tiff.

This procedure might help to explain why the size of my maps are not following on round numbers. I tried to cut out areas that were more exact, but they came out a little sloppy. I don't think this should matter. I am only a QGIS learner. I have taken university courses in GIS and we have personnel here who are experts in GIS, but I've been trying to learn how solve these problems myself.

Thank you!!

mmellis commented 8 years ago

rSPACE was designed for working with territorial carnivores, so the distribution should be much more regular, rather than randomly distributed. Working with clumped individuals would require even more input parameters, not to mention that the relationship between occupancy and abundance will likely be more complicated (e.g., see Estrada, A., and B. Arroyo. 2012. Occurrence vs abundance models: differences between species with varying aggregation patterns. Biological Conservation 152:37–45.) Right now, we don't have the code to test models for non-territorial species.

technoautotroph commented 8 years ago

Thank you for the feedback. I recommend:

Christianson, D.S. & Kaufman, C.G. (2016). Effects of sample design and landscape features on a measure of environmental heterogeneity. Methods in Ecology and Evolution, 7, 770–782.

technoautotroph commented 8 years ago

I learned how to reduce my map in QGIS - it was much easier than I realized. Thank you again for all your help. I do hope to publish on this and will be in touch. My large map analysis is proceeding quickly and smoothly!! Yay!

Once again - Thank you for all your work.