Open wallstein opened 2 weeks ago
Hi Jonas,
This is a great catch and will help us improve the package, though I'm very sorry you had to discover the issue and endure whatever headache came as a result.
For whatever reason, I mistakenly wrote the code such that the lower bound and upper bound are both exclusive. We are close to issuing an update to replace all of the raster functions in the package with terra packages, and will include this correction in that update.
If you would like, I can send you a corrected staggregate_bin() function (minus the terra changes) that should work until we can finish reviewing the full update. Thank you again for catching this and for providing such a clear explanation of the problem and things you tried - it made it very straightforward to address.
Best, Tyler
Edit: I just opened up github and saw the actual numbers. This may not be the issue unless you have a lot of temperature values that are exactly 30. Is this the case?
On Tue, Nov 5, 2024 at 5:21 PM Jonas Wallstein @.***> wrote:
Hi,
First of all thanks for creating such a straightforward-to-use package! I have an issue with staggregate_bin() and was hoping you could help me solve it.
Description of the task: I have daily temperature data and want to split it into bins. For simplicity only two bins: below and above 30 degree celcius. I have polygons in the form of California Counties and want to aggregate to this spatial level, weighted by population (secondary_raster). The time is either kept at daily level or aggregated to monthly.
Problem: In the dataframe returned by staggregate_bin() I would assume that the bins sum to 1 for the daily level and the number of days in a month when aggregated to the monthly level. However, the bin variables ("bin_ninf_to_30" & "bin_30_to_inf") sum below 1 (or 28/30/31 depending on the month) when on the daily (monthly) level. I attached a picture from of the staggregate_bin() output. Do you have an idea what could cause this?
Troubleshooting: I already checked that there are no missing values in the rasters in the areas corresponding to the polygons. The weights also sum to one for each polygon in the overlay_weights, secondary_weights & overlay_secondary_weights.
Any help would be greatly appreciated :)
Best, Jonas
target_geometry <- counties(state = "CA", class = "sf") %>% st_transform(crs = 4326) environmental_raster_imputed <- focal(environmental_raster, 31, "modal", na.policy="only", expand = T) secondary_raster_imputed <- subst(secondary_raster_resampled, NA, 0)
Compute the overlay weights for the target_geometry and raster grid
overlay_weights <- overlay_weights(polygons = target_geometry, # Input raster data (temperature) polygon_id_col = "spatial_ID", # Column in target_geometry representing unique ID grid = environmental_raster_imputed) # Environmental raster (imputed)
Compute secondary weights (population weights) by aligning the secondary raster with the temperature raster
secondary_weights <- secondary_weights(secondary_raster = secondary_raster_imputed, # Imputed secondary raster grid = environmental_raster_imputed, # Environmental raster to align with extent = "full") # Use the full extent of the Environmental raster
Compute the overlay weights for the polygons and the raster grid, including the population weights
overlay_secondary_weights <- overlay_weights(polygons = target_geometry, # Polygon data for California counties polygon_id_col = "spatial_ID", # Column in target_geometry representing unique ID grid = environmental_raster_imputed, # Temperature raster (imputed) secondary_weights = secondary_weights) # Use the secondary (population) weights
Aggregate the data by creating a bin for temperatures above 30 degrees Celsius
binned_output_daily <- staggregate_bin(data = environmental_raster_imputed, # Input raster data (temperature) overlay_weights = overlay_secondary_weights, # Population-weighted overlay weights time_interval = "1 day", # Daily interval for binning daily_agg = "none", # No daily aggregation, keep daily values time_agg = "day", # Aggregate results for each day bin_breaks = c(30)) # Bin for temperatures above 30°C
Screenshot.2024-11-05.at.5.17.22.PM.png (view on web) https://github.com/user-attachments/assets/fb9c8099-0d12-4644-9d6e-9b9d319b696c
— Reply to this email directly, view it on GitHub https://github.com/tcarleton/stagg/issues/37, or unsubscribe https://github.com/notifications/unsubscribe-auth/AYKEE2BXHTFQOEPRTUPE4EDZ7FVKPAVCNFSM6AAAAABRHYGOU6VHI2DSMVQWIX3LMV43ASLTON2WKOZSGYZTMOBRGY3DKMA . You are receiving this because you are subscribed to this thread.Message ID: @.***>
Hi Tyler,
Thanks for getting back to me so quickly! There are no values that are exactly 30. A few values are between 29.9 and 30.1 but that wouldn't impact the bounds, right? Do you have another idea what could cause this?
Best, Jonas
Hi Jonas,
Yeah I believe we can rule out that issue. That the incorrect sum is always the same number leads me to believe this is related to weights, but I see you've already made sure those sum to one.
My other idea is that this could somehow be related to the fact that your environmental data is daily (though it's unclear to me how). We originally wrote the package for hourly data and only recently added the start_date and time_interval arguments for increased flexibility, so it's possible this had some unforeseen impacts. However, a quick check of some of the fractions near to .3792767 you'd expect to pop up if we were accidently excluding temporal layers (9/24, 12/31, 11/30, etc.) doesn't suggest this is the issue. If this decimal jumps out at you as significant in any way I'm not thinking of let me know.
Is your raster data publicly available and if so could you point me to where I can get it? If not, could you help me mimic its structure?
Thanks, Tyler
Edit: I've done some more digging, and I think I've figured it out, though I'd like to check before taking the time to write up an explanation and solution. If I'm correct, it's only this particular polygon that's summing to that number (and the others all sum to their own number consistently), and the raster you're working with has cells with a width (xres) that, when expressed as a decimal point, has many digits following the decimal (the data I've been working with has an xres of 0.04166675, for instance). Is this the case?
Hi Tyler,
Yes you're absolutely right, the sum is consistent for a polygon but changes across polygons. I didn't notice that before, great observation! And yes the resolution is exactly such a number:
environmental_raster: class : SpatRaster dimensions : 231, 249, 457 (nrow, ncol, nlyr) resolution : 0.04166668, 0.04166668 (x, y)
Let me know if it's helpful for me to send my scripts & data. Also thanks again so much with your help on this!!
Best, Jonas
Hi Tyler,
I thought catching the error might be easier with my code and data so I attached it. I wrote the code in R Markdown and included the HTML output for better readability. I left out some of the raw data as it was too big, but included all the processed data that I'm using for the aggregation. Let me know if you have any questions or if something is missing.
Thanks so much for your help so far, that was already very useful!
Best, Jonas
Hi,
First of all thanks for creating such a straightforward-to-use package! I have an issue with staggregate_bin() and was hoping you could help me solve it.
Description of the task: I have daily temperature data and want to split it into bins. For simplicity only two bins: below and above 30 degree celcius. I have polygons in the form of California Counties and want to aggregate to this spatial level, weighted by population (secondary_raster). The time is either kept at daily level or aggregated to monthly.
Problem: In the dataframe returned by staggregate_bin() I would assume that the bins sum to 1 for the daily level and the number of days in a month when aggregated to the monthly level. However, the bin variables ("bin_ninf_to_30" & "bin_30_to_inf") sum below 1 (or 28/30/31 depending on the month) when on the daily (monthly) level. I attached a picture from of the staggregate_bin() output. Do you have an idea what could cause this?
Troubleshooting: I already checked that there are no missing values in the rasters in the areas corresponding to the polygons. The weights also sum to one for each polygon in the overlay_weights, secondary_weights & overlay_secondary_weights.
Any help would be greatly appreciated :)
Best, Jonas