Closed kaijagahm closed 1 year ago
Seems like a NULL default would also be appropriate for edge_dist()
, since its docs also mention that the timegroup
arg is optional.
Hello @kaijagahm,
Good catch with the inconsistencies in the documentation - timegroup is described as being optional for group_pts
in the documentation but you are right it is currently required (group_pts.R#L116-L118). The reason we would want timegroup to be required is a safeguard - if we accidentally gave group_pts
a data.table of thousands of rows to calculate pairwise distances, we could quickly hit an upper limit to user's machines.
I'll get back to you shortly, I'm working on clarifying this in the docs and fixing a related issue. There is a simple workaround for already grouped data, I'll share too!
Ok first up - the workaround:
# === Timegroups workaround -----------------------------------------------
# Alec L. Robitaille
# Packages ----------------------------------------------------------------
library(spatsoc)
library(data.table)
# Data --------------------------------------------------------------------
# Read example data
DT <- fread(system.file("extdata", "DT.csv", package = "spatsoc"))
# Prepare variables -------------------------------------------------------
# Cast the character column to POSIXct
DT[, datetime := as.POSIXct(datetime, tz = 'UTC')]
# Group times -------------------------------------------------------------
# Typically, we'd use group_times for temporal grouping
# group_times(DT, datetime = 'datetime', threshold = '20 minutes')
# Instead of using group_times, we'll make our own time group
# For example, say we wanted to group based on the day of year
# (note this can be accomplished by group_times with a 1 day threshold too)
# Add the day of year as a column
DT[, doy := yday(datetime)]
# Select only the first row for each individual on each day,
# and save this to a new data.table "DT_sub"
# (Just for this example, where we need one observation per individual per timegroup)
DT_sub <- DT[, first(.SD), by = .(ID, doy)]
# Spatial groups ----------------------------------------------------------
# With custom timegroup, in this case "doy"
group_pts(
DT_sub,
threshold = 5,
id = 'ID',
coords = c('X', 'Y'),
timegroup = 'doy'
)
#> ID doy X Y datetime population group
#> 1: A 306 715851.4 5505340 2016-11-01 00:00:54 1 1
#> 2: A 307 715558.2 5505518 2016-11-02 00:00:54 1 2
#> 3: A 308 714955.6 5505223 2016-11-03 00:01:42 1 3
#> 4: A 309 713800.1 5505434 2016-11-04 00:01:23 1 4
#> 5: A 310 715091.4 5505312 2016-11-05 00:01:44 1 5
#> ---
#> 1196: J 55 697605.8 5511058 2017-02-24 00:00:48 1 1165
#> 1197: J 56 697783.8 5511689 2017-02-25 00:00:55 1 1166
#> 1198: J 57 697835.3 5510694 2017-02-26 00:00:48 1 594
#> 1199: J 58 700692.0 5508661 2017-02-27 00:00:54 1 1167
#> 1200: J 59 700684.6 5508745 2017-02-28 00:00:48 1 1168
Created on 2023-02-01 with reprex v2.0.2
So we are using a custom field, in this case "doy", and since the example data used has multiple observations per individual per day, we subset only the first of each day. This kind of workaround might be useful for focal observations where we have observation ids or something similar. Out of interest, what's your use case that already has time groups defined?
Second, the documentation inconsistencies with the functions checks: #46
I changed the manual to clarify that timegroups is strongly recommended for group_pts
, edge_nn
and edge_dist
but doesn't necessarily need to come directly from group_times
. Let me know if you think that is clear, and reasonable (given the above workaround as well)
https://robitalec.r-universe.dev/spatsoc/doc/manual.html#group_pts
Lastly, if you really wanted to ignore timegroups or set it to a ~NULL equivalent, you could force spatsoc
to consider all rows in the same timegroup using a constant across all rows. I'm not really sure where this would be relevant, but in case it is, here I make up some fake data and group it, using a timegroup set to 1 for all rows.
# === One timegroup -------------------------------------------------------
# Alec L. Robitaille
# Load packages -----------------------------------------------------------
library(data.table)
library(spatsoc)
# Fake data ---------------------------------------------------------------
DT <- CJ(id = LETTERS[1:5],
x = 1,
y = 1)
DT[, x := x + sample(seq.int(5), .N, replace = TRUE)]
DT[, y := y + sample(seq.int(5), .N, replace = TRUE)]
print(DT)
#> id x y
#> 1: A 2 6
#> 2: B 5 6
#> 3: C 6 3
#> 4: D 2 6
#> 5: E 2 6
# Set timegroup -----------------------------------------------------------
# Set a timegroup for all rows to the same value
DT[, timegroup := 1]
# Group points ------------------------------------------------------------
group_pts(
DT,
threshold = 1,
id = 'id',
coords = c('x', 'y'),
timegroup = 'timegroup'
)
#> id x y timegroup group
#> 1: A 2 6 1 1
#> 2: B 5 6 1 2
#> 3: C 6 3 1 3
#> 4: D 2 6 1 1
#> 5: E 2 6 1 1
Created on 2023-02-02 with reprex v2.0.2
Let me know how this sounds and if it works for your particular use case. I really appreciate you opening the issue and highlighting this inconsistency!
(Edit: missed library(spatsoc)
call in reprex)
Closing this but please let me know if you have other comments or concerns and we can reopen
Thanks! Oops, I had missed the part at the end where you asked me to let you know if it worked. I think I've got it working now, and I appreciate very much your thorough response.
Hi all--thanks for this great package! I'm making extensive use of it in my analysis of vulture GPS tracking data.
For most of my data, I use the recommended workflow, grouping the data first by time and then by space, using
group_times
and thengroup_pts
.But right now I have a special case where I would like to just use
group_pts
, since my data are already split by date and I would like to end up with single-day groups. There is no full timestamp data, just dates.The documentation for
group_pts()
says this should be possible:But when I try to use the function without specifying a value for the
timegroup
argument, there is no default value, so I get the error messageI played around, and it looks like the function operates as expected if I manually pass
timegroup = NULL
. But I don't think I should have to do this: iftimegroup
is truly optional, then NULL should be the default. By extension, I think this part of thegroup_pts
code should be removed:I know I haven't provided a full reproducible example here--I can come up with one if needed, but this seems like a pretty straightforward case of just needing to change the default value for an argument. I'm basically prepared to submit a PR, but wanted to run this by you in case there's a reason I'm missing as to why we can't allow
timegroup = NULL
. (And if there is some such reason, then the doc should be updated to no longer say thattimegroup
is optional.Thanks for your consideration!