kaijagahm / vultureUtils

Utility functions for working with vulture data

Issue with consecEdges #108

Closed: kaijagahm closed this issue 1 year ago

kaijagahm commented 1 year ago

If there were duplicate edges between ID1 and ID2 in a given time period, consecEdges would not function correctly, since it was based on counting numbers of rows. I discovered this when trying to use a higher timeThreshold, which will necessarily return multiple interactions per time period per dyad. When I did this deliberately, consecEdges had little to no effect on the number of edges returned, which didn't seem normal. I dug into it and realized that, because I had written the code to count rows, consecEdges was counting multiple interactions in the same timegroup as contributing to the occurrence of an edge in multiple consecutive timegroups. That assumption only holds in the very particular case where the timeThreshold matches the GPS fix rate.
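
A minimal sketch of the failure mode, with made-up data and a simplified stand-in for the row-counting shortcut (not the actual consecEdges code):

library(dplyr)

toyEdges <- tibble::tribble(
  ~ID1, ~ID2, ~timegroup,
  "A",  "B",  1,
  "A",  "B",  1, # duplicate interaction within the SAME timegroup
  "C",  "D",  1,
  "C",  "D",  2  # genuinely consecutive timegroups
)

# Counting rows per dyad can't tell these two cases apart: A-B has two rows but
# only ever occurs in timegroup 1, yet it passes a "2 or more rows" filter.
toyEdges %>%
  group_by(ID1, ID2) %>%
  filter(n() >= 2) %>% # row-count stand-in for consecThreshold = 2
  ungroup()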

kaijagahm commented 1 year ago

I changed how the code is written: first getting the unique dyad-timegroup combos, then filtering down to dyads that occurred in multiple consecutive timegroups, then joining back to the full data to preserve multiple edges. This is important because it may mean that some "duplicate" interactions under the previous 10-minute fix rate were being included erroneously when they should have been excluded by the consecThreshold. Hopefully the impact won't be too huge.
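
In rough outline, the new logic looks something like this (just a sketch, assuming an edge table called edgesDf with columns ID1, ID2, and timegroup; the real change is in the PR linked at the bottom of this issue):

library(dplyr)

consecThreshold <- 2

# 1. Unique dyad-timegroup combos, so duplicates within a timegroup can't
#    masquerade as consecutive occurrences.
combos <- edgesDf %>%
  distinct(ID1, ID2, timegroup)

# 2. Keep only dyad-timegroups that fall within a run of at least
#    consecThreshold consecutive timegroups.
keep <- combos %>%
  arrange(ID1, ID2, timegroup) %>%
  group_by(ID1, ID2) %>%
  mutate(run = cumsum(c(1, diff(timegroup) != 1))) %>%
  group_by(ID1, ID2, run) %>%
  filter(n() >= consecThreshold) %>%
  ungroup()

# 3. Join back to the full data so that multiple edges within a qualifying
#    timegroup are preserved.
filtered <- edgesDf %>%
  semi_join(keep, by = c("ID1", "ID2", "timegroup"))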

kaijagahm commented 1 year ago

Explanation as emailed to Elvira:

In getFeedingEdges (and getFlightEdges), we have a parameter called consecThreshold. This refers to the number of consecutive time periods in which individuals have to interact in order for the interaction to be counted. Normally, for the vulture interaction networks, we have a time period of 10 minutes, and we ask to consider only interactions that happen in at least 2 consecutive time periods.

The way that I wrote the code to filter for this is complicated. I used a shortcut that works mostly fine for the normal use of the data (10 minute intervals, consecThreshold = 2). But it turns out it doesn't work anymore when individuals interact more than once in a single time period. Now, with the 10 minute intervals, this shouldn't happen.* But when we use a longer time interval (like you're trying to do now), we do have multiple interactions in the same time interval. (That's the whole point.)

So, I had to change the code a little bit to make the consecThreshold filtering work properly when there are multiple interactions.

You might be thinking "Why do we care about this? We only have a single 14-day window at a time, not multiple 14-day windows!" Of course, that's right. For your use case, it will probably work fine to not update the package and to just set consecThreshold to 1, since you only have a single 14-day time period.

BUT, you should still update the package, because this fix might change some of the other co-feeding analyses. Why? Remember the warning/error that I've been telling you to just ignore up until now? The one that says "found duplicate id in a timegroup and/or splitBy - does your group_times threshold match the fix rate?" That warning is generated even when we use the normal 10-minute threshold, because the GPS tags don't always generate fixes exactly 10 minutes apart. Sometimes it's more like 9.8 minutes, or 10.1 minutes. As a result, it is sometimes possible to have a 10-minute window that contains two different GPS locations for the same vulture. I don't think this happens frequently, and Noa and Nitika had told me before not to worry about it.

But in this specific case it does matter. If vulture A gets recorded twice in timegroup 1, and vulture B gets recorded once in timegroup 1, we could end up with two interactions between A and B in timegroup 1. And, as I said before, the code that I wrote doesn't work properly when individuals interact more than once in a single timegroup. Because of the shortcut I took, it is possible that these duplicate interactions were left in the dataset when they should have been removed by the consecThreshold parameter.

I think I've already over-explained this, so I'm going to stop. To summarize: the change that I made is actually not very important to your current problem of how to use getFeedingEdges to look at interactions over a longer time window. But it might affect your other code (when you use getFeedingEdges with a time threshold of 10 minutes), so you should update the package and re-run the previous analyses just in case. I don't think it will have a big effect, because those duplicate edges are probably pretty rare. But we should be careful.

Once you've updated the package, you can use the getFeedingEdges code similarly to how we did getRoostEdges before:

# assuming you have data called mydata_1
# load in your roost polygons; here I'll pretend that you called the object "roostPolygons"
roostPolygons <- sf::st_read("data/[ROOST POLYGON FILE]")

edges <- getFeedingEdges(
  dataset = mydata_1,
  roostPolygons = roostPolygons,
  consecThreshold = 1,
  distThreshold = 25,
  timeThreshold = "20 days", # 20 days is longer than we need, but just to be safe
  idCol = "id",
  return = "edges"
)

# same as with roost edges, filter to remove any duplicates (I think this step is unnecessary, but it can't hurt)
edges <- edges %>%
  mutate(ID1 = as.character(ID1),
         ID2 = as.character(ID2)) %>%
  filter(ID1 < ID2) # removes duplicates and self edges

# Count number of interactions to give us the edge weights
forIgraph <- edges %>%
  group_by(ID1, ID2) %>%
  summarize(weight = n())
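
If you then want an actual igraph object (I'm guessing that's where forIgraph is headed, given the name), something like this should work, assuming you have the igraph package installed:

# Build an undirected, weighted graph from the edge list
g <- igraph::graph_from_data_frame(forIgraph, directed = FALSE)
igraph::E(g)$weight # the "weight" column comes along as an edge attribute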

A couple of important points you should be aware of when you use getFeedingEdges this way:

  1. Same as with getRoostEdges: you need to use return = "edges" instead of return = "sri". I don't know what will happen if you use SRI but it probably won't work and it definitely won't make sense.
  2. For the timeThreshold, I chose a number of days greater than the number you actually have, just to be safe.
  3. It's really important to set consecThreshold to 1, instead of 2 (which is what we normally do for the interaction networks). Under the new code, if you set consecThreshold to 2, you'll end up with 0 edges, because there is no such thing as two consecutive 14-day (or 20-day) periods inside a 14-day period.

The only change I made is in the consecEdges function, which is used by getEdges, getFeedingEdges, and getFlightEdges. You can see exactly what I changed here: https://github.com/kaijagahm/vultureUtils/pull/109/commits/9f9d1e863a98483e0aeecb9a11d1b7e22b80ad92.