ocean-tracking-network / glatos

Residency Index Constraints #129

Open jdpye opened 2 years ago

jdpye commented 2 years ago

In GitLab by @morganpiczak on Jan 23, 2021, 18:07

Trying to calculate residency index per season (across 10 years of data and 40 seasons).

When using detection_events, if you choose False (to carry over extra data such as seasons), it does not produce the columns needed for residency index (i.e. first detection and last detection). So it would be good to be able to keep the desired columns while also having the columns required for residency index.

Additionally, as currently written, the vignette states only that animal_id, detection_timestamp, mean_lat and mean_long are required, which I think is incorrect (it also needs the first and last detection from detection events).

It would be extremely helpful to be able to have residency calculated by a specified parameter (from a carried over column from detection events) such as season, species or year. As the function currently stands, it calculates residency index for all stations with detections, but does not carry forward the stations with no detections (which would be essential for a habitat selection analysis for example).

jdpye commented 2 years ago

In GitLab by @jdpye on Jan 25, 2021, 00:23

assigned to @ryangosse

jdpye commented 2 years ago

In GitLab by @jdpye on Jan 25, 2021, 00:30

I've been thinking about how to generalize this use case while keeping the function from having to take on many more assumptions, and I'm currently thinking I'd prefer to see a user-defined subsetting strategy (i.e. using a group_by and group_walk calling the RI function) rather than handling the group/splitting logic inside the function.

We could definitely use a flag that preserves the 0 values in the return from RI (and maybe any other possibly interesting column values that could survive the RI groupings). That shouldn't be too heavy a lift here, and I think we can design a good workflow example once we have that.

Expanding/fixing the doc is important too! No surprises for the users that way.
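As a rough illustration of the user-defined subsetting strategy described above, the sketch below groups toy detections by season and applies a per-group RI function. Everything in it is an assumption made for illustration: the toy data, the column names, and `ri_one_group()`, a hypothetical stand-in (days detected at a location divided by days detected anywhere) rather than `glatos::residence_index`. It uses `dplyr::group_modify()` instead of `group_walk()` because `group_walk()` is meant for side effects and returns its input, whereas `group_modify()` returns the per-group results with the grouping column attached.

```r
library(dplyr)

# Hypothetical toy detections with a carried-over season column
det <- data.frame(
  animal_id = c("A", "A", "A", "A", "B", "B"),
  location  = c("S1", "S1", "S2", "S1", "S2", "S2"),
  season    = c("winter", "winter", "winter", "summer", "summer", "summer"),
  detection_timestamp_utc = as.POSIXct(
    c("2020-01-01 01:00:00", "2020-01-02 01:00:00", "2020-01-03 01:00:00",
      "2020-07-01 01:00:00", "2020-07-01 05:00:00", "2020-07-02 01:00:00"),
    tz = "UTC"
  )
)

# Hypothetical single-group RI (not the glatos implementation):
# days detected at a location / days detected anywhere, per animal
ri_one_group <- function(x, ...) {
  x %>%
    mutate(day = as.Date(detection_timestamp_utc)) %>%
    group_by(animal_id) %>%
    mutate(total_days = n_distinct(day)) %>%
    group_by(animal_id, location, total_days) %>%
    summarise(days_detected = n_distinct(day), .groups = "drop") %>%
    mutate(residency_index = days_detected / total_days)
}

# The grouping column (season) is kept in the combined result
ri_by_season <- det %>%
  group_by(season) %>%
  group_modify(ri_one_group) %>%
  ungroup()
```

Because the grouping happens outside the RI function, the same pattern should work for species, year, or several columns at once without the function needing to know about them.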

jdpye commented 2 years ago

In GitLab by @chrisholbrook on Jan 25, 2021, 08:16

I agree on both suggestions; the current requirement of a compressed detection_events input object could be relaxed. Perhaps treat the existing function as a wrapper for an exported lower-level residence_index that assumes the input is a single group. The wrapper (existing residence_index) could easily be expanded to recognize uncompressed detection events and compress them if needed.

jdpye commented 2 years ago

In GitLab by @benjaminhlina on Feb 4, 2021, 15:59

Currently, due to this constraint, it is easier to use dplyr to calculate residency (RI) and roaming indices (ROI), as you can use the group_by() function to carry over any other metadata such as season, tagging location/basin location, receiver groupings, etc. Something like a group_by() argument within this would be super helpful for the function to have a wider application. Also, I'm curious about your thoughts on a 0 value for an RI or ROI, as a 0 would indicate a fish was not resident/roaming or even detected in that location. I guess I'm confused as to what a 0 value for an RI or ROI represents. Does a 0 indicate an absence of a fish?

jdpye commented 2 years ago

In GitLab by @chrisholbrook on Feb 4, 2021, 18:08

@benjaminhlina A value of RI = 0 would result if a fish was never detected at a location, so I would interpret this as complete absence. It is possible that some of the methods would return a value of zero if all visits to a site were masked by longer visits to another site during the same time interval (e.g., if binning by day and only using the most prominent site on that day).
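To make the masking case concrete, here is a minimal sketch with made-up data in which a fish visits S2 every day but S1 always has more detections. The "most prominent site per day" rule is implemented here as "most detections that day", which is an assumption for illustration and not necessarily how any glatos calculation method defines it.

```r
library(dplyr)

# Hypothetical detections: S2 is visited on both days, but S1 dominates each day
det <- data.frame(
  day = as.Date(c(rep("2020-01-01", 4), rep("2020-01-02", 3))),
  location = c("S1", "S1", "S1", "S2", "S1", "S1", "S2")
)

# Keep only the most-detected location on each day
dominant <- det %>%
  count(day, location, name = "n_det") %>%
  group_by(day) %>%
  slice_max(n_det, n = 1, with_ties = FALSE) %>%
  ungroup()

# RI per site = days on which the site was dominant / days with any detection
n_days <- n_distinct(det$day)
all_sites <- sort(unique(det$location))
ri <- sapply(all_sites, function(s) sum(dominant$location == s) / n_days)
```

Under these assumptions `ri` is 1 for S1 and 0 for S2, even though the fish was detected at S2 on both days.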

The main limitation with calling the current glatos::residence_index from a dplyr::group_by is that residence_index does not include the grouping column in the return object, so some extra handling is required. If you are doing this and have some example code to share, a snippet would be appreciated. I like the idea of adding flexibility to do both: carry through any grouping variables from group_by method and allow grouping within residence_index on variables other than location (including multiple variables/columns).

jdpye commented 2 years ago

In GitLab by @benjaminhlina on Feb 4, 2021, 21:33

@chrisholbrook okay, that makes sense, so it becomes an absence value similar to if I was solely looking at presence/absence of a fish. The method you describe makes sense as well.

As far as using dplyr::group_by() goes, I currently don't combine the glatos::residence_index() function with dplyr::group_by(); I solely use dplyr to do this, as it seemed easier than the extra handling that would be required by using glatos. Quick, off the top of my head: instead of having a group_by argument, one could use either a for loop or functions from purrr to iterate over the dataframe and create RI dataframes, one per season. You could name each dataframe after its season, then afterwards create a season column and merge all of them together for further analysis and plotting. Using the example walleye data in glatos, I just wrote a ~250 line script that shows how I would use dplyr. Would you like me to share it in here via reprex or send you the script?
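The iterate-then-merge idea above might look roughly like the following sketch. It is not the script mentioned in the comment: the toy data and `ri_simple()` are hypothetical, and base `split()` plus `lapply()` stand in for a for loop or `purrr::map()`. The list is named by season, and `dplyr::bind_rows()` with `.id` turns those names into a season column.

```r
library(dplyr)

# Hypothetical toy detections with a season column
det <- data.frame(
  animal_id = c("A", "A", "A", "A", "B", "B"),
  location  = c("S1", "S1", "S2", "S1", "S2", "S2"),
  season    = c("winter", "winter", "winter", "summer", "summer", "summer"),
  detection_timestamp_utc = as.POSIXct(
    c("2020-01-01 01:00:00", "2020-01-02 01:00:00", "2020-01-03 01:00:00",
      "2020-07-01 01:00:00", "2020-07-01 05:00:00", "2020-07-02 01:00:00"),
    tz = "UTC"
  )
)

# Hypothetical RI for one season: days at a location / days detected anywhere
ri_simple <- function(x) {
  x %>%
    mutate(day = as.Date(detection_timestamp_utc)) %>%
    group_by(animal_id) %>%
    mutate(total_days = n_distinct(day)) %>%
    group_by(animal_id, location) %>%
    summarise(residency_index = n_distinct(day) / first(total_days),
              .groups = "drop")
}

# One data frame per season, named by season
by_season <- split(det, det$season)
ri_list <- lapply(by_season, ri_simple)

# Merge back together; the list names become the season column
ri_all <- bind_rows(ri_list, .id = "season")
```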

jdpye commented 2 years ago

In GitLab by @jdpye on Feb 4, 2021, 21:49

A snippet might be the right way to do it: https://gitlab.oceantrack.org/-/snippets/new

jdpye commented 2 years ago

In GitLab by @benjaminhlina on Feb 4, 2021, 22:00

@jdpye I like the idea, but the link you shared comes up with a 404 error. I did find GitLab's info on snippets, but I'm not sure if I have the right permissions to create one, as I can't seem to find how to make a new snippet.

jdpye commented 2 years ago

In GitLab by @chrisholbrook on Feb 5, 2021, 09:24

@benjaminhlina I am not familiar with snippets but will defer to @jdpye's preference on this and for guidance to make it work. Whether a snippet or reprex, I do like the idea of posting your sample code here if it is related to the OP.

jdpye commented 2 years ago

In GitLab by @jdpye on Feb 5, 2021, 09:38

Didn't realize I'd restricted snippet-making; I'll try and fix that for you. I find they're second best to a pull request. Definitely good for posting a link in tickets.

jdpye commented 2 years ago

In GitLab by @benjaminhlina on Feb 5, 2021, 09:40

@jdpye Awesome, I'll wait for you to try to give me access. If not, no worries, I'll post the reprex.

jdpye commented 2 years ago

In GitLab by @jdpye on Feb 5, 2021, 09:49

There we go, should work now. We disabled snippets for External users since it was a huge spam vector for bots advertising gambling sites. Switched you to Internal.

jdpye commented 2 years ago

In GitLab by @benjaminhlina on Feb 5, 2021, 11:16

@jdpye thanks for giving me permission. I've created snippet 231. Hope this helps and works for sharing the method I currently use.

jdpye commented 2 years ago

In GitLab by @benjaminhlina on Mar 18, 2021, 18:57

I have added an amended comment to snippet 231 which uses the function complete() from tidyr to create 0 values for RI when a fish was not heard at all and considered absent.
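For readers without access to the snippet, the zero-filling step could be sketched as below. This is not the code from snippet 231: the RI table, the location list, and the column names are all made up for illustration. It relies on `tidyr::complete()` expanding factors to their full set of levels, so locations with no detections appear with the fill value.

```r
library(dplyr)
library(tidyr)

# Hypothetical RI results: only locations with detections are present
ri <- data.frame(
  animal_id = c("A", "A", "B"),
  location  = c("S1", "S2", "S2"),
  residency_index = c(0.75, 0.25, 1)
)

# Hypothetical full receiver list, including S3, which heard no fish
all_locations <- c("S1", "S2", "S3")

# Expand to every animal x location combination and fill missing RI with 0
ri_complete <- ri %>%
  mutate(location = factor(location, levels = all_locations)) %>%
  complete(animal_id, location, fill = list(residency_index = 0))
```

The result has one row per animal and location, with `residency_index` equal to 0 wherever the animal was never heard.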