Hi,
I was recently linking some VMS and logbook data for a given fleet in order to calculate some CPUEs. In this context I took a closer look at how the function splitAmongPings() behaves, and in particular at how it prioritizes the allocation of the catch.
It looks like, with level = "day", it:
1) allocates the catch among pings matching the day, ICES rectangle, trip ID and vessel;
2) allocates the remainder to pings matching the ICES rectangle, trip ID and vessel; and
3) splits whatever cannot be matched on ICES rectangle among all pings of the trip
(this sequence is then repeated without the trip ID and vessel ID if conserve = TRUE).
This means that if there is a day and rectangle mismatch between the logbook (LB) and VMS (after step 1), the catch is shared among pings of other days (in the case of a multi-day trip) in the same rectangle, or ultimately among all the pings of the trip if no rectangle matches.
However, my intuition is that the ICES rectangle information in the LB may be less reliable than the day of the catch, in which case one may prefer to share the remaining catch of the day among all pings of that day.
For illustration, in the example below (merged aggregated eflalo and tacsatEflalo), a mismatch in the ICES rectangle (likely misspelled in the LB) causes the whole catch of 20/09 to be allocated to the same rectangle on another day (19/09, the only day on which the vessel actually fished in rectangle 29E9 during this trip):
> ## Logbook catch per trip, rectangle and day
> ## (formula left unnamed: the name of this argument differs between R versions)
> LBsub <- aggregate(LE_KG_TOT ~ FT_REF + LE_RECT + LE_CDAT,
+                    data = subset(eflalo, FT_REF == "319275689",
+                                  select = c("FT_REF", "LE_CDAT", "LE_RECT", "LE_KG_TOT")),
+                    FUN = sum, na.rm = TRUE)
> ## Catch allocated to VMS pings per trip, rectangle and day
> VMSsub <- aggregate(LE_KG_TOT ~ FT_REF + LE_RECT + SI_DATE,
+                     data = subset(tacsatEflalo, FT_REF == "319275689",
+                                   select = c("FT_REF", "SI_DATE", "LE_RECT", "LE_KG_TOT")),
+                     FUN = sum, na.rm = TRUE)
> ## Full outer join, so that unmatched day/rectangle combinations show up as NA
> merge(x = LBsub, y = VMSsub, all = TRUE,
+       by.x = c("FT_REF", "LE_CDAT", "LE_RECT"),
+       by.y = c("FT_REF", "SI_DATE", "LE_RECT"),
+       suffixes = c(".LB", ".VMS"))
     FT_REF    LE_CDAT LE_RECT LE_KG_TOT.LB LE_KG_TOT.VMS
1 319275689 17/09/2017    28E7           NA             0
2 319275689 18/09/2017    28E9         4860          4860
3 319275689 19/09/2017    28E9         1188          1188
4 319275689 19/09/2017    29E9         1152          8712  ## <- VMS: same as misspelled rectangle but on another day => where all the unmatched catch ends up.
5 319275689 20/09/2017    29E9         7560            NA  ## <- LB with misspelled rect.
6 319275689 20/09/2017    30E9           NA             0  ## <- VMS: where it should end up (same day)... zero catch instead
Another example, where one event recorded in what appears to be a misspelled rectangle (27E9, on 06/10; according to VMS the vessel did not fish in it at all during the trip) was allocated among all pings of the trip rather than being kept on that day (1296 + 2880 = 4176 kg logged for that day, against only 3148 kg allocated):
     FT_REF    LE_CDAT LE_RECT LE_KG_TOT.LB LE_KG_TOT.VMS
1 319816571 03/10/2017    28E9          504      526.3448
2 319816571 04/10/2017    28E9         4716     4984.1379
3 319816571 05/10/2017    28E9         4536     4804.1379
4 319816571 06/10/2017    27E9         1296            NA  ## <- LB with misspelled rect. Catch split among all pings of the trip
5 319816571 06/10/2017    28E9         2880     3148.1379  ## <- VMS: catch for the day does not add up to 4176 kg.
6 319816571 07/10/2017    28E9         3852     4120.1379  ## <- VMS: all other days end up with more catch allocated.
7 319816571 08/10/2017    28E9         4248     4449.1034
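As a quick arithmetic check on the table above (values are copied from the printed output, so totals only agree up to rounding), the surplus received by the 28E9 pings over all days adds up to the 1296 kg logged in 27E9, while 06/10 itself ends up about 1028 kg short of its logbook catch:

```r
# Daily LB and VMS catch (kg) in rectangle 28E9, copied from the table above
lb_28E9  <- c(504, 4716, 4536, 2880, 3852, 4248)
vms_28E9 <- c(526.3448, 4984.1379, 4804.1379, 3148.1379, 4120.1379, 4449.1034)
lb_27E9  <- 1296  # catch logged in 27E9 on 06/10, with no matching pings

# Surplus received by each day relative to its own logbook catch
surplus <- vms_28E9 - lb_28E9
total_surplus <- sum(surplus)  # approximately 1296: the 27E9 catch spread over the trip

# Shortfall on 06/10 (4th day) relative to the full logbook catch of that day
shortfall_0610 <- (lb_27E9 + lb_28E9[4]) - vms_28E9[4]

print(round(c(total_surplus = total_surplus, shortfall_0610 = shortfall_0610), 1))
```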
Whether this is generally better than conserving the catch per day is open to debate, but I do not think it is in my particular situation. So I have written a function splitAmongPings3, based on the original one, with an extra step (between the original first and second steps) that allocates the catch of a day that was not matched on the rectangle(s) to the pings of that day. The matching on rectangle only is still done afterwards, for those days in the LB with no matching VMS.
I have limited this extra step to entries matched on at least a vessel ID (when conserve = TRUE), as I reckon that if no vessel information can be matched, the ICES rectangle is still the best spatial information we have (several vessels may be all over the place on the same day).
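To make the two fallback orders concrete, here is a toy sketch in base R. It is not the vmstools code: allocate_toy() and its priority_day argument are hypothetical names, the trip and vessel levels are left out (all rows are assumed to belong to one trip), each day/rectangle combination is reduced to a single ping, and catch is assumed to be split equally among matching pings. The data are those of the first example.

```r
# Toy fallback allocation (hypothetical helper, NOT the vmstools implementation).
# Each logbook row goes to the first set of pings that matches, trying the
# matching keys in order; catch is split equally among the matching pings.
allocate_toy <- function(lb, pings, priority_day = FALSE) {
  pings$kg <- 0
  if (priority_day) {
    # extra "day only" step inserted before the "rectangle only" step
    keys <- list(c("day", "rect"), "day", "rect", character(0))
  } else {
    keys <- list(c("day", "rect"), "rect", character(0))
  }
  for (i in seq_len(nrow(lb))) {
    for (k in keys) {
      hit <- rep(TRUE, nrow(pings))  # character(0) means: all pings of the trip
      for (col in k) hit <- hit & pings[[col]] == lb[[col]][i]
      if (any(hit)) {
        pings$kg[hit] <- pings$kg[hit] + lb$kg[i] / sum(hit)
        break
      }
    }
  }
  pings
}

# First example: logbook entries and one ping per day/rectangle visited
lb <- data.frame(day  = c("18/09", "19/09", "19/09", "20/09"),
                 rect = c("28E9", "28E9", "29E9", "29E9"),
                 kg   = c(4860, 1188, 1152, 7560),
                 stringsAsFactors = FALSE)
pings <- data.frame(day  = c("17/09", "18/09", "19/09", "19/09", "20/09"),
                    rect = c("28E7", "28E9", "28E9", "29E9", "30E9"),
                    stringsAsFactors = FALSE)

res_original <- allocate_toy(lb, pings, priority_day = FALSE)
res_day      <- allocate_toy(lb, pings, priority_day = TRUE)
print(res_original)
print(res_day)
```

With priority_day = FALSE the 7560 kg logged on 20/09 falls back to the 29E9 ping of 19/09; with priority_day = TRUE it stays on the 20/09 ping in 30E9.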
My two former examples now become respectively:
     FT_REF    LE_CDAT LE_RECT LE_KG_TOT.LB LE_KG_TOT.VMS
1 319275689 17/09/2017    28E7           NA             0
2 319275689 18/09/2017    28E9         4860          4860
3 319275689 19/09/2017    28E9         1188          1188
4 319275689 19/09/2017    29E9         1152          1152
5 319275689 20/09/2017    29E9         7560            NA  ## <- LB: misspelled rect.
6 319275689 20/09/2017    30E9           NA          7560  ## <- VMS: catch conserved for the day
which seems a bit tidier and conserves the catch for the day.
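The per-day conservation can be checked on the merged table with a short sketch. In the code below (base R), daily_totals() is a hypothetical helper and the two data frames are typed in from the tables for the first example, before and after the change; it sums the LB and VMS catch per day so the two can be compared:

```r
# Hypothetical helper: total LB and VMS catch (kg) per day from a merged table
daily_totals <- function(merged) {
  # na.pass keeps rows with NA on one side; na.rm = TRUE then counts NA as 0
  aggregate(cbind(LB, VMS) ~ LE_CDAT, data = merged,
            FUN = sum, na.rm = TRUE, na.action = na.pass)
}

days <- c("17/09/2017", "18/09/2017", "19/09/2017",
          "19/09/2017", "20/09/2017", "20/09/2017")
lb   <- c(NA, 4860, 1188, 1152, 7560, NA)

# VMS allocation with the original function and with the extra "day" step
before <- data.frame(LE_CDAT = days, LB = lb, VMS = c(0, 4860, 1188, 8712, NA, 0))
after  <- data.frame(LE_CDAT = days, LB = lb, VMS = c(0, 4860, 1188, 1152, NA, 7560))

tot_before <- daily_totals(before)
tot_after  <- daily_totals(after)
print(tot_before)  # 19/09 and 20/09 differ between LB and VMS
print(tot_after)   # LB and VMS agree on every day
```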
The function is used like the original and simply has an extra optional parameter, priorityDay (default FALSE, which behaves like the original function; priorityDay = TRUE only has an effect if level = "day"). The overall catch allocated to the VMS data is consistent with the original function when conserve = TRUE (I think it should be regardless); only the way it is allocated among pings differs.
Do you think this would be a more sensible approach for allocating the catch among pings for the general case? Or that the user should be given the choice?
If you are interested, I can send a pull request to incorporate the code in the package.
Best wishes, Yves