Closed hongyeehh closed 6 months ago
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 93.41%. Comparing base (
59e3e1c
) to head (9abd8fd
).
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
A major change to the _generate_triplegs_overlap_staypoints()
function is to assign pfs geometries to triplegs instead of the defined initially staypoint geometries (basically removing line 439-445 in trackintel/preprocessing/positionfixes.py
). This ensures consistency between the two cases in generate_triplegs()
(the testing function test_stability()
).
@munterfi, does this change affect your use cases or has any consequences for the original code? The tests seem to work fine after corresponding edits.
Hi @hongyeehh, thanks for extending the documentation and streamlining the test cases 👍
Concerning the LINESTRING geometries of the triplegs: Since the method says the triplegs should overlap with the staypoints, an overlap with the centroid of the staypoint is more intuitive from my point of view. Even if I see the inconsistency between the methods.
What is the advantage if the tripleg starts at the last positionfix of the staypoint and ends at the first positionfix of the next staypoint, if we assume that it is a staypoint where the observed movement only results from GPS inaccuracies but not from real movement of the actual subject?
In our usecase with very coarse GPS tracking data of locomotives, we add a further step after applying trackintel, where we detect gaps in the "lifetime" tracks of the vehicles. Those gaps are then quantified spatially and temporally. If there is no gap, the duration and distance between the staypoint and tripleg should consequently be zero. This assumption would now no longer apply, given the modification.
What do you think?
Hi @hongyeehh, thanks for extending the documentation and streamlining the test cases 👍
Concerning the LINESTRING geometries of the triplegs: Since the method says the triplegs should overlap with the staypoints, an overlap with the centroid of the staypoint is more intuitive from my point of view. Even if I see the inconsistency between the methods.
What is the advantage if the tripleg starts at the last positionfix of the staypoint and ends at the first positionfix of the next staypoint, if we assume that it is a staypoint where the observed movement only results from GPS inaccuracies but not from real movement of the actual subject?
In our usecase with very coarse GPS tracking data of locomotives, we add a further step after applying trackintel, where we detect gaps in the "lifetime" tracks of the vehicles. Those gaps are then quantified spatially and temporally. If there is no gap, the duration and distance between the staypoint and tripleg should consequently be zero. This assumption would now no longer apply, given the modification.
What do you think?
Thanks for the explanation! My primary concern is ensuring consistency in the code, e.g., obtaining the same tripleg generation results for cases 1 and 2.
Assigning pfs geom is probably the simplest solution, as it preserves the one-to-one correspondence of the geometries of pf-sp and pf-tpl, and does not lead to tpl table depending on the sp generation result. I agree that assigning sp geometries to triplegs might make more sense, considering the real-world GPS noise. This is more of a design choice, and I am fine with both options as long as the code makes the test_stability()
pass :).
Maybe @NinaWie has a preference?
I agree that there needs to be consistency between the cases whether sp are provided as input or not. But I also agree that it would be nicer to use the actual staypoint geometry for the overlap
method, since the staypoint geometry should indeed overlap with the tripleg geometry in this case. To implement this, we would have to recompute the staypoint centroid from the pfs, right? So within the tripleg generation functions, in the case where no staypoints are supplied, we need to use the staypoint_id column in pfs and use it to compute the staypoint geometry? This is a bit ugly since it should be done in the staypoint generation. The other workaround would be to add an assertion statement that throws an error when you try to use the overlap_staypoints
method in the tripleg generation without supplying staypoints.
Thank you for the suggestions!
I support excluding the case 2 entirely if the overlap_staypoints
method is chosen.
A general input regarding the application scenario for case 2: Does this case imply that staypoints can be sourced differently, i.e. skipping the generate_staypoints
step?
Given the high complexity and nesting within the generate_triplegs
method, would it then make sense to extract case 2 into a separate function, e.g. assign_pfs_to_sps(pfs, sp)
, and set the staypoint_id
in the positionfixes as a prerequisite for the generate_triplegs
function?
Even though this is a breaking change for users, it would reduce the input checks and variants that need to be managed, and assign a single responsibility to generate_triplegs
.
In the new version, an error will be raised if pfs
with no staypoint_id
is provided to the function, such that we only support case 1 for overlap_staypoints.
I like the idea of removing case 2 preprocessing from the function, which we can implement in future PRs. At the moment, I added a deprecation warning for the users. Since it is not important the print_progress
input argument is removed.
Following #607, this PR does the following:
test_stability()
, to ensure consistency between two cases