mie-lab / trackintel

trackintel is a framework for spatio-temporal analysis of movement trajectory and mobility data.
MIT License
198 stars 50 forks source link

DOC/TST: generate tripleg #609

Closed hongyeehh closed 6 months ago

hongyeehh commented 6 months ago

Following #607, this PR does the following:

codecov[bot] commented 6 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 93.41%. Comparing base (59e3e1c) to head (9abd8fd).

Additional details and impacted files ```diff @@ Coverage Diff @@ ## master #609 +/- ## ========================================== - Coverage 93.44% 93.41% -0.03% ========================================== Files 33 33 Lines 2076 2082 +6 Branches 367 371 +4 ========================================== + Hits 1940 1945 +5 Misses 126 126 - Partials 10 11 +1 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

hongyeehh commented 6 months ago

A major change to the _generate_triplegs_overlap_staypoints() function is to assign pfs geometries to triplegs instead of the defined initially staypoint geometries (basically removing line 439-445 in trackintel/preprocessing/positionfixes.py). This ensures consistency between the two cases in generate_triplegs() (the testing function test_stability()).

@munterfi, does this change affect your use cases or has any consequences for the original code? The tests seem to work fine after corresponding edits.

munterfi commented 6 months ago

Hi @hongyeehh, thanks for extending the documentation and streamlining the test cases 👍

Concerning the LINESTRING geometries of the triplegs: Since the method says the triplegs should overlap with the staypoints, an overlap with the centroid of the staypoint is more intuitive from my point of view. Even if I see the inconsistency between the methods.

What is the advantage if the tripleg starts at the last positionfix of the staypoint and ends at the first positionfix of the next staypoint, if we assume that it is a staypoint where the observed movement only results from GPS inaccuracies but not from real movement of the actual subject?

In our usecase with very coarse GPS tracking data of locomotives, we add a further step after applying trackintel, where we detect gaps in the "lifetime" tracks of the vehicles. Those gaps are then quantified spatially and temporally. If there is no gap, the duration and distance between the staypoint and tripleg should consequently be zero. This assumption would now no longer apply, given the modification.

What do you think?

hongyeehh commented 6 months ago

Hi @hongyeehh, thanks for extending the documentation and streamlining the test cases 👍

Concerning the LINESTRING geometries of the triplegs: Since the method says the triplegs should overlap with the staypoints, an overlap with the centroid of the staypoint is more intuitive from my point of view. Even if I see the inconsistency between the methods.

What is the advantage if the tripleg starts at the last positionfix of the staypoint and ends at the first positionfix of the next staypoint, if we assume that it is a staypoint where the observed movement only results from GPS inaccuracies but not from real movement of the actual subject?

In our usecase with very coarse GPS tracking data of locomotives, we add a further step after applying trackintel, where we detect gaps in the "lifetime" tracks of the vehicles. Those gaps are then quantified spatially and temporally. If there is no gap, the duration and distance between the staypoint and tripleg should consequently be zero. This assumption would now no longer apply, given the modification.

What do you think?

Thanks for the explanation! My primary concern is ensuring consistency in the code, e.g., obtaining the same tripleg generation results for cases 1 and 2.

Assigning pfs geom is probably the simplest solution, as it preserves the one-to-one correspondence of the geometries of pf-sp and pf-tpl, and does not lead to tpl table depending on the sp generation result. I agree that assigning sp geometries to triplegs might make more sense, considering the real-world GPS noise. This is more of a design choice, and I am fine with both options as long as the code makes the test_stability() pass :).

Maybe @NinaWie has a preference?

NinaWie commented 6 months ago

I agree that there needs to be consistency between the cases whether sp are provided as input or not. But I also agree that it would be nicer to use the actual staypoint geometry for the overlap method, since the staypoint geometry should indeed overlap with the tripleg geometry in this case. To implement this, we would have to recompute the staypoint centroid from the pfs, right? So within the tripleg generation functions, in the case where no staypoints are supplied, we need to use the staypoint_id column in pfs and use it to compute the staypoint geometry? This is a bit ugly since it should be done in the staypoint generation. The other workaround would be to add an assertion statement that throws an error when you try to use the overlap_staypoints method in the tripleg generation without supplying staypoints.

munterfi commented 6 months ago

Thank you for the suggestions!

I support excluding the case 2 entirely if the overlap_staypoints method is chosen.

A general input regarding the application scenario for case 2: Does this case imply that staypoints can be sourced differently, i.e. skipping the generate_staypoints step?

Given the high complexity and nesting within the generate_triplegs method, would it then make sense to extract case 2 into a separate function, e.g. assign_pfs_to_sps(pfs, sp), and set the staypoint_id in the positionfixes as a prerequisite for the generate_triplegs function?

Even though this is a breaking change for users, it would reduce the input checks and variants that need to be managed, and assign a single responsibility to generate_triplegs.

hongyeehh commented 6 months ago

In the new version, an error will be raised if pfs with no staypoint_id is provided to the function, such that we only support case 1 for overlap_staypoints.

I like the idea of removing case 2 preprocessing from the function, which we can implement in future PRs. At the moment, I added a deprecation warning for the users. Since it is not important the print_progress input argument is removed.