Closed isVoid closed 2 years ago
I don't like "generic". What are the functions that fall within generic? Perhaps we can come up with some categories.
Why doesn't each function get it's own .py file like we do in C++? If that's just "Pythonic", let me know.
Because "pythonic" :). But that would be an over-simplification. Python files are actually modules. In c++ analogy, it's a combination of header files, namespaces and preprocessing macros (and probably more). A method func
defined in func.py
will need to be accessed via func.func()
. Some preprocessing may be done when python interpreter load the module and inject functions to objects at runtime, like: https://github.com/rapidsai/cudf/blob/c19c8c9406262f16addf2080096338e8fe4a2ba9/python/cudf/cudf/core/dataframe.py#L6871-L6893. These are the reasons why we want to group methods of similar types in the same module.
cc python guru @shwina
So how does this affect how one imports, e.g. derive_trajectories()
?
Does it become import cuspatial.core.trajectory.generic.derive_trajectories
?
Why not eliminate "core" and "generic" from this so we can just have import cuspatial.trajectory.derive_trajectories
?
A folder is a python package. Methods in the modules can be "exported" by the package by defining them in __init.py__
. To write import cuspatial.trajectory.derive_trajectories
, you would just need to from .generic import derive_trajectories
in trajectory.__init__.py
.
I don't like "generic".
Me neither, for concrete example, there are these methods in spatial.generic.py
def directed_hausdorff_distance(xs, ys, space_offsets):
def haversine_distance(p1_lon, p1_lat, p2_lon, p2_lat):
def lonlat_to_cartesian(origin_lon, origin_lat, input_lon, input_lat):
def polygon_bounding_boxes(poly_offsets, ring_offsets, xs, ys):
def polyline_bounding_boxes(poly_offsets, xs, ys, expansion_radius):
def pairwise_linestring_distance(offsets1, xs1, ys1, offsets2, xs2, ys2):
I think they generally fall into distance
, projection
and bounding_box
categories. But these seems like a sub category when compared to indexing
and join
. I can definitely be persuaded other ways.
I think distance
, projection
, bounding
, indexing
and joining
are all on the same footing and should be at the same level.
"joining" is a bit weird, but it grammatically matches. Alternative is infinitive verb form measure
, project
, bound
, index
, join
.
In our meeting we decided that interpolate
module should be refactored as generic interpolate method for both spatial and time series. And that trajectory package should be removed.
I think we decided that interpolate
should be independent of trajectory
. I did not hear the part where trajectory
should be removed -- there's still useful functionality for trajectories (and RFEs for more).
I did not hear the part where trajectory should be removed -- there's still useful functionality for trajectories (and RFEs for more).
Only the package - trajectory/
is removed. The functions are "flattened" into trajectory.py
module.
I am now having second thoughts on my "infinitive" suggestion, mostly because "measure" is vague where "distance" was quite clear. You can measure a lot of quantities besides distance. I think the original names were better -- do you feel strongly about one or the other? Very sorry for flipping.
do you feel strongly about one or the other
Not really. I agree that distance
is clearer as it's frequently used in literature. I also think it's ok to have exception for joining
to be kept in verb form join
because people are familiar with it's usage.
Then I would just go with distance
, projection
, bounding
, indexing
, join
. (the original)
Motivation
Currently cuspatial's file structure is mostly flat, with two modules whose name have overlapping meanings:
core
andgeometry
. We should refactor the structure of the library so that the physical layout of the folder structure corresponds to the extent of the library, as described in the leading section of docs. This gives developers a clearer idea what's within the problem boundary of cuspatial, and where to dedicate their contributions to.Library Design
In general, cuspatial separates compute API and geodataframe APIs. The compute APIs are broadly organized by the data type it operates on (Spatial or Trajectory) and the computation type (generic, indexing, join, filter). We should avoid using general names for the module like
generic
etc. Developers should strive to find the corresponding subcategory of the compute API, or create a new ones should it does not exist.Functional v.s. OOP design
cuspatial endorses both functional and OOP design of APIs.
GeoSeries
andGeoDataframe
is the core object the user will interact their geometric/geodesic data with. These APIs will strive to achieve function parity withgeopandas
. Beyond that, functional APIs supports a superset of functions that cuspatial supports, which may require to interact with custom objects within its own problem domain, such asCubicSpline
and quad tree indexing table.Internal v.s. external
Column level implementation and geoarrow specifications of
geoseries
is internal API.Breaking v.s. Non-breaking
Most refactor that goes on here should be
non-breaking
change as this only affects internal library design. If desired, we may enforce the same user view of cuspatial APIs as the library design in the future.The below are the proposed new layout: