ydataai / ydata-profiling

1 Line of code data quality profiling & exploratory data analysis for Pandas and Spark DataFrames.
https://docs.profiling.ydata.ai
MIT License
12.54k stars, 1.69k forks

Feature Request: geospatial column analysis #1473

Status: Open. proselotis opened this issue 1 year ago.

proselotis commented 1 year ago

Missing functionality

As a new user of ydata-profiling, I am using the package with various types of data. One of them is geospatial data, which I currently store as separate float columns for Latitude and Longitude. The bar charts and correlations in the output may be useful in some cases. However, it would often be more useful to show these values on a map, so the user can see which areas the data covers. Unless I missed something in the documentation, there is currently no functionality for this.

I noticed that the contributing guidelines list this as a potential contribution under "EDA: extending data type support (GPS coordinates)".
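Supporting GPS coordinates as a data type would first require recognizing that two float columns hold a coordinate pair. The sketch below shows one way to do that. It assumes WGS84 degrees and uses only range checks. The function name `looks_like_coordinates` is hypothetical and is not part of ydata-profiling's API.

```python
import math
from typing import Optional, Sequence


def looks_like_coordinates(lat: Sequence[Optional[float]],
                           lon: Sequence[Optional[float]]) -> bool:
    """Hypothetical check: do two columns plausibly hold WGS84 lat/lon degrees?

    Only range checks are performed: latitude must lie in [-90, 90] and
    longitude in [-180, 180]. Rows with a missing value (None or NaN) are skipped.
    """
    if len(lat) != len(lon):
        return False
    complete_pairs = 0
    for y, x in zip(lat, lon):
        # Skip rows where either coordinate is missing.
        if y is None or x is None or math.isnan(y) or math.isnan(x):
            continue
        if not (-90.0 <= y <= 90.0 and -180.0 <= x <= 180.0):
            return False
        complete_pairs += 1
    # Require at least one complete pair so an all-missing pair is not flagged.
    return complete_pairs > 0
```

Range checks alone will also accept ordinary columns whose values happen to be small, such as values between 0 and 1. A real implementation would therefore probably combine this test with hints from the column names (for example "Latitude" and "Longitude") or with explicit user configuration.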

Proposed feature

Plot the points on a map within the variable exploration section, so users can see which areas the data covers.
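To show roughly what the proposed map view could compute, here is a minimal sketch that uses only the standard library. It draws (lat, lon) pairs as dots on a blank equirectangular SVG canvas with no basemap. It also measures coverage as the fraction of fixed-size grid cells that contain at least one point. Both function names, the 10-degree default cell size, and the SVG styling are illustrative assumptions, not anything ydata-profiling currently provides. The sketch also assumes the coordinates have already been validated to lie in range.

```python
from typing import Iterable, Tuple

Point = Tuple[float, float]  # (latitude, longitude) in degrees


def points_to_svg(points: Iterable[Point], width: int = 360, height: int = 180) -> str:
    """Render (lat, lon) pairs as dots on a blank equirectangular canvas.

    Longitude -180..180 maps linearly to x = 0..width and latitude
    90..-90 maps linearly to y = 0..height, so north is at the top.
    """
    circles = []
    for lat, lon in points:
        x = (lon + 180.0) / 360.0 * width
        y = (90.0 - lat) / 180.0 * height
        circles.append(f'<circle cx="{x:.1f}" cy="{y:.1f}" r="1.5" fill="steelblue"/>')
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">'
        f'<rect width="{width}" height="{height}" fill="#f4f4f4"/>'
        + "".join(circles)
        + "</svg>"
    )


def grid_coverage(points: Iterable[Point], cell_deg: float = 10.0) -> float:
    """Fraction of cell_deg x cell_deg grid cells that contain at least one point.

    Assumes cell_deg divides 180 and 360 evenly.
    """
    n_rows = int(180 / cell_deg)
    n_cols = int(360 / cell_deg)
    cells = set()
    for lat, lon in points:
        # lat == 90 or lon == 180 would index one past the end, so clamp them
        # into the last row/column.
        row = min(int((lat + 90.0) // cell_deg), n_rows - 1)
        col = min(int((lon + 180.0) // cell_deg), n_cols - 1)
        cells.add((row, col))
    return len(cells) / (n_rows * n_cols)
```

An SVG string can be embedded directly in an HTML report. A coverage fraction, or the list of occupied cells, gives a compact numeric summary next to the plot. A production version would more likely use a mapping library with a basemap.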

I am happy to help contribute to this. However, there are a few different ways this feature could be implemented, so I wanted to know whether the ydata team has a preference.

Data size

Type of data

Alternatives considered

Additional context

No response

fabclmnt commented 1 year ago

Hi @proselotis,

Thank you for your feature request, and for your enthusiasm. Contributions are always welcome!

I would be happy to have your contribution for this feature. Nevertheless, there are a few points that need further definition.

Let me get back to you with some more detailed requirements for this feature so we can iterate together!

mrit64 commented 1 year ago

Hi,

An alternative could be to extend ydata-profiling to support GeoPandas. GeoPandas uses Shapely internally and can handle Points, Multi-Points, Lines, Multi-Lines, Polygons, and Multi-Polygons. Profiling a dataset could report information such as the geometry types and the coordinate system, and could eventually plot the shapes on a map. When comparing two geographic datasets, representing the differences between their geometries may be more challenging.
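As a rough illustration of the "type of geometry" part of such a profile, the sketch below counts geometry types in a column of WKT strings. It reads only the leading keyword, such as `POINT` in `POINT (30 10)`, and does not validate the rest of the geometry. The helper name `summarize_geometry_types` and the `<missing>`/`<invalid>` labels are hypothetical. The sketch uses plain strings so it runs without GeoPandas installed.

```python
from collections import Counter
from typing import Dict, Iterable, Optional


def summarize_geometry_types(wkt_values: Iterable[Optional[str]]) -> Dict[str, int]:
    """Count geometry types in a column of WKT strings (hypothetical helper).

    Only the leading type keyword of each string is inspected.
    """
    counts = Counter()
    for wkt in wkt_values:
        if wkt is None or not wkt.strip():
            counts["<missing>"] += 1
            continue
        # The type keyword ends at the first '(' or whitespace,
        # e.g. "POINT (30 10)", "point(1 2)", or "LINESTRING EMPTY".
        parts = wkt.strip().split("(", 1)[0].split()
        counts[parts[0].upper() if parts else "<invalid>"] += 1
    return dict(counts)
```

With GeoPandas itself, `gdf.geom_type.value_counts()` gives the equivalent counts, and `gdf.crs` exposes the coordinate reference system. A GeoPandas-aware profile could probably build on those attributes instead of parsing WKT.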