spatial-data-lab / knime-geospatial-extension

This repository is built for KNIME-CGA Geospatial Project, and the goal is to build Python-based nodes for geospatial analysis in KNIME Analytic Platform.
MIT License

US2020 Census Data node should have option to output US2020 TIGER Map compatible GEO_ID column #87

Closed koettert closed 1 year ago

koettert commented 1 year ago

As a user, I currently need a String Manipulation node after the US2020 Census Data node to convert the GEO_ID column values so that they are compatible with the GEO_ID values from the US2020 TIGER Map node. The operation performed is regexReplace($GEO_ID$, ".*US", "").

Ideally, the US2020 Census Data node would provide an option to either replace the existing GEO_ID format with the compatible one or append a new GEO_ID_TIGER column. Which variant to use depends on whether the original Census GEO_ID is also used elsewhere, for example to join with other resources.

Do we know what 1400000US means? Does it contain any useful information?
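For readers outside KNIME, the regexReplace call above can be sketched in plain Python. This is only an illustration: the function name and the sample GEO_ID value are hypothetical and not taken from the extension's code.

```python
import re


def strip_census_prefix(geo_id: str) -> str:
    """Remove everything up to and including the last 'US' in a GEO_ID."""
    # ".*" is greedy, so the match runs through the last "US" in the string.
    return re.sub(r".*US", "", geo_id)


# Illustrative value only: a "1400000US" prefix followed by digits.
census_geo_id = "1400000US25025010103"
tiger_geo_id = strip_census_prefix(census_geo_id)
print(tiger_geo_id)  # prints 25025010103
```

A value that contains no "US" is returned unchanged, so the function is safe to apply to a column that has already been converted.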

UrbanGISer commented 1 year ago

As both nodes use the same format, I added the following code to both the US2020 Census Data and US ACS 5 years nodes:

    if "GEO_ID" in gdf.columns:
        gdf["GEO_ID"] = gdf["GEO_ID"].str.replace(r".*US", "", regex=True)

koettert commented 1 year ago

@koettert check and close

koettert commented 1 year ago

@UrbanGISer we need to be very careful that nodes remain backward compatible between versions. Otherwise we will break existing workflows, which we should prevent at all costs!

We could deprecate the node, but I think that is too drastic. Instead, I would suggest adding a new dialog option to the Census Data node that lets users decide whether they want to strip the US part from the GEO ID. Maybe we can call it "Make GEO.ID Tiger/Line compatible". The process of stripping the first nine characters to make it Tiger compatible is also explained here at the bottom of the page, so we could use a faster substring method instead of a regex to create a compatible GEO.ID.
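A minimal sketch of the substring idea, compared against the regex variant. The constant, function names, and sample values are hypothetical, and the slice assumes every input still carries the nine-character prefix.

```python
import re

# Assumption: the prefix to drop (e.g. "1400000US") is always nine characters.
PREFIX_LENGTH = 9


def to_tiger_geo_id(geo_id: str) -> str:
    """Drop the first nine characters with a plain slice (no regex needed)."""
    return geo_id[PREFIX_LENGTH:]


def to_tiger_geo_id_regex(geo_id: str) -> str:
    """Regex variant from the earlier comment, kept here for comparison."""
    return re.sub(r".*US", "", geo_id)


# Illustrative values only.
samples = ["1400000US25025010103", "0500000US25025"]
for sample in samples:
    # Both approaches agree as long as the prefix is present.
    assert to_tiger_geo_id(sample) == to_tiger_geo_id_regex(sample)
    print(sample, "->", to_tiger_geo_id(sample))
```

One trade-off to keep in mind: unlike the regex, the slice would also truncate a value that has already been stripped, so it should only run on the original Census format.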

For new nodes the setting should be true by default, but for nodes in existing workflows it should be false. Please have a look at the Python documentation to see how this can be done.
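The intended default behavior can be illustrated in plain Python. This sketch deliberately does not use the KNIME Python API; the settings key, the function names, and the use of a dict for saved settings are all assumptions made for illustration.

```python
from typing import Optional

# Hypothetical settings key for the proposed dialog option.
OPTION_KEY = "make_tiger_compatible"


def resolve_option(saved_settings: Optional[dict]) -> bool:
    """Decide whether the GEO_ID prefix should be stripped.

    saved_settings is None for a freshly added node, and a dict of
    stored values for a node loaded from an existing workflow.
    """
    if saved_settings is None:
        return True  # new node: option is on by default
    # Workflows saved before the option existed have no key: stay off.
    return saved_settings.get(OPTION_KEY, False)


def output_geo_id(geo_id: str, saved_settings: Optional[dict]) -> str:
    """Strip the nine-character prefix only when the option is enabled."""
    return geo_id[9:] if resolve_option(saved_settings) else geo_id


print(output_geo_id("1400000US25025010103", None))  # new node
print(output_geo_id("1400000US25025010103", {}))    # old workflow
```

With this logic, a workflow saved before the option existed keeps producing the original GEO_ID format, while a newly added node produces the Tiger-compatible one.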

koettert commented 1 year ago

Fixed with https://github.com/spatial-data-lab/knime-geospatial-extension/pull/117