jj0hns0n / geonode

GeoNode is an open source platform that facilitates the creation, sharing, and collaborative use of geospatial data.
http://geonode.org/
GNU General Public License v3.0
3 stars 0 forks source link

Figure out how to deal with type mismatch #40

Closed jj0hns0n closed 9 years ago

jj0hns0n commented 9 years ago

In this example, the tract field is an int in the table and a varchar in the layer.

https://dpaste.de/CbXC

https://dpaste.de/zy5Q#L9,35

raprasad commented 9 years ago

Possible Strategy: Require Data Formatting Before Table Upload

Thoughts about formatting before table upload

  1. API User (Dataverse system) is cleaning/formatting data
  2. The Dataverse system needs to know how to clean/format the data for each JoinTarget.
  3. GeoNode JoinTarget needs to provide some type of formatting information.
    1. How do you do this? (not sure of best path)

Example (brainstorming)

  1. Change JoinTarget objects to have a single Attribute (instead of a M2M relation)
    • change: attributes = models.ManyToManyField(Attribute)
    • to: attribute = models.ForeignKey(Attribute)
  2. Require JoinTarget objects to specify a JoinTargetFormatType.
    • This might be an FK to an object that specifies the Format.
    • Curator of the JoinTargets creates the JoinTargetFormatTypes--regulated by GeoNode admin
      • e.g. This is an expert user
    • For the API user (Dataverse system), the JoinTargetFormatType can offer information on what is expected
    • Very rough idea:
      • Note: regex_replacement_string, python_code_snippet, and tranformation_function_name are just brainstorming ideas
class JoinTargetFormatType(models.Model)

    name = models.CharField(max_length=255, help_text='Census Tract (6 digits, no decimal)') 
    description_shorthand = models.CharField(max_length=255, help_text='dddddd') 
    clean_steps = models.TextField(help_text='verbal description. e.g. Remove non integers. Check for empty string. Pad with zeros until 6 digits.')

    # ??regex_replacement_string - could get ugly
    regex_replacement_string = models.CharField(help_text='"[^0-9]"; Usage: re.sub("[^0-9]", "", "1234.99"'\
                                , max_length=255)

    # ??python_code_snippet - dangerous - not to run directly
    #   
    #   val_list = [re.sub("[^0-9]", "", `x`) for x in val_list]
    #   val_list = [x for x in val_list if len(x) > 0]
    #   val_list = [x.zfill(6) for x in val_list if len(x) > 0]
    #
    python_code_snippet = models.TextField(blank=True)

    # ??tranformation_function_name - do cleaning on GeoNode side with a set of predefined transformation functions
    #
    #   Viable - e.g. Use top geospatial identifiers and formats
    #
    tranformation_function_name = models.CharField(max_length=255, blank=True, choices=TRANSFORMATION_FUNCTIONS)
jj0hns0n commented 9 years ago

This should work just fine. Not sure what you would do with the regex on the other side, but it should work. Lets discuss.

jj0hns0n commented 9 years ago

Ok, I've implemented the basic model for this (without actually hooking anything up so that it does something).

The jointarget endpoint now returns this.

[
    {
        "category": "census-tract",
        "attribute": {
            "attribute": "TRACTCE",
            "type": "xsd:string"
        },
        "layer": "geonode:tl_2014_25_tract",
        "type": {
            "clean_steps": "Cast from Integer to string\r\nalter table <tablename> alter column tract type character varying(6);\r\nupdate <tablename> set tract = lpad(\"tract\", 6, '000000');",
            "name": "Census Tract (6 digits, no decimal, left pad with 0)",
            "description": "dddddd"
        },
        "id": 2
    }
]
jj0hns0n commented 9 years ago

This endpoint is now returning a meaningful error now as well on type mismatch.

Error Creating Join: Error Joining table boston_income_73g_5fihmpx to layer geonode:tl_2014_25_tract: operator does not exist: character varying = integer\nLINE 1: ..._income_73g_5fihmpx on tl_2014_25_tract.\"TRACTCE\" = boston_i...\n ^\nHINT: No operator matches the given name and argument type(s). You might need to add explicit type casts.\n

jj0hns0n commented 9 years ago

Note that for testing purposes, Im doing this manually on the server side by doing an explicit type cast from integer to string and then left padding with 0s.

jj0hns0n commented 9 years ago

Closing as dupe of #49