opengeos / WhiteboxTools-ArcGIS

ArcGIS Python Toolbox for WhiteboxTools
https://www.whiteboxgeo.com/manual/wbt_book/intro.html
MIT License
269 stars 66 forks source link

Idea to improve robustness of all tools that take vector input #13

Closed Hornbydd closed 4 years ago

Hornbydd commented 4 years ago

So I was about to suggest there were errors in the following tools:

But lucky for me I tested my input data. Turned out the problem was that my input data was a file geodatabase featureclass. Once I exported the layer to shapefile format the tools ran without error.

Now I went and had a look at the WBT manual and it does indeed say that it only accepts shapefiles but I think the tool needs to trap this as the error message returned gives no indication as to what the problem was to the user. This can be easily rectified by inserting the following code into the updateMessges() section of each tool that takes a vector layer as input. Here is the code

    def updateMessages(self, parameters):
        # Assumes that parameter 0 is always the vector input
        if parameters[0].altered:
            layer = parameters[0].value
            desc = arcpy.Describe(layer)
            if desc.FeatureClass.dataType != "ShapeFile":
                parameters[0].setErrorMessage("This tool expects the layer to be a Shapefile")
            else:
                parameters[0].clearMessage()
        return

Currently the tools are doing little or no input validation and this simple addition will make the user experience certainly less frustrating.

giswqs commented 4 years ago

@Hornbydd Ideas to make the toolbox more robust are always welcome! Thank you for your suggestion. I have added Geodabase checking to every tool. The toolbox will now throw an execution error when input datasets are from a Geodatabase.

desc = arcpy.Describe(i)
i = desc.catalogPath
if os.path.dirname(i).endswith(".gdb"):
    arcpy.AddError("Datasets stored in a Geodatabase are not supported.")
Hornbydd commented 4 years ago

@giswqs Thanks for your rapid improvement but the logic is flawed. Yes the code identifies and rightly warns the user but the tool executes and still crashes returning the weird WBT error message.

If you want the error trapping code to be in the execute() function then it should be:

 if i is not None:
            desc = arcpy.Describe(i)
            i = desc.catalogPath
            if os.path.dirname(i).endswith(".gdb"):
                 arcpy.AddError("Datasets stored in a Geodatabase are not supported.")
            else:
                  # rest of your code for collecting parameters and execute WBT tool

But I believe the validation should really be occurring in updateMessges() as in my original issue. The code there means the tool behaves like normal geoprocessing tools and won't even run until the user supplies a valid input. Your current set up allows invalid inputs, warns the user but then carries on and crashes!

Try putting the code I suggested or a modified version using your approach in updateMessges() and you will see how the tools behaves.

giswqs commented 4 years ago

@Hornbydd I have added the following function to every tool, which can now notify users not to use datasets stored in a Geodatabase before executing the tool. One caveat is that this adds additional 4000 lines of code to the toolbox (34,144 lines in total), which can take a while when the toolbox is being loaded.

    def updateMessages(self, parameters):
        for param in parameters:
            param_str = param.valueAsText
            if param_str is not None:
                try:
                    desc = arcpy.Describe(param_str)
                    if os.path.dirname(desc.catalogPath).endswith(".gdb"):
                        param.setErrorMessage("Datasets stored in a Geodatabase are not supported.")
                except:
                    param.clearMessage()
        return
Hornbydd commented 4 years ago

@giswqs As a comparison it took 12 seconds for the toolbox to appear in ArcToolbox for me. I think that 12 seconds is well worth the price as the user experience is much slicker now. Your tools are stopping me the user from trying to input an invalid dataset to WBT. Although subtle it it now behaving like a normal geoprocessing tool and this makes the user experience much better and it stops dead in the tracks any silly mistake by the user and importantly stops WBT from receiving invalid inputs, which could cause it to crash.

So nice one! Enjoy a virtual pint of beer on me!

But I'm stacking up a load of "issues" probably more to to do WBT....

Hornbydd commented 4 years ago

@giswqs OK I spoke too soon!

I have found two bugs in your approach for testing if the input dataset is in a file geodatabase or not.

  1. Your approach tests if the last part of the file path ends with ".gdb", what if the FeatureClass is in a personal geodatabase? That ends with ".mdb"

  2. Your approach only works for FeatureClasses at the top level in a file/personal geodatabase. There are many reasons why a FeatureClass would sit in a FeatureDataset, topology, network, geometric network or just simply for organising. Your approach fails to identify FeatureClasses as GeoDatabase FeatureClasses if they sit within a FeatureDataset.

If you use the code I suggested in the original raised issue, none of this becomes an issue as you are testing dataset type, not it's location. I would recommend you use the code I suggested as it solves two bugs in one.

giswqs commented 4 years ago

@Hornbydd See the updated code below. I tried to use the same block of code for each tool rather than having customized code for each tool. It should now handle gdb, mdb,FeatureDataset, etc.

    def updateMessages(self, parameters):
        for param in parameters:
            param_str = param.valueAsText
            if param_str is not None:
                try:
                    desc = arcpy.Describe(param_str)
                    if (".gdb" in desc.catalogPath) or (".mdb" in desc.catalogPath):
                        param.setErrorMessage("Datasets stored in a Geodatabase are not supported.")
                except:
                    param.clearMessage()
        return
Hornbydd commented 4 years ago

@giswqs OK that 's working good and I'm happy with that but being the pessimistic I bet someone will eventually try and put in something like c:\mydata\exportfrom.gdb.shp ... Just something to be aware of that this validation is not truly bomb proof, but much better and will trap 95% of the problem input dataset types. If you are seeking perfection I recommend going down the route of testing dataset type rather than in-string path searches.

giswqs commented 4 years ago

@Hornbydd I have improved the code slightly, and now it can identify ".gdb\" and ".mdb\" in the folder name rather than the file basename. Theoretically, uses can have a shapefile name with '.gdb' in it. However, it seems all tools in the system ArcToolbox do not allow any input shapefile with '.gdb' in its name. See the screenshots below. Interestingly, I also discovered an arcpy bug when testing this. If you add a shapefile (e.g., test.gdb.shp) to ArcMap, desc.catalogPath will return test.shp, which automatically removes .gdb from its name. The first screenshot below shows that the same dataset with different results returned from desc.catalogPath and desc.basename.

if (".gdb\\" in desc.catalogPath) or (".mdb\\" in desc.catalogPath):
    param.setErrorMessage("Datasets stored in a Geodatabase are not supported.")

WhiteboxTools Vector Polygons to Raster

ArcGIS Polygon to Raster

Hornbydd commented 4 years ago

@giswqs In attempt to understand the bug in #16 I was looking at the python toolbox script. I could not help but notice that the code we have been discussing above that tests if the input is in a geodatabase is in all execute() and updateMessgaes() for all tools. This code needs only be in the updateMessage() methods as having it there traps erroneous inputs before the tool can even execute.

There is no need for it to be in the execute() method, I would recommend you remove all those from the execute(), this will have the benefit of making the code smaller and hopefully better load times?

So all you need to do is remove this code from all execute() methods (but not the updateMessages():

XYZ is the name of your first parameter variable, this seems to change with each tool

        if XYZ is not None:
            desc = arcpy.Describe(input1)
            input1 = desc.catalogPath
            if (".gdb\\" in desc.catalogPath) or (".mdb\\" in desc.catalogPath):
                 arcpy.AddError("Datasets stored in a Geodatabase are not supported.")
giswqs commented 4 years ago

@Hornbydd I have removed the geodatabase check in execute() as suggested. Thanks.