o2r-project / geoextent

Python library for extracting the geospatial extent of files and directories with multiple data formats
https://o2r.info/geoextent/
MIT License

Improve handler and file format detection #28

Closed nuest closed 3 years ago

nuest commented 4 years ago

Currently, an "if else" chain checks the file format and selects a suitable handler. The concept of a handler is great, but instead of a fixed chain we should have a list of handlers, and each handler should decide whether it "takes" a file because it supports the format, or not.

See a working example of such a mechanism here:

  1. import all handlers (in this case they are called contentproviders): https://github.com/jupyter/repo2docker/blob/master/repo2docker/app.py#L41
  2. put them all in a list: https://github.com/jupyter/repo2docker/blob/master/repo2docker/app.py#L144
  3. iterate through the list until one handler returns True in the detect() function.
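
The three steps above can be sketched roughly as follows. This is only an illustration, not the actual code of geoextent or repo2docker: `CsvHandler`, `GeojsonHandler`, `HANDLERS`, and `select_handler` are hypothetical names, and detection here looks only at the file extension.

```python
import os


class CsvHandler:
    """Hypothetical handler that claims files ending in .csv."""

    def detect(self, file):
        return os.path.splitext(file)[1].lower() == ".csv"


class GeojsonHandler:
    """Hypothetical handler that claims files ending in .json or .geojson."""

    def detect(self, file):
        return os.path.splitext(file)[1].lower() in (".json", ".geojson")


# Steps 1 and 2: collect the handler classes in a list; list order is priority.
HANDLERS = [CsvHandler, GeojsonHandler]


def select_handler(file, handlers=HANDLERS):
    """Step 3: return the first handler whose detect() accepts the file."""
    for handler_class in handlers:
        handler = handler_class()
        if handler.detect(file):
            return handler
    return None  # no handler supports this file
```

Supporting a new format then means appending one class to the list instead of adding another branch to the chain.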

For this to work, we need a class FormatHandler, sketched here (roughly based on ContentProvider):

import logging


class FormatHandler:
    def __init__(self):
        self.log = logging.getLogger("geoextent")

    def detect(self, file):
        """Determine compatibility between the file and this handler.
        If the handler knows how to extract geospatial and temporal extent,
        it will return `True`.
        If the handler does not know how to process this file it will return
        `False`.
        """
        raise NotImplementedError()

    def bbox(self, file):
        """Get the geospatial extent of given file.
        """
        raise NotImplementedError()

    def time(self, file):
        """Get the temporal extent of given file.
        """
        raise NotImplementedError()

Based on this abstract class, we can implement the respective handlers (pseudo code!):

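import os

class GeojsonFormatHandler(FormatHandler):
    def detect(self, file):
        # os.path.splitext returns (root, ext), and ext keeps its leading dot
        if os.path.splitext(file)[1].lower() not in (".json", ".geojson"):
            return False
        # Open question: check if the content is GeoJSON, maybe by trying to parse it as such?
        # Or is there another clever way to make the detection based on the _content_?
        raise NotImplementedError()

    # [... implement functions ...]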
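
One possible answer to the content question, as a sketch only: parse the file as JSON and check whether the top-level `type` member is one of the values defined by the GeoJSON specification (RFC 7946). `looks_like_geojson` and `GEOJSON_TYPES` are hypothetical names, and this check does not validate geometries or coordinates.

```python
import json

# Values of the "type" member defined by the GeoJSON specification (RFC 7946).
# A tuple is used so that membership tests also work for unhashable values.
GEOJSON_TYPES = (
    "FeatureCollection", "Feature", "GeometryCollection",
    "Point", "MultiPoint", "LineString", "MultiLineString",
    "Polygon", "MultiPolygon",
)


def looks_like_geojson(path):
    """Hypothetical helper: True if the file parses as JSON and its
    top-level object carries a GeoJSON "type" member."""
    try:
        with open(path, encoding="utf-8") as f:
            data = json.load(f)
    except (OSError, ValueError):
        # Unreadable file, undecodable bytes, or invalid JSON
        # (json.JSONDecodeError and UnicodeDecodeError subclass ValueError).
        return False
    return isinstance(data, dict) and data.get("type") in GEOJSON_TYPES
```

A `detect()` method could call such a helper after the extension check, so that an arbitrary `.json` file is not mistaken for GeoJSON. Parsing the whole file may be slow for large inputs.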

YouQam commented 4 years ago

I used a dict whose keys are the file formats and whose values are the modules.

YouQam commented 4 years ago

I used another way to do it, but the issue is remaining open.

I used a dictionary that holds the file format as a key and the module as a value. After the file format is extracted, it goes through a loop over all keys to get the suitable module to handle the file.

modulesSupported = {'geojson': handleGeojson, 'json': handleGeojson, 'csv': handleCSV,
    'shp': handleShapefile, 'dbf': handleShapefile, 'geotiff': handleGeotiff, 'tif': handleGeotiff}

# get the module that will be called (depending on the format of the file)
for key in modulesSupported.keys():
    if key == fileFormat:
        print(key)
        usedModule = modulesSupported.get(key)
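
For comparison, a sketch of the same mapping used with a direct lookup instead of a loop. The strings below are stand-ins for the handler modules so the snippet runs on its own, and `find_module` is a hypothetical helper name, not part of geoextent.

```python
import os

# Stand-ins for the handler modules named above (assumption: in the library
# these would be the imported modules, not strings).
handleGeojson, handleCSV, handleShapefile, handleGeotiff = (
    "handleGeojson", "handleCSV", "handleShapefile", "handleGeotiff")

modulesSupported = {
    'geojson': handleGeojson, 'json': handleGeojson, 'csv': handleCSV,
    'shp': handleShapefile, 'dbf': handleShapefile,
    'geotiff': handleGeotiff, 'tif': handleGeotiff,
}


def find_module(file_path):
    """Hypothetical helper: map a file's extension to its handler module."""
    # splitext keeps the leading dot on the extension, so strip it
    file_format = os.path.splitext(file_path)[1].lower().lstrip(".")
    return modulesSupported.get(file_format)  # None if the format is unsupported
```

Either way, the choice depends only on the extension, so a `.json` file that is not GeoJSON would still be sent to the GeoJSON module.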

nuest commented 3 years ago

Get rid of the modulesSupported object in https://github.com/o2r-project/geoextent/blob/master/geoextent/lib/extent.py#L12