thinkingmachines / geomancer

Automated feature engineering for geospatial data
MIT License

Update Geomancer backend #20

Status: closed by ljvmiranda921 5 years ago

ljvmiranda921 commented 5 years ago

This PR updates geomancer's backend (loading data into a database, creating source and target URIs, etc.). The goal is to make geomancer warehouse-agnostic, so that it's easy to switch between data warehouses (SQLite for testing, BigQuery and others for production).

Motivation

I want to be able to switch easily between different data warehouses, so there should be a common API for doing that.

Notable changes

  1. Renamed some modules: `common` is now `backend`
  2. Added a new base class, `Engine`, for creating database connectors
  3. Added a new class, `BigQueryEngine(Engine)`, that interacts with BigQuery
  4. The `cast()` method now simply calls `backend.connect()` to obtain the source, target, and engine SQLAlchemy primitives
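The division of labour between `Engine`, `load()`, and `cast()` can be sketched roughly as follows. This is a minimal, self-contained illustration under stated assumptions, not geomancer's actual code: the `connect(df, source_uri)` signature, the `(source, target, engine)` tuple shape, the `InMemoryEngine` test double, the `SampleFeature` class hosting `cast()`, and the `memory://` URIs are all hypothetical, and plain values stand in for the SQLAlchemy primitives so it runs without dependencies.

```python
class Engine:
    """Hypothetical base class: every warehouse connector implements load()."""

    def load(self, df):
        # Required: subclasses load `df` and return the URI of the new table.
        raise NotImplementedError("Engine subclasses must implement load()")

    def connect(self, df, source_uri):
        # Assumed signature. Loads the target data, then returns
        # (source, target, engine). In the PR these are SQLAlchemy
        # primitives; plain values are used here to keep the sketch runnable.
        target_uri = self.load(df)
        return source_uri, target_uri, self


class InMemoryEngine(Engine):
    """Hypothetical toy engine that 'loads' rows into a dict, e.g. for unit tests."""

    def __init__(self):
        self.tables = {}

    def load(self, df):
        # Made-up URI scheme: one numbered table per load() call.
        table_uri = f"memory://table_{len(self.tables)}"
        self.tables[table_uri] = list(df)
        return table_uri


class SampleFeature:
    """Hypothetical feature step; cast() only knows about the Engine interface."""

    def __init__(self, source_uri):
        self.source_uri = source_uri

    def cast(self, df, backend):
        # All warehouse-specific work is delegated to the backend.
        source, target, engine = backend.connect(df, self.source_uri)
        return {"source": source, "target": target, "engine": type(engine).__name__}


# Usage: switching warehouses means passing a different backend object.
result = SampleFeature("memory://osm_pois").cast([{"lat": 14.6, "lon": 121.0}], InMemoryEngine())
```

Because `cast()` depends only on the `Engine` interface, moving from a test engine to a production one (for example `BigQueryEngine`) changes the `backend` argument and nothing in the feature code.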

Sample Usage

To create a new engine (say, for SQLite), subclass `Engine`:

```python
from .base import Engine


class SQLiteEngine(Engine):
    def __init__(self, db_path):
        self.db_path = db_path

    def load(self, df):
        # Implement this method to load the pandas.DataFrame into
        # db.sqlite and return the URI of the resulting table.
        # This method is required: the base class raises
        # NotImplementedError if it is not overridden.
        table_uri = ...  # placeholder: build the table URI here
        return table_uri

    def _my_helper_function(self):
        pass
```
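A dependency-free way to fill in the SQLite template might look like the sketch below. It is an illustration under assumptions, not the engine this PR ships: it redefines a stub `Engine` so it runs standalone, it takes rows as a list of dicts rather than a `pandas.DataFrame` (a DataFrame can be converted with `df.to_dict(orient="records")`), and the `table_name` parameter and the returned URI format are made up for illustration.

```python
import sqlite3


class Engine:
    """Stand-in for geomancer's Engine base class so this sketch runs on its own."""

    def load(self, df):
        raise NotImplementedError


class SQLiteEngine(Engine):
    """Hypothetical SQLite engine for tests; accepts rows as a list of dicts."""

    def __init__(self, db_path, table_name="target"):
        self.db_path = db_path
        self.table_name = table_name  # hypothetical parameter
        # One connection per engine, so ":memory:" works for throwaway tests.
        self.conn = sqlite3.connect(db_path)

    def load(self, rows):
        # rows: list of dicts sharing the same keys.
        # Identifiers are quoted but not escaped: fine for a sketch,
        # not for untrusted column names.
        columns = list(rows[0])
        col_sql = ", ".join(f'"{c}"' for c in columns)
        marks = ", ".join("?" for _ in columns)
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute(f'DROP TABLE IF EXISTS "{self.table_name}"')
            self.conn.execute(f'CREATE TABLE "{self.table_name}" ({col_sql})')
            self.conn.executemany(
                f'INSERT INTO "{self.table_name}" ({col_sql}) VALUES ({marks})',
                [tuple(row[c] for c in columns) for row in rows],
            )
        # Made-up URI format for illustration only.
        return f"sqlite://{self.db_path}?table={self.table_name}"


engine = SQLiteEngine(":memory:")
uri = engine.load([{"name": "a", "lat": 14.6}, {"name": "b", "lat": 10.3}])
```

Keeping the connection on the engine lets an in-memory database survive between `load()` and later queries, which is convenient for the "SQLite for testing" use case.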

See the `BigQueryEngine` implementation for a complete example.

Note: I've tested this with our basic working example (loading CSVs, etc.).