thinkingmachines / geomancer

Automated feature engineering for geospatial data
MIT License

How to optimize load time for cast? #75

Open tgvgamboa opened 5 years ago

tgvgamboa commented 5 years ago

Issue Description

Generating features with cast takes around 25 seconds per feature column for a DataFrame of 56,761 rows.

Steps to reproduce the issue

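import pandas as pd
from geomancer.spellbook import SpellBook
from geomancer.spells import DistanceToNearest, NumberOf

# Load the input points
df = pd.read_csv('sample.csv')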
pois_book_instance = SpellBook(
    spells = [
        DistanceToNearest(
            'police',
            source_table = 'project.dataset_id.gis_osm_pois_free_1',
            dburl = 'bigquery://project',
            feature_name = 'pois_dist_police'),
        NumberOf(
            'police',
            within = 1000,
            source_table = 'project.dataset_id.gis_osm_pois_free_1',
            dburl = 'bigquery://project',
            feature_name = 'pois_num_1000_police')
])
pois_book = pois_book_instance.cast(df)
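
To put a number on the slowdown, the cast call can be wrapped in a small timer. This is only a minimal sketch: `time_call` and `seconds_per_column` are hypothetical helpers written for this report, not part of geomancer, and it assumes the two spells above each produce one feature column.

```python
import time


def time_call(fn, *args, **kwargs):
    """Call fn once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


def seconds_per_column(total_seconds, n_columns):
    """Average wall-clock time per generated feature column."""
    return total_seconds / n_columns


# Intended use with the objects defined above (not run here):
#   pois_book, elapsed = time_call(pois_book_instance.cast, df)
#   print(seconds_per_column(elapsed, n_columns=2))
```

Timing each spell separately in the same way would show whether DistanceToNearest or NumberOf accounts for most of the time.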

What's the expected result?