palewire / django-postgres-copy

Quickly import and export delimited data with Django support for PostgreSQL's COPY command
https://palewi.re/docs/django-postgres-copy/
MIT License
180 stars 48 forks source link

Importing geometry fields for GeoDjango #203

Open dheerajchand opened 1 week ago

dheerajchand commented 1 week ago

I've tried to capture a stacktrace, but the error seems to take down my terminal. I've exported the contents of a shapefile to CSV using GeoPandas:

In [1]: from django.conf import settings

DEBUG 2024-11-10 20:49:29,011 selector_events __init__ selector_events.py 54 Using selector: EpollSelector
In [2]: tz_source_path = pathlib.Path(settings.VECTOR_SPATIAL_DATA_SUBDIRECTORY / 'timezones-with-oceans-now.shapefile' / 'combined-s
   ...: hapefile-with-oceans-now.shp')
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[2], line 1
----> 1 tz_source_path = pathlib.Path(settings.VECTOR_SPATIAL_DATA_SUBDIRECTORY / 'timezones-with-oceans-now.shapefile' / 'combined-shapefile-with-oceans-now.shp')

NameError: name 'pathlib' is not defined

DEBUG 2024-11-10 20:51:30,484 selector_events __init__ selector_events.py 54 Using selector: EpollSelector
In [3]: import pathlib

DEBUG 2024-11-10 20:51:33,821 selector_events __init__ selector_events.py 54 Using selector: EpollSelector
In [4]: tz_source_path = pathlib.Path(settings.VECTOR_SPATIAL_DATA_SUBDIRECTORY / 'timezones-with-oceans-now.shapefile' / 'combined-s
   ...: hapefile-with-oceans-now.shp')

DEBUG 2024-11-10 20:51:35,489 selector_events __init__ selector_events.py 54 Using selector: EpollSelector
In [5]: import geopandas as gpd

DEBUG 2024-11-10 20:51:46,153 selector_events __init__ selector_events.py 54 Using selector: EpollSelector
In [6]: tz_gdf = gpd.read_file(tz_source_path)

DEBUG 2024-11-10 20:52:40,508 selector_events __init__ selector_events.py 54 Using selector: EpollSelector
In [7]: tz_gdf
Out[7]:
                   tzid                                           geometry
0               Etc/UTC  POLYGON ((-17 -86, -17 -71, -5.5 -69, -5.5 -86...
1        Africa/Abidjan  MULTIPOLYGON (((-3.14829 4.89592, -3.25896 4.9...
2         Europe/Moscow  MULTIPOLYGON (((42.08069 43.20469, 42.08446 43...
3          Africa/Lagos  MULTIPOLYGON (((8.388 4.36726, 8.37898 4.36168...
4   Africa/Johannesburg  MULTIPOLYGON (((37.5 -47.15021, 37.49085 -47.1...
..                  ...                                                ...
89            Etc/GMT+8  MULTIPOLYGON (((-127.5 90, -112.5 90, -112.5 7...
90            Etc/GMT+9  MULTIPOLYGON (((-142.5 90, -127.5 90, -127.5 7...
91           Etc/GMT+10  MULTIPOLYGON (((-157.5 90, -142.5 90, -142.5 7...
92           Etc/GMT+11  MULTIPOLYGON (((-172.5 52.59049, -172.5 60.043...
93           Etc/GMT+12  MULTIPOLYGON (((-180 52.13849, -180 62.55456, ...

[94 rows x 2 columns]

DEBUG 2024-11-10 20:52:51,496 selector_events __init__ selector_events.py 54 Using selector: EpollSelector
In [8]: tz_outpath = settings.TABULAR_DATA_SUBDIRECTORY / 'tz.csv'

DEBUG 2024-11-10 20:53:19,010 selector_events __init__ selector_events.py 54 Using selector: EpollSelector
In [9]: tz_gdf.to_csv(tz_outpath)

And I try to import on this model:

from __future__ import unicode_literals
from django.contrib.gis.db import models
from postgres_copy import CopyManager

class Timezone(models.Model):
    tzid = models.CharField(max_length=80)

    # GeoDjango geometry
    geom = models.MultiPolygonField(srid=4326)

    # CopyManager

    objects = CopyManager()

    def __str__(self):
        representative_string = f"TZID: {self.tzid}"
        return representative_string

# Auto-generated `LayerMapping` dictionary for  Timezone model
timezone_mapping = {
    "tzid": "tzid",
    "geom": "MULTIPOLYGON",
}

But I get an error that crashes the terminal. Is this plugin able to work with geometry columns for GeoDjango? Do I need to do something different? The CSV is too big to attach, but you can see the file I am converting from here:

https://github.com/evansiroky/timezone-boundary-builder/releases/download/2024a/timezones-with-oceans-now.shapefile.zip

Thanks!

dheerajchand commented 1 week ago

markdown went a little crazy there.

dheerajchand commented 1 week ago

Here is the management command, quoted, and linked.

I have moved it out of the management directory for now because it is not working and I don't want it in with things that are working, but on my local filesystem I have kept it in the proper location.

from django.core.management.base import BaseCommand
from locations.models import *
from utilities import *
from django.conf import settings

# logging

import logging

logger = logging.getLogger("django")

class Command(BaseCommand):

    def handle(self, *args, **kwargs):
        # Since the CSV headers match the model fields,
        # you only need to provide the file's path (or a Python file object)

        timezones_csv = settings.TABULAR_DATA_SUBDIRECTORY / "tz.csv"

        column_to_field_mapping = {"tzid": "tzid", "geom": "geometry"}

        Timezone.objects.all().delete()
        insert_count = Timezone.objects.from_csv(
            timezones_csv,
            mapping=column_to_field_mapping,
        )

        message = ""
        message += f"{insert_count} records inserted"
        logger.info(message)