piccolo-orm / piccolo

A fast, user friendly ORM and query builder which supports asyncio.
https://piccolo-orm.com/
MIT License
1.44k stars 91 forks source link

Implementing custom column types #55

Closed Fingel closed 3 years ago

Fingel commented 3 years ago

Hello,

I just recently stumbled upon Piccolo and I'm very impressed so far with how much thought and effort has been put into it. Modern Python desperately needs an ORM that can keep up with the next generation of web development libraries. Piccolo looks to be well on it's way to fill that need.

I have a project in mind that I could use Piccolo with, but it does have a use case that requires the use of a PostGIS geography type column. I've been looking through https://github.com/piccolo-orm/piccolo/blob/master/piccolo/columns/base.py and it doesn't look too daunting to subclass Column and go from there. If I can get something useable, even with using raw for selects, it would be pretty great.

Since there doesn't seem to be any documentation on implementing custom columns, I'd figure I'd post an issue, even if there isn't really a problem yet just in case anyone had any advice or thoughts.

dantownsend commented 3 years ago

@Fingel Thanks for the kind words.

You're right - it should be possible to create custom column types without too much hassle by subclassing Column. I should mention this in the docs.

I don't know much about the Geography column type, but you'll probably need something like this:

class Geography(Column):

    value_type = str

    def __init__(
        self,
        shape: str = 'POINT',
        number: t.Optional[int] = None,
        default: t.Union[str, t.Callable[[], str], None] = "",
        **kwargs,
    ) -> None:
        self._validate_default(default, (str, None))

        self.shape = shape
        self.number = number
        kwargs.update({"shape": shape, "number": number})
        super().__init__(**kwargs)

    @property
    def column_type(self):
        if self.number:
            return f"GEOGRAPHY({self.shape}, {self.number})"
        else:
            return f"GEOGRAPHY({self.shape})"

I'm interested to know how you get along. If you're able to implement PostGIS functionality, I'd like to merge it into the main piccolo repo, or have a separate piccolo_gis repo.

Fingel commented 3 years ago

Hi @dantownsend ,

I did start our writing something that looks like your example. I wanted to make it easy on myself to I defined a Point(Column) which looked a lot like your example, with plans to generalize into full Geography type later.

The first issue I ran into when trying to create a migration was this:

module 'piccolo.columns.column_types' has no attribute 'Point'

https://github.com/piccolo-orm/piccolo/blob/fcf58ac329114ceb269732b6a8d888f79ea137d4/piccolo/apps/migrations/auto/migration_manager.py#L192

It looks like the migration manager assumes all classes are defined within piccolo.columns.column_types. I could fork the project and work in there, with the assumption that maybe support for external columns could be added later?

The next hurdle will be querying. PostGIS uses special functions like so:

SELECT * FROM source WHERE ST_Dwithin(source.location, 'SRID=4035;POINT(1 1)', 100)

for example.

Here is where it looks a little more difficult. I'm still going through the code, but I'm not quite sure yet how a custom method like this could be implemented. So that you could so something like this:

MyTable.objects().where(MyTable.location.ST_Dwithin('SRID=4035;POINT(1 1)', 100)).run_sync()

Any ideas on how to approach that?

Alternatively (which could go a long way to help other niche use cases) would be to support raw sql inside of a Where. So you could do something like this:

MyTable.objects().where(
    (MyTable.name=='foo') &
    (raw("ST_Dwithin(location, 'SRID=4035;POINT(1 1)', 100)"))
).run_sync()

But I have no idea of the feasibility of that.

dantownsend commented 3 years ago

@Fingel I think those issues are all solvable, but will require some changes to Piccolo.

  1. Custom column types could be registered in piccolo_conf.py, which solves the migration issue.
  2. In terms of adding custom methods like ST_Dwithin, the best analogy so far is the JSONB column type, which has an arrow method. Something similar to this could be implemented.
  3. With raw SQL in where clauses - this also shouldn't be too tricky, and I can add this.

I should be able to get 1 and 3 done in the next day or so. Otherwise you might want to fork it for now.

Fingel commented 3 years ago

Hi @dantownsend I was able to implement geometry/geography columns in my fork here: https://github.com/piccolo-orm/piccolo/compare/master...Fingel:feature/postgis_funcs?expand=1

Some questions:

  1. geography fields in particular really need to use gist indexes instead of btree. I noticed that gist is referenced here: https://github.com/piccolo-orm/piccolo/blob/c0d34ea90639feeccb5edf27e8d7ec80cffa951a/piccolo/query/methods/create_index.py#L16 but I do not believe anything other than btree is implemented. What do you think about adding the ability for columns to specify which kind of index they should use?
  2. Database extensions. In order for these columns to work, the PostGIS extension needs to be installed. If you want to test it out, I'd recommend spinning up an instance of the Postgis Docker image here. It's kept up to date with PostgreSQL, but comes with the extension installed. People not using this image will have to install postgis manually. One thing that is nice (which Geodjango does) is automatically run "CREATE EXTENSION 'postgis' IF NOT EXISTS;" so that any new database created will have the extension installed automatically. This is really helpful for test databases, for example. How about a way to specify custom sql to be run at certain hooks, or something like that?

Looking forward to continuing on this project!

dantownsend commented 3 years ago

@Fingel You've made some great progress.

With the index types, it would be easy enough to expose this in the Column constructor as an index_type arg. What complicates matters is making it work with migrations, if someone was to change the index_type. It's not impossible to do, but anything which touches migrations usually takes a bit longer to implement.

The PostgresEngine has a prep_database method, which currently just sets up the uuid extension.

https://github.com/piccolo-orm/piccolo/blob/c0d34ea90639feeccb5edf27e8d7ec80cffa951a/piccolo/engine/postgres.py#L270

Extending this to also setup other extensions would be pretty straightforward. PostgresEngine would just need to accept a list of extensions names.

I've added the ability to run raw SQL in where clauses: https://github.com/piccolo-orm/piccolo/pull/57

If this is what you were expecting, I'll release it tomorrow. Unfortunately Travis CI is painfully slow now, and I really need to switch to Github Actions.

Fingel commented 3 years ago

Hi,

I think the raw where PR is great, I'm glad it was merged.

I still think a PostGIS extension for Piccolo would be great. For example, It would allow us to make working with GIS fields a little more user friendly by being able to leverage tools like Shapely to covert the textual representation of geometries into useful python objects. This would require installing additional dependencies you probably wouldn't want in Piccolo core.

I'm not really sure how to proceed at this point. I could keep working in my branch, but it would be good to know if creating an extension would be possible, or if keeping it super simple so that the fields can be included in core. Thoughts?

dantownsend commented 3 years ago

@Fingel I've made a couple of updates, which should make using custom column types easier.

There are still some limitations though. PostGIS may support custom DDL statements, which the Piccolo migrations don't support. This could mean some more refactoring of Piccolo.

Merging the PostGIS stuff into core is still an option - shapely could be an optional dependency.

Can you think of any other blockers for integrating GIS with Piccolo, either as part of the main library, or a separate package?

Fingel commented 3 years ago

@dantownsend This is good stuff! I will try factoring the GIS column types out of my fork and test it with #69. If it goes well and I can develop the columns as a separate package, that would be great. Why don't we see how that goes and if the package looks good enough we can consider merging it back into core?

dantownsend commented 3 years ago

@Fingel Cool - makes sense.

dantownsend commented 3 years ago

Going to close this for now - custom column types should now be possible.