tortoise / tortoise-orm

Familiar asyncio ORM for python, built with relations in mind
https://tortoise.github.io
Apache License 2.0
4.65k stars 387 forks source link

some tortoise usage issues/questions #214

Closed AEnterprise closed 5 years ago

AEnterprise commented 5 years ago

below are a few things related to trying to use tortoise, not sure if i should split this up into even more issues but i'm not sure all of them are even bugs

some background: i'm currently using peewee but that is syncronous while my use case (gearbot, a fast growing discord chatbot) is an asyncronous environment and as one table in particular is growing to sizes of over 7GB and new features being considered that might require full tablescans async is def required to not block everything

database used atm is mariadb (people recomended postgress but not with any good reasons, it seems tortoise has better support for that but unless there is a good performance benifit i don't see the point in migrating a database this big)

1) tortoise seems to really not like you naming your pk anything other then id, ranging from fields not being found (when trying to name it id in code but make it use a different name in the to issues with it thinking there are multiple PKs 2) while i have not tried in the latest version fk relations where not being made, i had to correct that manually in the database, tortoise works perfectly fine without them but the database performance suffers as a result of the missing indexes 3) i can't find any info on the pool size and according to the docs i can't set this myself either. this is especially noticable in high volume writes. while this isn't really gona happen often in my application, when running a bunch of write operations in paralel on a the same table, it was only a little faster then running them one by one. part of that might be the write locks, but as size grows it might be nice to have more then 1 connection to the database to speed things up. the server also only reports 1 connection being opened 4) bigint fields are created as regular int fields in the database

all in all i do find tortoise easier to work with when querying and especially since it's a sync and won't be holding up the loop, just has a few weird quirks

sorry for the wall of text

grigi commented 5 years ago

some background: i'm currently using peewee but that is syncronous while my use case (gearbot, a fast growing discord chatbot) is an asyncronous environment and as one table in particular is growing to sizes of over 7GB and new features being considered that might require full tablescans async is def required to not block everything

This sounds very interesting. I'm sure it is going to expose many rough edges on Tortoise as I don't know of a similarly complex system that was using it yet.

database used atm is mariadb (people recomended postgress but not with any good reasons, it seems tortoise has better support for that but unless there is a good performance benifit i don't see the point in migrating a database this big)

There are definitely things that is in PostgreSQL's favour, but if what you have isn't broken, don't fix it? Many of MySQL's odd behaviours are somewhat managed by ORMs.

  1. tortoise seems to really not like you naming your pk anything other then id, ranging from fields not being found (when trying to name it id in code but make it use a different name in the to issues with it thinking there are multiple PKs

If this is really the case it should be a bug. Please give me details of all instances you come across. I'm expecting it to work perfectly fine with the current stable version. I have several tests testing that source'd models & fields work as I expect it should: https://github.com/tortoise/tortoise-orm/blob/develop/tests/testmodels.py#L412-L437 https://github.com/tortoise/tortoise-orm/blob/develop/tests/test_source_field.py etc...

  1. while i have not tried in the latest version fk relations where not being made, i had to correct that manually in the database, tortoise works perfectly fine without them but the database performance suffers as a result of the missing indexes

Could you give me what you expect the perfect DDL should look like? I know that there was several MySQL-related DDL bugs that got fixed in the last month. Could you report the version you are using? Also there is at least one known related perf issue: https://github.com/tortoise/tortoise-orm/blob/develop/tortoise/fields.py#L647 that should be fixed with fields having the ability to define their own indexes (plural, as in a single field may need different indexed and/or constraints for different use cases)

  1. i can't find any info on the pool size and according to the docs i can't set this myself either. this is especially noticable in high volume writes. while this isn't really gona happen often in my application, when running a bunch of write operations in paralel on a the same table, it was only a little faster then running them one by one. part of that might be the write locks, but as size grows it might be nice to have more then 1 connection to the database to speed things up. the server also only reports 1 connection being opened

Er, yes. This is a feature that I have been deferring for a while, as my attempts at getting this to work perfectly opened a can of worms, that at the time I felt was not a priority. Currently it will use a single connection and shove as much through there as possible. If you want to help out here, you are welcome to :-) It was one of the reasons we decided to drop Py3.5 support as it was limiting us in using some asyncio features fully.

  1. bigint fields are created as regular int fields in the database

I'm sure I fixed this bug recently, like in 0.13.9?: https://tortoise-orm.readthedocs.io/en/latest/CHANGELOG.html#id5 If you still have it after updating to latest stable version, then we should fix this. I remember this being a MySQL specific bug?

all in all i do find tortoise easier to work with when querying and especially since it's a sync and won't be holding up the loop, just has a few weird quirks

Yay! But lets fix these quirks for you

sorry for the wall of text

No worries, I'm often a culprit of this myself.

AEnterprise commented 5 years ago

This sounds very interesting. I'm sure it is going to expose many rough edges on Tortoise as I don't know of a similarly complex system that was using it yet.

it is though the biggest problem atm is just that all database interacting code needs to be replaced, retested and i'm cleaning up some legacy messes along the way (like the database using int instead of timestamps for storing times)

If this is really the case it should be a bug. Please give me details of all instances you come across. I'm expecting it to work perfectly fine with the current stable version. I have several tests testing that source'd models & fields work as I expect it should:

https://github.com/tortoise/tortoise-orm/blob/develop/tests/testmodels.py#L413 is what i'm talking about, the field is named "id" in code. while it's probably best practice to do so, i had a few fields called differently in my models (messageid for example, in another table i was trying to just call it "token" as i was storing dashboard access tokens and that name made more sense). trying to name that anything else then "id" made things freak out

Er, yes. This is a feature that I have been deferring for a while, as my attempts at getting this to work perfectly opened a can of worms, that at the time I felt was not a priority. Currently it will use a single connection and shove as much through there as possible. If you want to help out here, you are welcome to :-) It was one of the reasons we decided to drop Py3.5 support as it was limiting us in using some asyncio features fully.

i might take a look at some point, for now i think i'll focus on finishing the orm swap and cleaning some of the legacy code involved (as some well as making things a bit more mobile friendly (infractions will be the real test of what tortoise can handle) Though once i get a better grip on tortoise i'd def be up for looking into if i can help make that possible

Yay! But lets fix these quirks for you thanks a lot

i'll see to get all the quirks retested after class and see if i can assemble a basic case for testing and demo'ing some of the things

grigi commented 5 years ago

it is though the biggest problem atm is just that all database interacting code needs to be replaced, retested and i'm cleaning up some legacy messes along the way (like the database using int instead of timestamps for storing times)

Hence your need for a migration system.

https://github.com/tortoise/tortoise-orm/blob/develop/tests/testmodels.py#L413 is what i'm talking about, the field is named "id" in code. while it's probably best practice to do so, i had a few fields called differently in my models (messageid for example, in another table i was trying to just call it "token" as i was storing dashboard access tokens and that name made more sense). trying to name that anything else then "id" made things freak out

If "id" is a problem, that is a bug. I'll look into it.

i'll see to get all the quirks retested after class and see if i can assemble a basic case for testing and demo'ing some of the things

That would be super :+1:

AEnterprise commented 5 years ago

alright looking at things again it seems most issues are either fixed in newer releases or logged already on your field refactor issue. a few things of note though:

grigi commented 5 years ago
  • would it be possible to add a note to the docs warning people that if you use an intfield primary key, that you need to also pass the "autogenerated" param to false if you want to be able to set it yourself? if you don't it just silently ignores whatever you pass for that field (or maybe log a warning?)

Of course. you mean generated=False?

  • foreign keys are added now and the warnings related to it are gone

Yes, I'm sure we changed that recently :slightly_smiling_face:

  • passing the echo param in the connection string is having some unintended sideffects and doesn't seem to work. i would have only expected this to cause it to start logging all the queries it does but i don't see any logging showing up from tortoise, it however is causing debug logs to show for other libs, tortoise is setting that one to debug level instead of it's own logger

The echo param is passed through to the aiomysql driver, which passes it to pymysql. Tortoise itself doesn't currently handle that directly, Maybe it shoud? If you allow the namespace of "db_client" to emit debug (via a complex log config) you should see what is emitted via Tortoise itself.

  • is there a way to tell tortoise only to generate a specific table? or will it try to generate all of them? i only see one warning logged about a table already existing

Not right now. There is a lot of scope between our current naïve implementation and full migrations. I'm open for any feature request of better schema generation.

AEnterprise commented 5 years ago

Of course. you mean generated=False?

exactly what i ment sorry, still getting used to the names, i'll open a PR for this

Yes, I'm sure we changed that recently 🙂

in a very good way, thanks

The echo param is passed through to the aiomysql driver, which passes it to pymysql. Tortoise itself doesn't currently handle that directly, Maybe it shoud? If you allow the namespace of "db_client" to emit debug (via a complex log config) you should see what is emitted via Tortoise itself.

well my main issue atm is that however that is doing it, it's messing with other loggers causing giant debug spam

Not right now. There is a lot of scope between our current naïve implementation and full migrations. I'm open for any feature request of better schema generation.

being able to tell tortoise exactly what tables to generate would be nice, maybe it trying all of them instead of quitting after the first one fails would be nice as well. this could then act to make sure all tables exist. i won't do migrations for you, but for something as "simple" as just a single new table for a new feature wouldn't require specific migration anymore

grigi commented 5 years ago

I did a lot of testing, and the concern re habing to use "id" for the PK field name if the field has a custom source_field or not isn't happening for me. It may be something that got fixed in the later releases?

AEnterprise commented 5 years ago

tried and can no longer reproduce it, must have been fixed as a side effect of something else i guess