Closed AEnterprise closed 5 years ago
some background: i'm currently using peewee but that is syncronous while my use case (gearbot, a fast growing discord chatbot) is an asyncronous environment and as one table in particular is growing to sizes of over 7GB and new features being considered that might require full tablescans async is def required to not block everything
This sounds very interesting. I'm sure it is going to expose many rough edges on Tortoise as I don't know of a similarly complex system that was using it yet.
database used atm is mariadb (people recomended postgress but not with any good reasons, it seems tortoise has better support for that but unless there is a good performance benifit i don't see the point in migrating a database this big)
There are definitely things that is in PostgreSQL's favour, but if what you have isn't broken, don't fix it? Many of MySQL's odd behaviours are somewhat managed by ORMs.
- tortoise seems to really not like you naming your pk anything other then
id
, ranging from fields not being found (when trying to name it id in code but make it use a different name in the to issues with it thinking there are multiple PKs
If this is really the case it should be a bug. Please give me details of all instances you come across. I'm expecting it to work perfectly fine with the current stable version. I have several tests testing that source'd models & fields work as I expect it should: https://github.com/tortoise/tortoise-orm/blob/develop/tests/testmodels.py#L412-L437 https://github.com/tortoise/tortoise-orm/blob/develop/tests/test_source_field.py etc...
- while i have not tried in the latest version fk relations where not being made, i had to correct that manually in the database, tortoise works perfectly fine without them but the database performance suffers as a result of the missing indexes
Could you give me what you expect the perfect DDL should look like? I know that there was several MySQL-related DDL bugs that got fixed in the last month. Could you report the version you are using? Also there is at least one known related perf issue: https://github.com/tortoise/tortoise-orm/blob/develop/tortoise/fields.py#L647 that should be fixed with fields having the ability to define their own indexes (plural, as in a single field may need different indexed and/or constraints for different use cases)
- i can't find any info on the pool size and according to the docs i can't set this myself either. this is especially noticable in high volume writes. while this isn't really gona happen often in my application, when running a bunch of write operations in paralel on a the same table, it was only a little faster then running them one by one. part of that might be the write locks, but as size grows it might be nice to have more then 1 connection to the database to speed things up. the server also only reports 1 connection being opened
Er, yes. This is a feature that I have been deferring for a while, as my attempts at getting this to work perfectly opened a can of worms, that at the time I felt was not a priority. Currently it will use a single connection and shove as much through there as possible. If you want to help out here, you are welcome to :-) It was one of the reasons we decided to drop Py3.5 support as it was limiting us in using some asyncio features fully.
- bigint fields are created as regular int fields in the database
I'm sure I fixed this bug recently, like in 0.13.9?: https://tortoise-orm.readthedocs.io/en/latest/CHANGELOG.html#id5 If you still have it after updating to latest stable version, then we should fix this. I remember this being a MySQL specific bug?
all in all i do find tortoise easier to work with when querying and especially since it's a sync and won't be holding up the loop, just has a few weird quirks
Yay! But lets fix these quirks for you
sorry for the wall of text
No worries, I'm often a culprit of this myself.
This sounds very interesting. I'm sure it is going to expose many rough edges on Tortoise as I don't know of a similarly complex system that was using it yet.
it is though the biggest problem atm is just that all database interacting code needs to be replaced, retested and i'm cleaning up some legacy messes along the way (like the database using int instead of timestamps for storing times)
If this is really the case it should be a bug. Please give me details of all instances you come across. I'm expecting it to work perfectly fine with the current stable version. I have several tests testing that source'd models & fields work as I expect it should:
https://github.com/tortoise/tortoise-orm/blob/develop/tests/testmodels.py#L413 is what i'm talking about, the field is named "id" in code. while it's probably best practice to do so, i had a few fields called differently in my models (messageid for example, in another table i was trying to just call it "token" as i was storing dashboard access tokens and that name made more sense). trying to name that anything else then "id" made things freak out
Er, yes. This is a feature that I have been deferring for a while, as my attempts at getting this to work perfectly opened a can of worms, that at the time I felt was not a priority. Currently it will use a single connection and shove as much through there as possible. If you want to help out here, you are welcome to :-) It was one of the reasons we decided to drop Py3.5 support as it was limiting us in using some asyncio features fully.
i might take a look at some point, for now i think i'll focus on finishing the orm swap and cleaning some of the legacy code involved (as some well as making things a bit more mobile friendly (infractions will be the real test of what tortoise can handle) Though once i get a better grip on tortoise i'd def be up for looking into if i can help make that possible
Yay! But lets fix these quirks for you thanks a lot
i'll see to get all the quirks retested after class and see if i can assemble a basic case for testing and demo'ing some of the things
it is though the biggest problem atm is just that all database interacting code needs to be replaced, retested and i'm cleaning up some legacy messes along the way (like the database using int instead of timestamps for storing times)
Hence your need for a migration system.
https://github.com/tortoise/tortoise-orm/blob/develop/tests/testmodels.py#L413 is what i'm talking about, the field is named "id" in code. while it's probably best practice to do so, i had a few fields called differently in my models (messageid for example, in another table i was trying to just call it "token" as i was storing dashboard access tokens and that name made more sense). trying to name that anything else then "id" made things freak out
If "id" is a problem, that is a bug. I'll look into it.
i'll see to get all the quirks retested after class and see if i can assemble a basic case for testing and demo'ing some of the things
That would be super :+1:
alright looking at things again it seems most issues are either fixed in newer releases or logged already on your field refactor issue. a few things of note though:
- would it be possible to add a note to the docs warning people that if you use an intfield primary key, that you need to also pass the "autogenerated" param to false if you want to be able to set it yourself? if you don't it just silently ignores whatever you pass for that field (or maybe log a warning?)
Of course. you mean generated=False
?
- foreign keys are added now and the warnings related to it are gone
Yes, I'm sure we changed that recently :slightly_smiling_face:
- passing the echo param in the connection string is having some unintended sideffects and doesn't seem to work. i would have only expected this to cause it to start logging all the queries it does but i don't see any logging showing up from tortoise, it however is causing debug logs to show for other libs, tortoise is setting that one to debug level instead of it's own logger
The echo param is passed through to the aiomysql
driver, which passes it to pymysql. Tortoise itself doesn't currently handle that directly, Maybe it shoud? If you allow the namespace of "db_client" to emit debug (via a complex log config) you should see what is emitted via Tortoise itself.
- is there a way to tell tortoise only to generate a specific table? or will it try to generate all of them? i only see one warning logged about a table already existing
Not right now. There is a lot of scope between our current naïve implementation and full migrations. I'm open for any feature request of better schema generation.
Of course. you mean generated=False?
exactly what i ment sorry, still getting used to the names, i'll open a PR for this
Yes, I'm sure we changed that recently 🙂
in a very good way, thanks
The echo param is passed through to the aiomysql driver, which passes it to pymysql. Tortoise itself doesn't currently handle that directly, Maybe it shoud? If you allow the namespace of "db_client" to emit debug (via a complex log config) you should see what is emitted via Tortoise itself.
well my main issue atm is that however that is doing it, it's messing with other loggers causing giant debug spam
Not right now. There is a lot of scope between our current naïve implementation and full migrations. I'm open for any feature request of better schema generation.
being able to tell tortoise exactly what tables to generate would be nice, maybe it trying all of them instead of quitting after the first one fails would be nice as well. this could then act to make sure all tables exist. i won't do migrations for you, but for something as "simple" as just a single new table for a new feature wouldn't require specific migration anymore
I did a lot of testing, and the concern re habing to use "id" for the PK field name if the field has a custom source_field or not isn't happening for me. It may be something that got fixed in the later releases?
tried and can no longer reproduce it, must have been fixed as a side effect of something else i guess
below are a few things related to trying to use tortoise, not sure if i should split this up into even more issues but i'm not sure all of them are even bugs
some background: i'm currently using peewee but that is syncronous while my use case (gearbot, a fast growing discord chatbot) is an asyncronous environment and as one table in particular is growing to sizes of over 7GB and new features being considered that might require full tablescans async is def required to not block everything
database used atm is mariadb (people recomended postgress but not with any good reasons, it seems tortoise has better support for that but unless there is a good performance benifit i don't see the point in migrating a database this big)
1) tortoise seems to really not like you naming your pk anything other then
id
, ranging from fields not being found (when trying to name it id in code but make it use a different name in the to issues with it thinking there are multiple PKs 2) while i have not tried in the latest version fk relations where not being made, i had to correct that manually in the database, tortoise works perfectly fine without them but the database performance suffers as a result of the missing indexes 3) i can't find any info on the pool size and according to the docs i can't set this myself either. this is especially noticable in high volume writes. while this isn't really gona happen often in my application, when running a bunch of write operations in paralel on a the same table, it was only a little faster then running them one by one. part of that might be the write locks, but as size grows it might be nice to have more then 1 connection to the database to speed things up. the server also only reports 1 connection being opened 4) bigint fields are created as regular int fields in the databaseall in all i do find tortoise easier to work with when querying and especially since it's a sync and won't be holding up the loop, just has a few weird quirks
sorry for the wall of text