tortoise / tortoise-orm

Familiar asyncio ORM for python, built with relations in mind
https://tortoise.github.io
Apache License 2.0
4.5k stars 368 forks source link

Tortoise orm's expand_db_url() throws ValueError("Invalid IPv6 URL") for my db_url #404

Open Vishnu-priyan opened 4 years ago

Vishnu-priyan commented 4 years ago

Describe the bug

tortoise-orm's expand_db_url function in config_generator.py throws ValueError("Invalid IPv6 URL") when trying to parse db_url.

expand_db_url function has, url = urlparse.urlparse(db_url)

Source code of urlparse (which uses urlsplit, see this line of code)

When given a db url with either one of these special characters '[' or ']', urllib's parse tries to parse and raises it's invalid IPv6 URL error.

To Reproduce

from urllib.parse import urlparse

cs = 'mysql://some_user:ADM[r$VIS]@test-rds.somedata.net:3306/mydb?charset=utf8mb4'
sms = 'mysql://fail_user:DMK_15[ZWIN6@test-rds.somedata.net:3306/mydb2?charset=utf8mb4'

print("This runs...")
urlparse(cs)
print("Fails...")
urlparse(sms)

Expected behavior Expected is to allow this password with special characters.

https://github.com/tortoise/tortoise-orm/blob/72f84f0848dc68041157f03e60cd1c92b0ee5137/tortoise/backends/base/config_generator.py#L62 https://github.com/tortoise/tortoise-orm/blob/72f84f0848dc68041157f03e60cd1c92b0ee5137/tortoise/backends/base/config_generator.py#L63

This shouldn't be happening. I tried the same with sqlalchemy ORM please have a look at the below code, mydatabase.py

from sqlalchemy import create_engine
from sqlalchemy.orm import scoped_session, sessionmaker
from sqlalchemy.ext.declarative import declarative_base

_db_url = 'mysql://fail_user:DMK_15[ZWIN6@test-rds.somedata.net:3306/mydb2?charset=utf8mb4'
engine = create_engine(_db_url, convert_unicode=True)
db_session = scoped_session(sessionmaker(autocommit=False,
                                         autoflush=False,
                                         bind=engine))
Base = declarative_base()
Base.query = db_session.query_property()

def init_db():
    # import all modules here that might define models so that
    # they will be registered properly on the metadata.  Otherwise
    # you will have to import them first before calling init_db()
    import models
    Base.metadata.create_all(bind=engine)

models.py

from sqlalchemy import Column, Integer, String
from mydatabase import Base

class User(Base):
    __tablename__ = 'users'
    id = Column(Integer, primary_key=True)
    name = Column(String(50), unique=True)
    email = Column(String(120), unique=True)

    def __init__(self, name=None, email=None):
        self.name = name
        self.email = email

    def __repr__(self):
        return '<User %r>' % (self.name)

Terminal:

>>> from mydatabase import init_db
>>> a = init_db()

Above works without any issue.

After digging through the source code for sqlalchemy they use this make_url following RFC 1738.

Ideally the above password should be allowed, it is an unexpected error here. Sqlalchemy connects with db, whereas when using tortoise orm error occurs.

I'm working on fixing this db url parsing logic for tortoise orm (if it is an error correct me otherwise). or is this error expected in tortoise orm and let it be. Please advice. :)

Python 3 for urllib say

Unmatched square brackets in the netloc attribute will raise a ValueError.

Additional context Please note, I understand that this is because of the password in db url and just posting this for further discussion. Also somewhere down the line, somebody hopefully stumbles across this discussion when they face this IPV6 error using tortoise orm.

grigi commented 4 years ago

Agreed, our behaviour seems wrong here. That behviour of urllib is suprising. We should do some form of fuzz testing for the username/password to ensure we parse it correctly.

Vishnu-priyan commented 4 years ago

Alright. What could be the fix for it?, Can we modify sqlalchemy's URL parsing logic here for tortoise orm? Or any other way?. Urllib parsing will throw this error, couldn't find a way to ignore that netloc check. It raises ValueError. Users might not be aware of this behaviour and might find their production app causing error, but cause being a dB password. (This what happened for me).