pudo / dataset

Easy-to-use data handling for SQL data stores with support for implicit table creation, bulk loading, and transactions.
https://dataset.readthedocs.org/
MIT License

Integration with SQLAlchemy #331

Closed — vladiscripts closed this issue 4 years ago

vladiscripts commented 4 years ago

Could you add improved integration support with SQLAlchemy? Currently I have to duplicate code, which violates the DRY principle. I mean:

For example:

from typing import Union

import dataset
from sqlalchemy import create_engine, Column, Integer, Date
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker

Base = declarative_base()

class Table(Base):
    __tablename__ = 't_name'
    id = Column(Integer, primary_key=True)
    date = Column(Date, nullable=False, unique=True)
    total = Column(Integer, nullable=False)

db_name = 'db_name'
engine = create_engine(f'mysql+pymysql://{user}:{password}@{host}/{db_name}', echo=True)
Session = sessionmaker(bind=engine)
session = Session()

Base.metadata.create_all(engine)

# A second, duplicate connection just for dataset, rebuilt from the engine's URL.
db = dataset.connect(str(engine.url), engine_kwargs=dict(echo=engine.echo))

def insert_to_Table(rows: Union[list, tuple]):
    table = db[Table.__tablename__]
    for row in rows:
        # Pair each column key with the corresponding value from the row.
        d = dict(zip([Table.date.key, Table.total.key], row))
        table.insert_ignore(d, Table.date.key)
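For illustration, the dict(zip(...)) step in insert_to_Table just pairs column keys with row values; a minimal sketch in plain Python, with the column keys written out as literal strings (no database or SQLAlchemy involved):

```python
# Illustrative stand-ins for Table.date.key and Table.total.key.
columns = ["date", "total"]

def row_to_dict(row):
    """Pair each column key with the corresponding value from the row."""
    return dict(zip(columns, row))

print(row_to_dict(("2021-01-31", 42)))  # {'date': '2021-01-31', 'total': 42}
```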

As you can see, I currently have to use SQLAlchemy separately: first to define a table model, then to call Base.metadata.create_all(engine). I also have to keep a Session() around for more complex cases (#322).

Then the engine definition gets duplicated. Here I would really like dataset.connect() to accept an engine parameter so an existing engine can be reused.

All of this duplicates DB connections within the script and becomes confusing when several developers work on it.

Then, again, I have to reference the models indirectly, as in db[Table.__tablename__] and dict(zip([Table.date.key, Table.total.key], row)).

pudo commented 4 years ago

Hey @vladiscripts can you help me to understand your use case a bit better? Is it the case that you have half of your database schema managed by SQLAlchemy ORM and half by dataset? Or are both really writing the same tables? If so: once you've mapped your database to SQLA ORM, what do you need dataset for? (It surprises me because the ORM is much more powerful than dataset).

If all you're proposing is to be able to pass an existing engine into dataset.connect() that's probably a good idea, but I don't know if you wouldn't experience more mis-match down the road. Sharing a MetaData between the ORM and dataset, in particular, feels extremely unsafe.

pudo commented 4 years ago

Closing this, no feedback.

vladiscripts commented 4 years ago

Sorry, I was very busy and couldn't make time for this.

The problem is that SQLAlchemy ORM does not ship a ready-made set of the necessary convenience methods: insert, bulk insert, insert_ignore, update, upsert. They are needed in every project, so you have to reinvent the wheel every time. Each of these methods takes about a dozen lines, as opposed to a single call in dataset. This is a problem if you are a professional Python developer who starts a new project every couple of days. I think dataset was written for the same reason.
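For reference, the semantics being asked for (insert_ignore and upsert keyed on a unique column, as dataset exposes them) can be sketched over a plain in-memory dict, independent of any ORM; all names here are illustrative:

```python
class MiniStore:
    """Illustrative in-memory stand-in for a table with one unique key column."""

    def __init__(self, key):
        self.key = key   # name of the unique column, e.g. 'date'
        self.rows = {}   # maps key value -> row dict

    def insert(self, row):
        self.rows[row[self.key]] = dict(row)

    def insert_ignore(self, row):
        # Insert only if no row with the same key exists (like dataset's insert_ignore).
        if row[self.key] not in self.rows:
            self.insert(row)

    def upsert(self, row):
        # Insert a new row, or update the existing one in place (like dataset's upsert).
        existing = self.rows.setdefault(row[self.key], {})
        existing.update(row)

store = MiniStore("date")
store.insert_ignore({"date": "2021-01-31", "total": 10})
store.insert_ignore({"date": "2021-01-31", "total": 99})  # ignored: key already exists
store.upsert({"date": "2021-01-31", "total": 99})         # now the row is updated
```

Against a real database each of these is roughly a dozen lines of ORM or Core code (query-then-insert, or a dialect-specific ON CONFLICT / ON DUPLICATE KEY statement), which is exactly the repetition being described.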

In the end, I wrote for myself a private analogue of dataset on top of SQLAlchemy ORM: a class with that set of methods.