tl-its-umich-edu / unizin-validation

Unizin Validation Scripts
1 stars 4 forks source link

Refactor management of database connections #24

Closed ssciolla closed 4 years ago

ssciolla commented 4 years ago

This PR modifies how validate.py manages the database connections it needs to perform queries. Instead of opening and closing a connection when running each query, all needed database connection objects are set up at the beginning of the script and stored in a simple dictionary, which is then passed up to the functions that need those objects. All open connections are then closed after queries have been executed and data processed. This PR aims to resolve issue #21.

ssciolla commented 4 years ago

When we're passing variables around this, it really makes it seem like we should refactor this into a class instead. Seems like a good idea to initialize these but I think it would be even better as a class because then you could do the setup in the init and the close in the del

Okay, so what would be the name of this class? DBConnManager? Would establish_db_connection become a method? I thought about creating a class before to help handle different connection patterns (like the one used with BigQuery).

jonespm commented 4 years ago

I'd probably just make it simple and put all of these Validation methods into a class just called something like Validate or ValidateUnizin. I don't know if we need two classes in here, maybe you could do that too. But just if we had this ValidateUnizin class created when the loop was running through processing all of the code it would retain the initialized databases.

ssciolla commented 4 years ago

@jonespm, okay, I refactored the database management into a class. I'm not sure I understand why it's necessary or advantageous to rewrite the entire script as a single class. It seems more like a process than a discrete thing (maybe a job is?). This might get into a larger debate about using functions versus classes, but I agree that the database management seems to be encapsulated well by a class. Thoughts?

jonespm commented 4 years ago

It's just that you're passing this new database manager class into every method as a parameter. When you are frequently passing around an object around it makes more sense to have that object as a local variable within the class and either just passed as part of the init or created as part of the init.

I think that my idea was that this would all just be a class and the init would get the database and store it locally.

And the method on that class execute_query_and_write_to_csv(query) would just run those queries outside of this. I think this is fine though.

jonespm commented 4 years ago

@ssciolla We good to merge this one?

ssciolla commented 4 years ago

@jonespm, sure, yeah, I was mulling on the object-oriented stuff for a while and then this became less of a priority. Let's merge for now, and we can improve it later if we want.