mozilla / ssh_scan_api

An API for ssh_scan (https://github.com/mozilla/ssh_scan) and the backend API service for the Mozilla SSH Observatory (https://observatory.mozilla.org/)
31 stars 12 forks source link

[BETA] Migration to ActiveRecord-based Datastore #137

Closed claudijd closed 6 years ago

claudijd commented 6 years ago

Doing direct PG.connect calls really just isn't tenable after mucking with it for about a day or so. I'm coming to the conclusion that the API and it's DB/Store connectivity needs to be more abstracted and I feel like their has been entirely too much time spent in the weeds on this. Going to move over to using ActiveRecord as our DB interface, which supports all sorts of useful DB tech, like MySQL, PostgreSQL, SQLite, SQL Server, Sybase, and Oracle (all supported databases except DB2), should we decide to make a change our selves or if someone would prefer to use a different DB tech than us.

claudijd commented 6 years ago

ALPHA testers can follow instructions here:

https://github.com/mozilla/ssh_scan_api/wiki/Deploying-ssh_scan_api-using-docker-compose

claudijd commented 6 years ago

@caggle thanks for the quick review!

claudijd commented 6 years ago

@rishabhs95 you are right, adding the whole of ActiveRecord introduces a lot more underly vendor code and perhaps gives us less "raw" performance on queries. That said, the way the infra is designed, I don't think we'll be bottle necking on any one area in particular, though I haven't perf tested any of this just yet. What I will say is that porting DB tech from mongodb to postgres was sort of a chore and were approaching (not yet hitting) performance walls in mongodb, which mainly driven by the /stats interface and the speed at which we could easily do data mining into the results.

One thing I do like, and you've hit on this point, is the ability to more easily switch to an alternative DB tech if we decide postgres isn't for us. I will say in some of the poking I have done, depending on how want to layout the database schema we can make it very postgres proprietary (like storing arrays and such) or we can keep it relatively generic allowing easy DB portability if someone really prefers mysql over postgres.

I suppose if ActiveRecord turns out to be a bottle neck here that would really suck, because this port is very "all in" with ActiveRecord and to get out, it would require another large set of changes like this. I'm really hoping it won't be (might require some testing to hedge our bet), but for the sake of simplicity in use, it's really slick. One of the things I noticed right off was the ability to temporarily lock records and roll back changes in the case we trip an exception mid-transaction and that is something we didn't have in the mongo setup, which resulted in a lot scans getting caught in running state.

As you might be able to tell by my comments, I'm still not 100% sure with this direction, there are some known pros and some possible cons we've yet to trip up on or measure. I'm actually a big advocate for experimentation on this type of stuff, so you want to play with it and see if you can measure the cons, I'm more than happy to participate and be learned up. My plan is to land this sometime in Q1, assuming it's all systems go.