pyfa-org / pyfa-ng

JSON/Database Design #2

Ebag333 opened this issue 7 years ago

Ebag333 commented 7 years ago

Just opening a thread on the JSON/Database design for feedback, comments, and ideas.

This is just in regard to the back end, so we're setting the UI aside for the moment.

Eos doesn't use a database; it grabs all its data from JSON files and creates a cache file out of them.

Specifically, these are the ones it cares about:

    def get_evetypes(self):
        return self.__fetch_table('evetypes')

    def get_evegroups(self):
        return self.__fetch_table('evegroups')

    def get_dgmattribs(self):
        return self.__fetch_table('dgmattribs')

    def get_dgmtypeattribs(self):
        return self.__fetch_table('dgmtypeattribs')

    def get_dgmeffects(self):
        return self.__fetch_table('dgmeffects')

    def get_dgmtypeeffects(self):
        return self.__fetch_table('dgmtypeeffects')

    def get_dgmexpressions(self):
        return self.__fetch_table('dgmexpressions')

Since we are already going to be including JSON files for Eos (I'm assuming we're not going to spend the time to rewrite that), what else can we leverage those JSON files for?

Do we want to just use JSONs for the back end, and skip using SQL for static data entirely? (This would certainly make updating the data much easier than trying to update an existing SQL database.)

Right now for the back end we have the following modules:

Eos
pyCrest
SQLAlchemy
Flask

Since Flask can read the JSONs directly (and load them into memory), I'm not sure what duplicating that data into SQL would buy us.
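
For illustration, here's a minimal sketch of what "Flask reads the JSONs directly and loads them into memory" could look like; the file path, route, and field names here are assumptions, not the actual project layout:

    import json

    from flask import Flask, jsonify

    app = Flask(__name__)

    # Hypothetical file; load the static data once at startup and keep it in memory.
    with open('staticdata/evetypes.json') as f:
        EVETYPES = {row['typeID']: row for row in json.load(f)}

    @app.route('/api/evetypes/<int:type_id>')
    def evetype(type_id):
        # Plain in-memory dictionary lookup instead of a SQL query.
        return jsonify(EVETYPES.get(type_id, {}))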

Ebag333 commented 7 years ago

User Database Design

For the user database, most of what we currently do can simply be migrated over; the basic design hasn't changed much.

For Eos, we need to store information about (see the sketch after this list):

fits
modules
rigs
drones
implants
boosters (drugs)
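
A minimal sketch of how that user data might look as SQLAlchemy models (table and column names are illustrative, not the actual pyfa schema):

    from sqlalchemy import Column, ForeignKey, Integer, String
    from sqlalchemy.ext.declarative import declarative_base
    from sqlalchemy.orm import relationship

    Base = declarative_base()

    class Fit(Base):
        __tablename__ = 'fits'
        id = Column(Integer, primary_key=True)
        name = Column(String)
        ship_type_id = Column(Integer)  # references static eve data, not a user table
        modules = relationship('Module', backref='fit')

    class Module(Base):
        __tablename__ = 'modules'
        id = Column(Integer, primary_key=True)
        fit_id = Column(Integer, ForeignKey('fits.id'))
        type_id = Column(Integer)         # the module's item typeID
        charge_type_id = Column(Integer)  # loaded charge, if any

    # Rigs, drones, implants, and boosters would follow the same pattern:
    # one row per item with a ForeignKey back to fits.id.
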
Ebag333 commented 7 years ago

pyCrest

Crest data is special because any of it can be dynamically updated, and it might be user data, market data, static data, or something else entirely.

Do we want to simply use pyCrest to build/update JSON files, and then read the JSON files directly with Flask?
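
As a rough sketch of that flow, here's a build-time step that pulls item data from a CREST-style endpoint and rewrites the JSON file that Flask/Eos would read. It uses plain requests with a placeholder URL and payload shape rather than pyCrest's actual API, which would need checking:

    import json

    import requests

    # Placeholder endpoint; the real URL, auth, and paging would come from pyCrest.
    CREST_TYPES_URL = 'https://crest-tq.eveonline.com/inventory/types/'

    def refresh_evetypes(path='staticdata/evetypes.json'):
        response = requests.get(CREST_TYPES_URL)
        response.raise_for_status()
        # Overwrite the JSON file that the rest of the stack reads.
        with open(path, 'w') as f:
            json.dump(response.json().get('items', []), f)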

blitzmann commented 7 years ago

Do we want to just use JSONs for the back end, and skip using SQL for static data entirely?

This is certainly an idea. However, SQL gives us the benefits of SQL queries (while SQLAlchemy gives us the benefits of an ORM). These are very powerful, and without them we would have to load the data into memory and loop over it to find the information we are looking for. This compounds the more related data you load for a record, whereas the SQL ORM makes it quick and easy. I would much prefer to stick with a SQL database and not use plain old JSON as the main data store.
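
To make the contrast concrete, a sketch under assumed names (the JSON path, the evetypes table layout, and the groupID value are all placeholders):

    import json

    from sqlalchemy import Boolean, Column, Integer, create_engine
    from sqlalchemy.ext.declarative import declarative_base
    from sqlalchemy.orm import sessionmaker

    Base = declarative_base()

    class EveType(Base):
        # Illustrative stand-in for the evetypes table.
        __tablename__ = 'evetypes'
        typeID = Column(Integer, primary_key=True)
        groupID = Column(Integer)
        published = Column(Boolean)

    GROUP_OF_INTEREST = 74  # placeholder groupID

    # Plain JSON in memory: we scan every record ourselves.
    with open('staticdata/evetypes.json') as f:
        evetypes_json = json.load(f)
    matches = [t for t in evetypes_json
               if t['groupID'] == GROUP_OF_INTEREST and t['published']]

    # The same lookup through the ORM: the database does the filtering.
    session = sessionmaker(bind=create_engine('sqlite:///eve_staticdata.sqlite'))()
    matches = (session.query(EveType)
               .filter_by(groupID=GROUP_OF_INTEREST, published=True)
               .all())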

Since we are already going to be including JSON files for Eos (I'm assuming we're not going to spend the time to rewrite that), what else can we leverage those JSON files for?

My thought was this: we don't ship the json files, but instead simply ship the cache. Not only that, but I believe (could be mistaken, haven't looked in a while) that EOS can also read data from a SQLite database (and can probably be finagled to read from a MySQL database as well), so for the purposes of pyfa, it would be json dump of data > sql database. This is shipped, and upon first launch of the application, eos takes the time to cache the data from the sqlite database.

Another thing to note: while I have before been adamant on not shipping the SDE with pyfa (and still am, considering it contains data we just don't need), we ought to consider at least being compatible with the SDE schema. I don't yet know the implications of this moving forward (with respect to updates / handling what is essentially data that isn't explicitly shipped with pyfa), but this could allow third parties that use the pyfa server to simply point it at their current SDE implementation. But then again, that would introduce a disconnect - if they upgrade the SDE but not pyfa, then pyfa is broken. If they upgrade pyfa but not the SDE, then pyfa's broken.

So scratch that idea, I think we ought to explicitly point to pyfa-supported data only. This opinion is supported by the fact that we may be using CREST data that the SDE simply doesn't support, so having our own schema that we control is in our best interests (and it would also alleviate versioning issues).

blitzmann commented 7 years ago

I was also just thinking - since we're going to (eventually) support remote connections, we need to discuss the way data is transferred, and which data is transferred.

With a web application, it's pretty simple: the page is static, and it will get ALL information (both user and eve data) from the server. But, branching out to a desktop application, there are two scenarios:

Connecting to the local server: This shouldn't really present much problem because there will be no network latency.

Connecting to a remote server

This is something to think about, but it doesn't need to be solved immediately. :)

Ebag333 commented 7 years ago

My thought was this: we don't ship the json files, but instead simply ship the cache.

Eos won't like this.

Every time Eos runs, it checks to see if the JSONs match the cache. If they don't, it updates the cache. I moved the JSON files (didn't change them or anything, just their location), and it triggered a rebuild of the cache. I then tested moving the files and leaving the cache; it bombs out straight off.

So we're going to need JSON or possibly SQLite (and JSON is more web app friendly than SQLite is, I think).
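
Purely as an illustration of the kind of staleness check described above (not Eos's actual code), something along these lines is what drives the rebuild decision:

    import hashlib
    import json
    import os

    def data_fingerprint(json_dir):
        # Fingerprint every JSON file by name, size, and modification time.
        digest = hashlib.md5()
        for name in sorted(os.listdir(json_dir)):
            if name.endswith('.json'):
                stat = os.stat(os.path.join(json_dir, name))
                digest.update('{}:{}:{}'.format(name, stat.st_size, stat.st_mtime).encode())
        return digest.hexdigest()

    def cache_is_stale(json_dir, cache_path):
        # Rebuild when the cache is missing or its recorded fingerprint no longer matches.
        if not os.path.exists(cache_path):
            return True
        with open(cache_path) as f:
            cached = json.load(f)
        return cached.get('fingerprint') != data_fingerprint(json_dir)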

Not only that, but I believe (could be mistaken, haven't looked in a while) that EOS can also read data from a SQLite database

The SQLite connection seems to be hard-coded in; it's not using SQLAlchemy or anything else. So switching it over to support MySQL, MSSQL, PostgreSQL, DB2... is not quite as easy as it would be if SQLAlchemy (or similar) were used. The reading from JSON/SQLite is only used to build the cache anyway.

Also, Eos will only read from the cache (@Regner actually already looked into this since he'd prefer it not use the cache for web deployments). It's not going to be easy to rip that out (and probably not worth the time).

We should change how we pull data in to create the cache, I think, because web deployments probably won't want to use JSONs. I mean, they can, but it'd be better to use SQL. But that's a future thing. I think that Eos shouldn't be pulling from SQL directly anyway; if we can point it back to Flask and let Flask handle it through SQLAlchemy (or JSON files), then we have a single point for our data source rather than multiple points.

But right now we're stuck with what we have, I think. It's just not worth fiddling with right now, since it'll be fully functional without changing it.

Another thing to note: while I have before been adamant on not shipping the SDE with pyfa (and still am, considering it contains data we just don't need), we ought to consider at least being compatible with the SDE schema. I don't yet know the implications of this moving forward (with respect to updates / handling what is essentially data that isn't explicitly shipped with pyfa), but this could allow third parties that use the pyfa server to simply point it at their current SDE implementation. But then again, that would introduce a disconnect - if they upgrade the SDE but not pyfa, then pyfa is broken. If they upgrade pyfa but not the SDE, then pyfa's broken.

So scratch that idea, I think we ought to explicitly point to pyfa-supported data only. This opinion is supported by the fact that we may be using CREST data that the SDE simply doesn't support, so having our own schema that we control is in our best interests (and it would also alleviate versioning issues).

Sorry for quoting a wall of text, but you've got a lot going on here.

We have different use cases going on here, and I think you're lumping them all in together.

For Eos, we're stuck with whatever structure Kadesh set up, which seems to just be what gets dumped by your little tool. That doesn't exactly match what's in the SDE, but that doesn't really matter, because Eos is going to convert it to a cache anyway (and does a massive amount of cutting, which is why we go from over 100 MB of JSONs to 172 KB... which seems WAY too small, but that's another topic).

Now, I did run a quick test, grabbed some item data off Crest and dropped it into a JSON file. Deleted the cache, and Eos recreated it. So it seems that Eos does support Crest style data, but obviously that needs to be fully tested.

For pyCrest, most (if not all) of the data that we will pull down will be in a format similar to or matching the SDE. @Regner took a look, and the majority of the data Eos uses can be pulled from Crest. So assuming that we want to use Crest data (and there's no real reason not to), we should match that format wherever possible.

If we go all in on supporting Crest, and matching that format rather than rolling our own, then it doesn't take an entire rebuild of Pyfa just to get new data. That's a huge load off the developers, and in a Hit-By-A-Bus scenario Pyfa will keep functioning rather than going the way of EFT (which is now over 6 months out of date).

You don't want that, now do you? :)

Okay, so far we have:

So we're down to the UI.

First off, mostly we just need the invtype table/JSON. Everything else is pretty much just linking things together, and is pure IDs, and can be whatever structure we want.

And going back to my earlier statement:

We have different use cases going on here, and I think you're lumping them all in together.

We already have at least 3 people lined up who are wanting a RESTful API (is Flask done yet @Regner ?). They don't care about the UI, and probably don't even want it included in the package (but that's another topic).

They aren't going to want to be limited to waiting on Pyfa to release an update, just so they can run it as a service. They're going to want to be able to update it on demand, and I think it's a huge mistake to work against what is potentially a very big user base. (Think of how many users Pyfa will have if squizz implements it on zkill!)

But then again, that would introduce a disconnect - if they upgrade the SDE but not pyfa, then pyfa is broken. If they upgrade pyfa but not the SDE, then pyfa's broken.

Not necessarily.

Just like we now actually have separation between Eos and the GUI, we should have separation between our code and the data. We should validate and clean our inputs and outputs (especially since we're adding an HTTP API, we don't want any Little Bobby Tables here) and should throw exceptions if validation fails, to prevent GIGO.
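
A minimal sketch of that kind of boundary validation (the field names are made up; the real payload shape would come from our API spec), with the ORM's parameterized queries covering the SQL-injection side:

    class ValidationError(Exception):
        """Raised when an incoming payload fails validation."""

    def validate_fit_payload(payload):
        # Reject anything that isn't shaped the way we expect before it reaches
        # the database or Eos, and raise rather than silently passing garbage along.
        if not isinstance(payload, dict):
            raise ValidationError('fit payload must be an object')
        if not isinstance(payload.get('shipTypeID'), int):
            raise ValidationError('shipTypeID must be an integer')
        for module in payload.get('modules', []):
            if not isinstance(module, dict) or not isinstance(module.get('typeID'), int):
                raise ValidationError('each module needs an integer typeID')
        return payload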

Even if we didn't do all that (which, again, I think we should), the vast majority of the time (like 90%+) there's nothing in the data that will break something. Most of the time the problem will be new functionality that isn't implemented yet (net new things) rather than modified old functionality breaking things.

To use an example here, take the citadel patch. Assume for the sake of argument that we simply treated Citadels like ships, so they were visible as soon as the Crest endpoints updated.

Joe User submits an issue. "When I try and open a Citadel to fit it, Pyfa throws an error that there was a problem." Nothing else is broken, and Joe User can use Pyfa just fine.

Now let's fast-forward a few months, and the mining barge re-balance happens.

Squizz triggers an update of the database/JSON. All the mining barge stats are updated, and everything works. Unfortunately the new PANIC module doesn't do anything (you can equip it, and it has stats), because CCP added new functionality. But he doesn't have to wait for a new release of Pyfa to get new data.

TL;DR

For the backend service we should use a data format that matches Crest. If the data is not available in Crest but is in the SDE, we should match the SDE. If it's not available in either, then we do the best we can.

For the UI, it doesn't matter, we're in control of it so we can use whatever we want.

I was also just thinking - since we're going to (eventually) support remote connections, we need to discuss the way data is transferred, and which data is transferred.

Which data are we talking about here? :)

With a web application, it's pretty simple: the page is static, and it will get ALL information (both user and eve data) from the server. But, branching out to a desktop application, there are two scenarios:

Connecting to the local server: This shouldn't really present much problem because there will be no network latency.

Connecting to a remote server

Why does it have to be different?

So setting aside the genuine web app scenario (most of those users will be ditching our UI and just connecting to the service directly), we have basically 2 platforms, and potentially 2 scenarios.

For our platforms, we want to support desktop (Windows/Linux/Mac) and mobile (Android/iOS).

For our scenarios, ideally we should support local (service running locally) and remote.

Our UI should be completely separate from the backend service.

For local use on Windows/Linux/Mac/Android this is trivial: the backend runs as a service, and we launch the GUI, which connects to the service via RESTful HTTP. (iOS I'm not sure about.)

Now it'd be really slick if the UI could connect to the backend service running on another computer. So I install Pyfa on my desktop, then on my Android phone plug in my computer name/IP, and away I go.

But we're just going to connect via a RESTful HTTP connection. The only thing different is that it points to 192.168.0.201 instead of 127.0.0.1.
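
In code terms, the client side would be identical either way; only the base URL changes (the port and route below are made up):

    import requests

    LOCAL_API = 'http://127.0.0.1:5000'       # service running on this machine
    REMOTE_API = 'http://192.168.0.201:5000'  # same service on the desktop across the LAN

    def fetch_fit(fit_id, base_url=LOCAL_API):
        # The UI makes the same RESTful call regardless of where the service lives.
        return requests.get('{}/api/fits/{}'.format(base_url, fit_id)).json()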

However, if you're browsing the item market, I would rather get that information from the LOCAL server than the remote server, so that network latency isn't an issue.

On a modern network the amount of data we're transferring is trivial. If we were talking about gigs/s, or megs/s, or even just hundreds of KB a second I'd agree with you, but even if we transferred all of the raw JSON files we're talking about a MASSIVE 100 megs (seriously though 100 megs is not large). Add in the user data, and we're probably up to 101 megs (yeah, the user database is tiny).

If we're transferring what Eos stores in its cache, then we're talking about well under a meg of data (maybe a full meg if we dump the entire user DB).

Realistically we will have a tiny amount of data to transfer, so doing it live over the wire is no big deal.

Additionally, we will also need to cache it locally in Angular/React anyway, so we can properly display the trees and whatnot. So it's pretty simple: when the app launches, request the data from Flask, cache it locally, and away you go. It might add a few milliseconds to the first load time, but it's not going to be much. As long as we don't have users on 56k modems, we'll be fine. :)

Okay, that's a large enough wall of text.

blitzmann commented 7 years ago

I think a lot of this has been cleared up in Slack discussions, namely the fact that it's inadvisable to have EOS loop back and talk to our API endpoint to get the data it requires.

Now, I did run a quick test, grabbed some item data off Crest and dropped it into a JSON file. Deleted the cache, and Eos recreated it. So it seems that Eos does support Crest style data, but obviously that needs to be fully tested.

This is most certainly broken. EOS was designed specifically for the output defined by phobos, which is how the data is laid out in the EVE cache. CREST certainly has the same fields, but EOS by no means supports CREST.

I'd rather it remain that way, TBH. EOS needs one thing: data. That data currently comes from JSON or SQLite (which can be branched out to MySQL later on).

When I say CREST dumper, I mean we really should have a dumper that produces the SQLite database that is loaded into EOS, rather than have EOS directly get data from CREST. This gives us the benefit of only having to do one dump to compile the EVE data source, which EOS then uses to compile its cache.
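
A rough sketch of that kind of dumper, assuming the rows have already been pulled from CREST/Phobos/whatever source, with a placeholder table layout:

    import sqlite3

    def dump_to_sqlite(evetypes, db_path='eve_staticdata.sqlite'):
        # evetypes: an iterable of dicts from whatever source we dumped.
        conn = sqlite3.connect(db_path)
        conn.execute(
            'CREATE TABLE IF NOT EXISTS evetypes ('
            'typeID INTEGER PRIMARY KEY, groupID INTEGER, typeName TEXT)'
        )
        conn.executemany(
            'INSERT OR REPLACE INTO evetypes (typeID, groupID, typeName) VALUES (?, ?, ?)',
            [(t['typeID'], t['groupID'], t['typeName']) for t in evetypes],
        )
        conn.commit()
        conn.close()

    # EOS is then pointed at eve_staticdata.sqlite (one file per data dump) to build its cache.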

First off, mostly we just need the invtype table/JSON. Everything else is pretty much just linking things together, and is pure IDs, and can be whatever structure we want.

Uh, no? We also need categories, market groups, attributes, effects... pretty much everything in the eve database is needed by the UI in one form or another.

They aren't going to want to be limited to waiting on Pyfa to release an update, just so they can run it as a service. They're going to want to be able to update it on demand, and I think it's a huge mistake to work against what is potentially a very big user base.

My priority is what pyfa itself needs as an application. The eos-http package that @Regner is working on is wonderful and will help developers that need a fitting service, but we also need to put pyfa's needs above other developers'. That being said, after thinking about it more, if the eos-http package is all the other developers need, it should be pretty easy to compile eve data to a schema that matches the SDE schema, which could keep the third parties happy (they can hook up to whatever data they have). After all, EOS only needs data that is in the SDE (for now, but that may change).

However, if a third party wants to host a pyfa server (not eos), then I think it's best to develop our own schema that works best for us and pyfa the application, rather than try to support the SDE as well, considering our data may be coming from multiple sources (SDE, CREST, any new data source down the road).

Why does it have to be different?

Simply due to latency. If I'm on a desktop version of pyfa connected to a remote service, a local eve database is already on my machine. Why would I need to connect to the remote API and deal with latency issues (searching for a module, for example) when I can just connect to the local API and have pretty instant data? Not to mention, if we're gonna be running our shit in the cloud, OUTBOUND data is usually the charged data. Now, it's a pretty trivial cost, but I'm a cheap motherfucker. :D

It's just an efficiency thing, certainly not something we need to be concerned about at the moment.

If we're transferring what Eos stores in its cache, then we're talking about well under a meg of data (maybe a full meg if we dump the entire user DB).

I'm not sure I follow you. The EOS cache will not help in any user interaction. It's literally a bunch of integers compiled in a way to make lookups fast and calculations faster.

Ebag333 commented 7 years ago

I think a lot of this has been cleared up in Slack discussions, namely the fact that it's inadvisable to have EOS loop back and talk to our API endpoint to get the data it requires.

This was mostly me thinking of Flask more like Django, where it's a full-fledged framework instead of simply an HTTP endpoint.

In that context, it doesn't make sense to have Eos talk to Flask for data, but I'm still of the opinion that it's beneficial to us for many reasons to have centralized management of our data connections. But we can look at that as it comes up.

My priority is what pyfa itself needs as an application. The eos-http package that @Regner is working on is wonderful and will help developers that need a fitting service, but we also need to put pyfa's needs above other developers'. That being said, after thinking about it more, if the eos-http package is all the other developers need, it should be pretty easy to compile eve data to a schema that matches the SDE schema, which could keep the third parties happy (they can hook up to whatever data they have). After all, EOS only needs data that is in the SDE (for now, but that may change).

The eos-http package is a very simplified version of (or possibly a precursor to) what we need, which our UI will also use. Since our UI will just be doing RESTful queries to pull data (assuming we go with Angular/React/similar), it's all just pure HTTP API communication.

Or am I misunderstanding how you wanted to design it?

I see developers using this in a few potential ways.

  1. Using Eos directly. Eos is just a fitting engine, so they need to develop their own framework, data management, UI, etc. Unless someone already has a Python project, I don't see too many people going with this option, but it's trivial to support it...because it's already done thanks to @Regner being awesome.
  2. Using the Pyfa service (backend). Since it's a RESTful HTTP API, it becomes trivial to integrate this into an existing UI, regardless of what language it is. This is the option that Tamber and Squizz want, and it's basically what @Regner has started with eos-http. There's more to it than that of course as we still need a framework to connect all our modules together, but in the end it's a nice neat little package.
  3. Using the Pyfa service (backend) and Pyfa UI. I honestly don't see this as being very viable, unless someone wants to roll Pyfa inside an app (as a sort of suite).

If our backend service can point to any data source, as well as being integrated into Crest (what you keep calling the "crest dumper," which sounds like something that involves toothpaste and a porta-potty), then it doesn't really matter what format the data comes in, we'll translate it to whatever we need, and all that is transparent to the developer using our framework. They feed a fit into our API, black box magic happens, and they get an output with stats.

I just want the setup to be as simple as possible, for us as developers, for users of the whole Pyfa product (UI and back end), and 3rd party developers like Squizz/Tamber.

blitzmann commented 7 years ago
  1. Using Eos directly. Eos is just a fitting engine, so they need to develop their own framework, data management, UI, etc. Unless someone already has a Python project, I don't see too many people going with this option, but it's trivial to support it...because it's already done thanks to @Regner being awesome.

No problem with this

  2. Using the Pyfa service (backend). Since it's a RESTful HTTP API, it becomes trivial to integrate this into an existing UI, regardless of what language it is. This is the option that Tamber and Squizz want, and it's basically what @Regner has started with eos-http. There's more to it than that of course as we still need a framework to connect all our modules together, but in the end it's a nice neat little package.

No problem with this, but I don't consider this to be a pyfa service, since it would only rely on eos, not on pyfa-specific things. A pyfa service, to me, would be something that requires our server and our setup (database schema). For third parties, this is simply an EOS service that they have an endpoint to. I'm being nitpicky with the terminology :P

  3. Using the Pyfa service (backend) and Pyfa UI. I honestly don't see this as being very viable, unless someone wants to roll Pyfa inside an app (as a sort of suite).

This would basically be used on alliance websites so that they can self-host their own fitting thing.

I'm not sure where this discussion is going since it seems that we are in agreement on most things. This is still confusing though:

If our backend service can point to any data source, as well as being integrated into Crest (what you keep calling the "crest dumper," which sounds like something that involves toothpaste and a porta-potty), then it doesn't really matter what format the data comes in, we'll translate it to whatever we need, and all that is transparent to the developer using our framework. They feed a fit into our API, black box magic happens, and they get an output with stats.

Again, my thought for the flow of data is this:

1) We dump the data from whatever data sources we need and compile to SQL. This gives us a standardized data source.
2) EOS consumes this data and creates a cache.
3) Profit.

That would be done during the build and deploy process. This allows us to have a SQL database with the data that can be used by EOS to create its cache, and also used by the framework to support the other functions (serving the user-type data and whatnot).

blitzmann commented 7 years ago

Also, EOS does support multiple database versions. This is what the SourceManager is for (I think that's what it's called). It would allow us to support, say, the latest patch, but also support other eve data versions (so you can see how a fit has changed with respect to a patch). O.smium has this. I think we should too, but definitely not out the gate. That would require a much bigger discussion on how to handle it. Just wanted it noted.