mixnjuice / project-ideas

This is an attempt to create the best, most user-friendly mixing website the world has ever seen :)

RFC#7 Hook/Ledger for Data Sync with Third Parties #27

Open Korlimann opened 5 years ago

Korlimann commented 5 years ago

TBD

pscn commented 5 years ago

FTR: ATF Flavor Table Layout

pscn commented 5 years ago

Short question so I can better understand the scope of this: Right now I think this hook / ledger is independent from the backend and is just some kind of interface between us and the third parties, correct?

I'm trying to define a GraphQL schema right now (see schema.graphql), and one of the key points is that every modification to vendors, concentrates, recipes, etc. is tracked in the system. Depending on the modification, it might require approval by the community. In my current mindset, modifications from third parties are entered just like modifications from other users. So for every third party that can update us, we have a user account that makes those modifications. If ATF adds a flavor / concentrate, the ledger calls the API function (aka mutator) to add it.

Am I wrong? Is this too complicated? What would be a better / smarter method?
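
To make the idea concrete, here is a minimal sketch of "ledger entries become ordinary user-attributed changes". All names (`LedgerEntry`, `toChange`, the service-account id, the approval rule) are invented for illustration, not part of any agreed schema:

```typescript
// Hypothetical shape of an entry coming in from a third-party sync.
type LedgerEntry = {
  source: string;                            // service user account, e.g. "atf-sync"
  action: "addFlavor" | "updateFlavor";
  payload: { vendor: string; name: string };
};

// Hypothetical internal change record - the same shape a human edit would produce.
type Change = {
  author: string;
  action: string;
  payload: object;
  needsApproval: boolean;
};

// Third-party entries flow through the same path as user edits:
// the ledger is just another (service) user calling the mutator.
function toChange(entry: LedgerEntry): Change {
  return {
    author: entry.source,
    action: entry.action,
    payload: entry.payload,
    // Assumed rule: additions apply directly, modifications need community approval.
    needsApproval: entry.action !== "addFlavor",
  };
}
```

With this framing, "subscribe to changes by the ATF sync user" falls out for free, since the sync account is just another author in the change history.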

ayan4m1 commented 5 years ago

What you're doing sounds like the right way to use GraphQL, but I think the problem you're trying to solve is bigger than just that - how do we decide which transactions are trustworthy and which are not? Reputation scores, auto-reconciliation from major partners, dare I even say the word "blockchain," are all ways to answer that question. But it is ultimately a complex problem with no single solution.

At a low level, I think we have all been aligned in wanting some kind of transaction log into which changes from external sources would flow and out of which updates to our data store would be made. To that end, the transaction log/ledger application would be doing much the same thing in either a GraphQL or RESTful model - the ledger has its list of 500 updates to make to our data, it connects and calls either GraphQL mutators or REST resources with the appropriate verbs.

I am unsure if there are any implementation difficulties specific to either one of these choices, but I know that GraphQL can be faster and more efficient, while REST is much more likely to be supported on esoteric platforms / embedded systems. If, for example (totally not something I am planning on doing), you wanted to make an e-ink based display that fed you mixing info during a session, you'd have a much easier time of it if we provided a REST API. I realize that is a very edge case, but it'd be much harder to start off with GraphQL and find that we needed to abandon it than to start with a REST API and then decide that, as an enhancement, we can provide a GraphQL layer on top of or alongside it.

As for right now, unless we're planning on having GraphQL be the primary means of interacting with our API, I'd say we should design around REST patterns - verbs, collections, and resources. Don't let me in any way stop you from exploring GraphQL for this project!
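
The "ledger replays its updates against REST resources with the appropriate verbs" part could look roughly like this. The endpoint paths and the `LedgerAction` names are made up for illustration; nothing here is a committed API design:

```typescript
// Hypothetical ledger actions mapped onto REST verbs and resource URLs.
type LedgerAction = "create" | "update" | "delete";

// Translate one pending ledger update into the HTTP request that would apply it.
function toRequest(action: LedgerAction, collection: string, id?: string) {
  switch (action) {
    case "create":
      return { method: "POST", url: `/api/${collection}` };          // add to collection
    case "update":
      return { method: "PUT", url: `/api/${collection}/${id}` };     // replace a resource
    case "delete":
      return { method: "DELETE", url: `/api/${collection}/${id}` };  // remove a resource
  }
}
```

The same mapping in a GraphQL model would call named mutators instead of verbs, which is why the ledger application itself is largely indifferent to the choice.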

pscn commented 5 years ago

As discussed yesterday, REST is totally fine. I don't see me having the time to dive deeper into GraphQL in the near future.

As for who's trustworthy or not, I'd say everyone outside our realm (like the ledger) would be untrusted, and mechanisms need to be in place to ensure nothing gets "destroyed". I'm thinking about a system like the one MusicBrainz has for adding albums and artists. Adding a new artist / album is generally done without any further checks. Modifying or removing, however, requires either votes from other members or a certain time period to pass. Pending modifications are visually highlighted.
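
As a sketch of that MusicBrainz-style rule - additions pass immediately, edits and removals wait for votes or a timeout - with the threshold and waiting period being assumed values, not anything we've agreed on:

```typescript
// Hypothetical pending edit in a MusicBrainz-style moderation queue.
type PendingEdit = {
  kind: "add" | "modify" | "remove";
  yesVotes: number;
  noVotes: number;
  ageDays: number;
};

const VOTE_THRESHOLD = 3; // assumed value
const EXPIRY_DAYS = 7;    // assumed value

function resolve(edit: PendingEdit): "accepted" | "rejected" | "pending" {
  if (edit.kind === "add") return "accepted";            // additions: no further checks
  if (edit.noVotes >= VOTE_THRESHOLD) return "rejected"; // community vetoes the change
  if (edit.yesVotes >= VOTE_THRESHOLD) return "accepted";
  if (edit.ageDays >= EXPIRY_DAYS) return "accepted";    // unopposed edits pass after the waiting period
  return "pending";                                      // stays visually highlighted until resolved
}
```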

I'm not sure that's the best way to handle it, but I haven't found a better way to stay between ELR's chaos and ATF's restrictiveness. I'm totally open to suggestions.

Link to musicbrainz documentation

ayan4m1 commented 5 years ago

I think an amalgam of automated and manual features will be the best way to keep overall data quality high. We should enumerate strategies to deal with vetting/validation of user-submitted / crowdsourced data (which @pscn has already started here) - some that come to mind:

Also, worth noting that MusicBrainz also uses various checksum-based approaches to deduplicate a lot of the bad (duplicate) data that they get. We can technically checksum/hash a recipe... but that seems like a silly solution to the broader problem of moderation/change management and controls for our data.

My proposal is for a sandbox / multi-environment approach where the "live" data is tested in a lower environment and then promoted to production via a combo of automated and manual review steps. There would be some designed-in delay in the promotion process so as to allow changes to take place at the rate we feel is appropriate for the userbase and our server resources.
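
A rough sketch of that promotion pipeline - stage names, the check/approval flags, and the delay value are all assumptions, not a settled design:

```typescript
// Hypothetical stages a submitted change moves through before going live.
type Stage = "staging" | "reviewed" | "production";

type Submission = {
  stage: Stage;
  checksPassed: boolean; // automated review step
  approved: boolean;     // manual review step
  ageHours: number;      // time since entering the current stage
};

const PROMOTION_DELAY_HOURS = 24; // designed-in delay (value assumed)

// Advance a submission by one stage if it has earned promotion, else leave it.
function promote(s: Submission): Stage {
  if (s.stage === "staging" && s.checksPassed && s.approved) return "reviewed";
  if (s.stage === "reviewed" && s.ageHours >= PROMOTION_DELAY_HOURS) return "production";
  return s.stage; // nothing moves until both review steps and the delay have passed
}
```

The delay constant is the knob for "changes take place at the rate we feel is appropriate for the userbase and our server resources".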

pscn commented 5 years ago

Can you elaborate on the sandbox / multi-environment approach with an example? I have a hard time imagining it.

To clarify: though this is in the "ledger" issue, I see all of this as part of the general concentrate / vendor data adding / modifying workflow. I still think it's a good idea to treat incoming changes from the "ledger" as changes by (a) "normal" user(s). People could subscribe to changes by these users and review them if they want to.

From the strategies you mentioned I would pick:

I wouldn't pick:

I'm unsure about:

ayan4m1 commented 5 years ago

So there are a few different points that I should elaborate on. I don't think there is any reason to maintain two copies of the same dataset or have a "mirror" of production data (backups are a different story). However, I think that we are planning on using continuous delivery, and having multiple environments is a crucial part of that.

I think auditing is going to be important, so we should decide soon if we are using pgaudit, using custom _history tables and triggers, or relying on application-level audit log manipulation.
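
The three options differ mainly in where the log lives: pgaudit and `_history` tables with triggers keep it inside Postgres, while the third keeps it in the application. As a rough illustration of the application-level variant only (all names here are invented):

```typescript
// Hypothetical application-level audit record written alongside every write.
type AuditRecord = {
  table: string;
  op: "INSERT" | "UPDATE" | "DELETE";
  user: string;
  at: string;       // ISO timestamp
  before?: unknown; // row state prior to the change
  after?: unknown;  // row state after the change
};

const auditLog: AuditRecord[] = [];

// Apply a patch to a row and record before/after images in the audit log.
function updateRow<T extends object>(table: string, user: string, row: T, patch: Partial<T>): T {
  const after = { ...row, ...patch };
  auditLog.push({ table, op: "UPDATE", user, at: new Date().toISOString(), before: row, after });
  return after;
}
```

The trigger-based and pgaudit variants capture the same before/after information without trusting the application to remember to log, which is a point in their favor.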

Will elaborate more when I have the time, thanks.

DeadBranches commented 4 years ago

> * Pay-to-play: No. Way too heavy and I don't see a good application here. It should be perfectly fine to have users that don't publish recipes and just take care of concentrate / vendor data.

I see where you're coming from. Perhaps other users could validate the output of specialized users by interacting with the changes in some manner. If the collective reputation score of those who validated the changes meets some threshold, the data could be accepted as valid.
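
A minimal sketch of that acceptance rule, with the threshold value invented purely for illustration:

```typescript
// Assumed threshold: total reputation of validators needed to accept a change.
const ACCEPT_THRESHOLD = 100;

// A change is accepted once the combined reputation of its validators is high enough.
function isAccepted(validatorReputations: number[]): boolean {
  const total = validatorReputations.reduce((sum, rep) => sum + rep, 0);
  return total >= ACCEPT_THRESHOLD;
}
```

One nice property of summing scores is that either a few high-reputation validators or many low-reputation ones can push a change over the line.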