A distributed model and offline access

andru commented 10 years ago

I was wondering what people's opinions are on a distributed model for OpenFarm. There are a number of ways a distributed access model could be achieved, some surprisingly simple to implement, and I think the benefits are significant.

One huge benefit is to remove the dependency on a single entry point to access the open data set, to ensure that the dataset can be accessed not just via an API but also via a distributed database that can be forked and updated without the need for an open connection to the OpenFarm server, and potentially later merged. This would allow the dataset to be accessed offline, behind a firewall, or even via another website without incurring API hits to the 'master' OpenFarm server and yet still ensure any changes made can be merged back in to the 'master' if desired. Essentially Git for a data-set :)

Offline access is a potentially huge win which could tie in with this. I'm one of the few in the 'developed' world without always-on internet, relying on satellite broadband and off-grid electricity generation to connect, but in developing economies, where OpenFarm could make a huge contribution, lack of always-on connections pretty much the norm. Also, there are plenty of use cases for this when one considers mobile access - out in the garden there's no guarantee there'll be a data network and I think that having mobile offline access to guides I need should be a bare-minimum of a useful OpenFarm web application.

PouchDB/CouchDB as a datastore

One of the easiest methods to implement a distributed database I know of is to use CouchDB and/or PouchDB as the datastore. CouchDB is a document store which is distributed by default and PouchDB brings the same API to the browser and to Node. By using CouchDB anybody can clone the database to access and modity it without a dependency on the OpenFarm API. Indeed, except for a few edge cases, there would be no need to build a REST API to access guide documents because CouchDB already is a REST API with document validation and update hooks. The win here is that a guy in a remote village in India can have access to the full OpenFarm database; That I can contribute to articles while I'm offline and sync the database when I have a connection; that a guide contributor can head out into the field with their device and take the database with them.

Conflict management

Management of revision conflicts is already a requirement for almost any open access database since even in a centralised database there's a certainty of this occurring, but in this case it's a lot more likely and would happen more regularly. There would need to be tools to show a differential between two or more revisions of a document and help resolve them. This could be handled either by the contributing user, and administrator, or an interested volunteer.

Offline first

I've already seen some discussion on here regarding making the OpenFarm application heavy on the clientside, and this fits the offline-first architecture perfectly. We could consider caching recently loaded guides and giving the user the option of saving guides for offline use. Combined with mobile access this makes OpenFarm a great field-tool. Combined with PouchDB to allow sync-ing a guide back to the server, this makes updating while offline a possibility.

Centralised distribution

Some of this might seem like losing control over a database, potentially allowing for bad contributions, abuse, etc. I think there remains a hugely important role for a central authority for a data-set like this: that OpenFarm is the master arbiter of the data-set in the same way that the FarmBot user is the arbiter of the OpenFarm git master. OpenFarm is the brand, the 'master' copy, and the community, but the data exists as a distributed model which can be forked, modified, and merged. As I said at the start: a git approach to data :)

I'd be eager to pitch in on this a bit later in the year, but I'd like to know if anyone else thinks this is a positive direction for the project, and see if there's any room for discussion here.

simonv3 commented 10 years ago

I'm a huge fan.

Considering the skillset of the people working on this project at the moment, I wonder whether it makes sense to work on this in parallel with what's already being worked on?

Some steps I see us taking right now to make this a reality (until you can put some more thought into this) is focusing on making things more active client-side.

andru commented 10 years ago

Awesome. I'm totally up for pitching in on this one as winter rolls on and the work here on the farm quietens down. I've already got a stack set up for an offline-first plant-centered app over at sembr/gnome so I should be able to raid some work from that for a head-start. I've seen discussions mentioning Angular on here which would make a lot of that code redundant (it's a Backbone/Ractive stack), but your suggestion of a parallel development path sounds like a possible option.

One big step in the short term could be switching over to CouchDB and looking into modelling some of the API requirements using that. I'm not familiar enough with rails to know what it's like switching over to a CouchDB backend from Mongo. Any ideas?

simonv3 commented 10 years ago

I'm not sure at all, @rickcarlino, probably knows what that entails.

RickCarlino commented 10 years ago

I like the idea of allowing for offline use, but I'm not sure if switching to an offline-first architecture is going to be a good move at this point. My big questions about switching over to CouchDB would be:

How would we do fine grained permissions for the admin backend features?
How will we handle the concept of admins?
How will we handle media storage? (I'm almost done getting S3 integrated with guides)
Is it a big enough win to justify a rebuild?

We already have an admin backend in place via Rails Admin, but it only supports SQL or Mongo as far as I know. Administration and admin rights will be a bigger issue when dealing with comments and reputation and things like that.

When I developed with backbone, I used some offline sync adapters like this one that could plug into any REST api with almost 0 configuration and didn't require a complete backend rebuild. It might be worth investigating if it's a more incremental solution. There are also ways of handling versioning within Rails/Mongo, although I've only ever needed to work with simple version invalidation type use cases.

If it's a big enough of a need then by all means let's go for it, but at this point I'm a bit hesitant to completely start over to accommodate this feature.

roryaronson commented 10 years ago

[Technical novice speaking here]

I'd like to discuss what other means there are to achieving the mentioned goals, as well as what similar projects have done to achieve them. And also weigh in if this is a big enough win to justify a rebuild as Rick mentioned.

Offline Access: It seems that very few people would ever want (or have the technical ability) to clone the entire database and set it up for access somewhere else (like on their laptop, smartphone, or a VPN). Rather, I imagine most folks will want to save a handful of guides for offline use in the garden or back at the farm far away from the Internet cafe in town. I see this being easily accomplished in two ways: As a feature in a dedicated mobile application, and with an 'Export to PDF' or other format option on every guide page - something that all of the old folks who still physically print things out are going to want anyways ;p

Offline Guide Creation: This could be done with a mobile app or an offline first web-app like what I gather Sembr to be. User creates and saves a Guide locally, then uploads it when online at a later point. No database forking and merging required, just an upload.

Multiple Database Access Points: Again, I can't imagine a lot of folks (with the exception of researchers) forking the whole database to do modifications via their own site and then wanting to merge the changes back in. This wouldn't even be feasible with a low-powered smartphone or a limited bandwidth connection. And it seems odd to try and merge the changes back into master. It's not so much of a merge, as it is an addition. Nobody should be changing other people's Guides and trying to merge the changes in - they should be creating new Guides and adding them to the main library. Offering a weekly database dump as a zip file should be good enough if one really wanted to access the data behind a firewall or if they reaaaallly lived up in the hills (the type of people who have also downloaded all of Wikipedia for offline access)

Thank you for bringing this discussion to the table Andru! At this point I don't see the use cases justifying a switch, but again I accept my technical ignorance. Are there things I am missing? What are other thoughts, use cases, or technical insights?

Oh, and another thing I wanted to bring up: What does Wikipedia do? Should we follow in their footsteps? It looks like they allow users to save articles for offline reading from their mobile application. They do not support offline editing and syncing. On web, they allow export to PDF and a printer friendly version. Plus, you can download the whole of Wikipedia as a zip file. From what I know there is no forking and merging of Wikipedia happening regularly.

Thanks for the discussion!!

simonv3 commented 10 years ago

@rickcarlino @roryaronson - just for my own clarification: I see forums and discussions, etc, happening at a much later stage, which to me can mean that it's a bit more canned than it currently is. In a long-term mindset, and going along with what @andru is talking about, I can see those being the "value add" of OpenFarm as an entity. Without those, we're just a database, rather than a community.

I agree that for now we should be concentrating on making the site work, and building that actual database (through user contributions, importing databases, etc).

However, I do wonder about making the site work offline (I'm someone who's regularly not at my laptop with internet connection myself). I don't think offline guide creation is necessary at the moment, but an easy step is "starring" guides. If a user is logged in, those guides are easily available on their dashboard, and also get stored in their offline web-app. If they're not logged in, they just get stored in their offline web-app. We don't have to build that now, but it's something that we already have, so why not leave the option open?

Though! This did just make me think - do we need a user to be logged in to create guides? Would a better mechanism be to prevent spam through captcha's and the like? This is actually worth an entirely new discussion I think (done).

andru commented 10 years ago

Nobody should be changing other people's Guides and trying to merge the changes in - they should be creating new Guides and adding them to the main library.

I think this represents a misunderstanding on my part then and actually leads me into a discussion on the model of the role of guides in general. A lot of how I'd like to respond to your points @roryaronson are actually centered on some discussion on that issue. Maybe someone could link me to a better issue or whatnot where this can be discussed? ( Update: I went into some ideas in #123 )

In the meantime I'll go into it briefly here: Is a guide a stand-alone document? Should one user not be able to modify the guide of another? I understand and support the need for multiple guides for the same crop to cover different growing environments etc and following the crowd-sourced "may the best guide win" kind of approach is a great pattern for that but if a guide isn't considered a wiki then I think this leads to the potential of fragmenting useful information over many incomplete or partially complete guides.

Eg. I just want to add the information about a Tomato pest: should I have to fill out an entire guide to add this? I'm certain to leave out a lot of useful information if so. Should I just choose the best Tomato guide I can find and add it to that? How do I choose which is most relevant? Do I copy-and-paste it into more than one? What if I'm excluding others which may contain very important and relevant information? The guide model is great and solves some big problems which many plant databases suffer from (the "more than one way to skin a cat" problem), but it seems like at that point it's creating real fragmentation and confusion: I have useful information to contribute, but nowhere good to put it.

If guides are mostly considered a read-only author-owned document then you're right: a distributed model actually makes no sense. One of the powers of distribution lies in the element of collaboration.

Oh, and another thing I wanted to bring up: What does Wikipedia do? Should we follow in their footsteps? It looks like they allow users to save articles for offline reading from their mobile application. They do not support offline editing and syncing.

I work with MediaWiki software on a regular basis for my day job. I don't think one should answer any question by asking what they do! They have the huge burden of an enormous wealth of content written for an ancient platform. If they had the chance to start over, I honestly doubt they'd choose the same model!

I like the idea of allowing for offline use, but I'm not sure if switching to an offline-first architecture is going to be a good move at this point. My big questions about switching over to CouchDB would be: How would we do fine grained permissions for the admin backend features?

I'm not sure I get the question so let me know if this doesn't answer sufficiently, but an offline-first application has no limitation on supporting online-only features side-by-side. A traditional administration console is unaffected.

How will we handle the concept of admins?

Again I'm not 100% sure if I'm missing an aspect of the question, but a distributed offline-first application doesn't negate an implementation of administrators. The same way GitHub has a concept of users, repo owners, etc, so any offline-first application may have an online aspect which handles authentication and authorization.

How will we handle media storage? (I'm almost done getting S3 integrated with guides)

CouchDB is actually a web-server and can handle media attachments to documents. This encapsulates documents and their media and is pretty awesome. On the other hand it can quickly get unweildy with a lot of large format media so it isn't always the recommended solution. One could only attach low-resolution previews, or one could just reference media by URL. In terms of offline access one could cache media for certain articles.

Is it a big enough win to justify a rebuild?

My personal perspective is that distributed offline-first systems are the future of an open web.

roryaronson commented 10 years ago

Ah ok, and now recognizing what you understood the guides to be, I can see the value to be had by a distributed model if the guides were more wiki-style, as it would function more like Git with the merging. So it seems this conversation depends on what the model is: Guides as single-author documents vs something more wiki style. The saga continues in #123!

RickCarlino commented 10 years ago

Hey guys two things:

Today's PR has added CORS, Token Auth and the ability to access quite a few database resources via the api/ endpoint. This makes a third party offline version more feasible. See the API docs for more information.
Maybe it's time we move this conversation to loom so that we can come to an actionable agreement? This issue has been sitting open for almost a month now.

What do you guys think?.

m3talsmith commented 10 years ago

As far as offline access goes, why not use localStorage as an offline container? It's supported by ie 8+, all other major browsers, and you can sync when online easily enough. I don't see the need for a new database just to make this a feature.

On Sat, Sep 27, 2014 at 8:28 AM, Rick Carlino notifications@github.com wrote:

Hey guys two things:

Today's PR has added CORS, Token Auth and the ability to access quite a few database resources via the api/ endpoint. This makes a third party offline version more feasible. See the API docs https://github.com/FarmBot/OpenFarm/blob/master/api_docs.md for more information.

Maybe it's time we move this conversation to loom so that we can come to an actionable agreement? This issue has been sitting open for almost a month now.

What do you guys think?.

— Reply to this email directly or view it on GitHub https://github.com/FarmBot/OpenFarm/issues/120#issuecomment-57056037.

Michael Christenson II Senior Ruby Developer p. (231)884-3024 e. michael@rebelhold.com e. m3talsmith@gmail.com g. http://github.com/m3talsmith

m3talsmith commented 10 years ago

For reference: https://developer.mozilla.org/en-US/docs/Web/Guide/API/DOM/Storage

On Sat, Sep 27, 2014 at 8:49 AM, Michael Christenson II < m3talsmith@gmail.com> wrote:

As far as offline access goes, why not use localStorage as an offline container? It's supported by ie 8+, all other major browsers, and you can sync when online easily enough. I don't see the need for a new database just to make this a feature.

On Sat, Sep 27, 2014 at 8:28 AM, Rick Carlino notifications@github.com wrote:

Hey guys two things:

Today's PR has added CORS, Token Auth and the ability to access quite a few database resources via the api/ endpoint. This makes a third party offline version more feasible. See the API docs https://github.com/FarmBot/OpenFarm/blob/master/api_docs.md for more information.

Maybe it's time we move this conversation to loom so that we can come to an actionable agreement? This issue has been sitting open for almost a month now.

What do you guys think?.

— Reply to this email directly or view it on GitHub https://github.com/FarmBot/OpenFarm/issues/120#issuecomment-57056037.

Michael Christenson II Senior Ruby Developer p. (231)884-3024 e. michael@rebelhold.com e. m3talsmith@gmail.com g. http://github.com/m3talsmith

Michael Christenson II Senior Ruby Developer p. (231)884-3024 e. michael@rebelhold.com e. m3talsmith@gmail.com g. http://github.com/m3talsmith

simonv3 commented 10 years ago

@m3talsmith, Yeah, I think localStorage is probably the best way forward while standards are catching up with more complete local databases.

What I think @andru meant with new databases is replicated databases that can be used as nodes to assure decentralization of OpenFarm. I still think that's a good idea, and it's part of a conversation we're having on Loomio.

On that note, I'm moving this potential conversation into Loomio as well.

RickCarlino commented 10 years ago

UPDATE: We have API docs now for those who want to develop third party clients. https://github.com/FarmBot/OpenFarm/wiki/The-OpenFarm-API-(ALPHA-VERSION)

openfarmcc / OpenFarm