sqlitebrowser / dbhub.io

A "Cloud" for SQLite databases. Collaborative development for your data. 😊
https://dbhub.io
GNU Affero General Public License v3.0

Implement "teams" privileges option #134

Open cmichaud92 opened 4 years ago

cmichaud92 commented 4 years ago

I'm interested in using dbhub to host a small database for a fisheries project I am collaborating on. We are currently using Google Drive and are forced to "version" SQLite databases as we update and share them. Having access to a cloud-deployed database would dramatically increase our efficiency. However, I am hesitant to use your platform if I must make my data "public" to enable collaboration, as I am dealing with some sensitive information. I read somewhere that you are/were considering implementing a "teams" access option. This feature would be incredibly useful to me.

justinclift commented 4 years ago

@cmichaud92 That definitely makes sense. :smile:

It'll probably be a while before we start on the Teams part of things, as development is currently focused on the new API and improving the integration with the GUI application. We haven't really thought through the Teams piece yet, or what it'll take to make it happen (there are just placeholders and reminders throughout our code at the moment).

That being said, what's the workflow you're currently using? It's possible some of the existing access mechanisms might be workable.

For example, the dio command line utility can be used to upload and download databases using certificate based authentication. We've seen people using that to synchronise data between several computers. eg:

  1. Create a shared account (say) "fisheries_team1"
  2. Generate the certificate for that account (there's a button in the webUI)
  3. Copy that certificate to everyone that needs access
  4. Install and configure dio, using that certificate
  5. Use dio for uploading and downloading stuff
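The steps above might look something like this from the command line. This is a rough sketch: the account name and database filename are placeholders, and the certificate setup in step 4 follows the dio README rather than being shown here.

```shell
# Steps 1-3: create the shared "fisheries_team1" account, generate its
# certificate in the webUI, and copy the cert file to each collaborator.

# Step 4: install dio and point it at the shared certificate, following
# the setup instructions in the dio README (not shown here).

# Step 5: the day-to-day cycle, assuming a database named fisheries.sqlite
dio pull fisheries.sqlite      # download the latest version from DBHub.io
# ... edit fisheries.sqlite locally ...
dio commit fisheries.sqlite    # record the change
dio push fisheries.sqlite      # upload the new version
dio status fisheries.sqlite    # should report no outstanding changes
```

Because everyone shares one certificate, every upload appears under the shared account rather than an individual one.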

The GUI (DB Browser for SQLite) uses the same certificate for accessing things remotely, so the same applies there.

Does that help? :smile:

cmichaud92 commented 4 years ago

@justinclift This is certainly a workable solution. I have managed to get dio up and running and have pulled a database (123a.sqlite) from the remote. I have also succeeded in opening it in DB Browser (via Open Database, selecting the 123a.sqlite commit sha in the '.dio\123a.sqlite\db' directory), modifying it, and finally clicking Write Changes. The problem: when I run `dio status 123a.sqlite`, it reports the database as unchanged (even though I modified it), so I am unable to commit or push. Somehow I'm not interfacing dio/DB Browser correctly. :confused: Any thoughts are appreciated. I should note I'm a novice programmer, so the obvious may not be so obvious... :smile:

Thanks

justinclift commented 4 years ago

Ahhh, interesting. I'll take a look at this tonight after work. It's probably related to the changes over the last few days in the comms between the GUI (latest dev builds) and the server.

Hmmm, on that note, we'd better update the unit tests to match the changes. And make sure the unit tests run as part of our Pull Request checks so we don't break things for existing users. Will get that started tonight too. :smile:

justinclift commented 4 years ago

Btw, which OS are you running the GUI on? If it's Windows or macOS, it might be useful to try our latest nightly dev builds, as @MKleusberg has been adjusting the remote access pieces in that for easier operation.

It'd be useful to have feedback on the updates there too, if you're ok sharing them? :smile:

MKleusberg commented 4 years ago

@cmichaud92 Can you try this?

  1. Create the shared dbhub.io account and generate the client certificate
  2. Download the latest nightly build of DB4S and install that
  3. Import the client certificate in the Remote section of the Preferences dialog
  4. In the Remote dock (by default it's somewhere on the right) select the certificate you have just imported (there's a drop down box at the top called Identity)
  5. In the DBHub.io tab of the Remote dock you should see your user name and database at the top of a long list. Double click the database you want to edit
  6. The database should download and open automatically
  7. Now just edit the database as usual and click the Write Changes button when you're done
  8. To upload the new version click the Upload button (after saving but before closing the database), fill in the details, and click OK

Now a new version of the database should be on the server. Everybody with this client certificate can repeat these steps and download and modify that database.

chrisjlocke commented 4 years ago

Just to get my head around the workflow: can you confirm that if user Adam checks out version 2, then user Charlie checks out version 2, both make amendments, and both check back in, there are now three variations: the original, Adam's, and Charlie's? At the moment we don't do any diffs or anything, so 'collaboration' has to be 'can't check out until the other user has checked in'... is this correct? If user Adam has checked out, is there any notification to Charlie that the database has been checked out by Adam? Even as a 'sloppy' workaround, some kind of log?

MKleusberg commented 4 years ago

@chrisjlocke Yes, I believe that's correct. It's basically that whoever checks in first wins. The best you could do from DB4S at the moment is uploading to a new branch. I guess as long as Adam and Charlie don't work on a database at the same time you can circumvent this problem by making sure they always download the latest version right before making changes and upload their changes right after making them. If they work on the same database at the same time, well, we should definitely add diffing and merging soon :smile:

justinclift commented 4 years ago

@cmichaud92 Just now fixed a problem in Dio due to us changing a datestamp format used in the server communication a few days ago. I was thinking that might have been the problem you hit, but reading back over your problem description, it's not.

When you run dio pull <somedatabase.db>, it downloads the database and places it in two locations:

  1. The current directory
  2. As a kind of backup, under .dio/<databasename>/db/

The databases under that .dio/<databasename>/db directory should probably be regarded as "read only". :wink:

The idea there is to use them as a local cache of files for updating the working copy (the current directory). eg if you want to wipe a version you've been mucking around with, and restore it to a known state.

Having files cached locally means not having to download them every time from the server, which for larger sized files is otherwise a real pita. :wink:

So, pretty sure the problem you hit was just an early stage learning thing: changing the "backup" version of the database instead of the one in the directory you ran dio pull 123a.sqlite from.
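Concretely, the intended cycle looks something like this (a sketch, with 123a.sqlite standing in for the real database):

```shell
dio pull 123a.sqlite           # writes ./123a.sqlite and caches a copy
                               # under .dio/123a.sqlite/db/

# Edit the working copy in the current directory (e.g. via DB Browser),
# NOT the cached copy under .dio/ -- dio only tracks the working copy.

dio status 123a.sqlite         # should now show the database as changed
dio commit 123a.sqlite         # record the change
dio push 123a.sqlite           # upload it to the server

# To discard local edits and restore a known state, just re-pull:
dio pull 123a.sqlite
```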

justinclift commented 4 years ago

@cmichaud92 With the version of dio you used, did you compile that from source? If so, it's probably best to recompile it with the change I added to the dio repository a few minutes ago. Otherwise dio list will likely be broken. :frowning:

~Reminder to self - tag a new dio release (0.3.1) tomorrow, with that fix included~

justinclift commented 4 years ago

@MKleusberg Interestingly, the unit tests for Dio didn't pick up the server side code change as being incompatible.

We mock the server side communication in the current unit tests, rather than spinning up a local DBHub.io server. We might be better off spinning one up in Docker and testing against that, as it would catch this kind of problem.

Will need to investigate that. Not a task for tonight, as I'm pretty tired atm. :frowning:

MKleusberg commented 4 years ago

@justinclift Yeah, I agree. With a mock server we can never be sure we'll catch this sort of issue.

cmichaud92 commented 4 years ago

@justinclift Yes, I see it did download a *.sqlite to my working directory... :roll_eyes:. I'll recompile dio and try again now that I have a little better understanding. I'd bet this solves my problem.

@MKleusberg , @chrisjlocke Thanks for your thoughts on this also. I will try your suggested workflow this evening. I think it may be exactly what I need for collaborators who are more comfortable with GUIs than command line tools. I am interested in how dbhub will handle concurrent updates by multiple users. While there is not a high likelihood in my use case, it certainly is possible. I could minimize the chances by requiring an immediate pull --> update --> push workflow.

I'll let you know how it goes this evening. Thanks!

justinclift commented 4 years ago

I am interested in how dbhub will handle concurrent updates by multiple users.

Not fantastically at the moment. AFAIK, the GUI and command line tool keep track of which commit was "the latest" when they download a database. They provide that info back to the server when uploading, and it'll tell them they're out of date (and disallow the upload) if someone else has pushed a newer version in the meantime.

So, not a perfect solution, but it should stop people directly overwriting each other's changes.

At a future point we'll have diffs and PR's ("Merge Requests" in our web UI terminology) implemented, so people can trigger the merging of changes directly. But we're not there yet.
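In dio terms, the rejection described above plays out roughly like this (a sketch with two hypothetical users; the exact error wording may differ):

```shell
# Adam and Charlie both start from the same latest commit
adam$    dio pull project.sqlite
charlie$ dio pull project.sqlite

# Charlie commits and pushes first, so his upload is accepted
charlie$ dio commit project.sqlite
charlie$ dio push project.sqlite

# Adam's push is now based on an out-of-date parent commit, so the
# server rejects it rather than overwriting Charlie's change
adam$    dio commit project.sqlite
adam$    dio push project.sqlite    # fails: local copy is out of date

# Adam recovers by pulling Charlie's version and re-applying his edits
adam$    dio pull project.sqlite
```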

justinclift commented 4 years ago

Oh, the branching model (same as git) does work. It's just the merging-branches-back-together piece that needs further development.

cmichaud92 commented 4 years ago

@MKleusberg @justinclift I have managed to pull, modify, and push data directly from DB Browser to dbhub. I feel this will make the process far more accessible to collaborators than the command line tool. That said, I do really like dio, as it is easy to use and familiar for those of us with experience using git. Thanks for the detailed instructions; I feel I'm pretty well set up for the moment. I'll be collecting data in the field for the next week and plan to cycle back to this process soon after I return. I think DB Browser and dbhub have great potential for my use case, as a major component of a detailed data management process supporting ecological data collection, storage, and analysis. Thank you for taking the time to get me up to speed.

justinclift commented 4 years ago

You're welcome @cmichaud92.

We're very much open to ideas for improving things too, so if further ideas/suggestions come up, please let us know. :smile:

justinclift commented 4 years ago

Reminder to self - tag a new dio release (0.3.1) tomorrow, with that fix included

Done.

justinclift commented 4 years ago

@cmichaud92 How did you go with this?

As a data point, if you're using a nightly build of DB4S then it's probably worth updating to tomorrow's nightly build.

A bug was introduced to our uploading code a few weeks ago, which meant that the first upload of a database worked, but subsequent ones didn't (without a manual workaround). That's fixed in our source code now, and all our nightly builds will have the fix.

cmichaud92 commented 4 years ago

@justinclift I had not yet encountered that bug, thanks for the heads up! I'll update to the most recent nightly build tomorrow. My focus this week is getting collaborators up and running on dbhub. I'll let you know if we encounter anything unexpected :open_mouth:. Thanks for keeping me in the loop!

justinclift commented 4 years ago

No worries. :smile:

MKleusberg commented 4 years ago

I am interested in how dbhub will handle concurrent updates by multiple users.

Not fantastically at the moment.

Just a quick update that we're making progress with this. DBHub can now display the changes made in a commit, the changes between two branches, and the changes made in a MR. Merging is still going to take me some more time, but seeing what other people changed might already help you a bit :smile: I'm working through the code anyway right now, so feel free to mention any missing information or configuration options and I'll try to add them, @cmichaud92 :smile:

MKleusberg commented 4 years ago

@cmichaud92 Some more progress: I have just added merging and sharing of databases. This allows an improved collaborative workflow like the one described here (you'll need to update to the latest nightly build of DB4S for this though).

  1. Let people upload their changes as usual.
  2. If they get an error message, somebody else has been faster. In that case, upload to a new branch
  3. On the web site create a new merge request for merging that branch into master
  4. On the merge request page you can click "View Changes" to see what was changed
  5. By clicking "Merge this Request" the changes are merged back into your master branch while preserving the changes from other commits in that branch. In case there is a conflict (the same row has been modified in both branches), merging is not possible until the conflict is resolved.
  6. A merge commit in the master branch appears which you can pull.

Sharing databases works like this:

  1. Upload a database to your account
  2. Make other people register on DBHub.io, generate their own client certificate, and load that into DB4S
  3. Go to the settings page of the database, add the users to the new "Share Database" section and click the "Save" button
  4. Copy the link which appears when clicking the "Clone Database in DB4S" button and send that to those users
  5. In DB4S they can click the "Clone from Link" button in the DBHub.io tab of the Remote dock and download your database using that link
  6. When uploading changes, they currently still need to make sure to manually type in your user name in the "Username" field in the Upload dialog