sqlitebrowser / dbhub.io

A "Cloud" for SQLite databases. Collaborative development for your data. 😊
https://dbhub.io
GNU Affero General Public License v3.0
372 stars 39 forks source link

db4s: Internal server error when pushing to new branch #101

Closed MKleusberg closed 4 years ago

MKleusberg commented 7 years ago

I've just tried pushing to a new (not yet existing) branch name via the DB4S interface and got an Internal Server Error. Specifically I've tried pushing to mtest3/l.db (having made no modifications) and setting "test" as the branch name. I would have expected that this creates the "test" branch and then pushes to it.

justinclift commented 7 years ago

Oops. Sounds like a bug of some sort. Will investigate shortly. :smile:

justinclift commented 7 years ago

Thinking about this atm, we'll probably want to provide a commit id for the push too, like with the conflict detection you mentioned yesterday.

With a commit id given, and a new branch name given, the server will be able to create the new branch off of the given commit. Without a commit id being provided it'll just be guesswork (eg picking the HEAD commit from the default branch).

None of that probably affects the likely bug you're mentioning here, but since we're looking at it I'm kind of thinking we should probably get it done together. ?

MKleusberg commented 7 years ago

Sure, makes sense. Sending no commit id still needs to be supported for two reasons: 1) For an initial push we don't have a parent commit id 2) As described in issue #99 that would be a way to implement force pushes

But yeah, it definitely makes sense to use the commit id for branching, too 😄

justinclift commented 7 years ago

Yep, agreed. Investigating this bug now btw.

justinclift commented 7 years ago

Ahhh, for force pushes it'll probably still need a commit id. For example, lets say that the latest commit contains some dodgy or confidential data, mistakenly pushed. eg

To remove that, a force push (rewriting history) would be needed. But the server is still going to need to know from what point to rewrite history. From "commit 2" in the above example.

Guessing we'll need a "force" boolean field added as well, so DB4S can say "yep, we know this would conflict... ignore the error and overwrite the existing commits following the one we've given".

So, the push would supply the corrected database (without confidential info), + the commit id (c8d6f30f) of commit 2 above, along with the "force" boolean field set to "true". The result would be:

Sound sensible?

MKleusberg commented 7 years ago

Oh, totally forgot about that :wink: So yes, this sounds very sensible 😄

justinclift commented 7 years ago

Oh, as a curiosity thought, how hard/easy/etc do you reckon it would be to implement on-the-fly compression of the data stream from DB4S to the server?

At some point we'd probably want to investigate only sending deltas - eg when uploading changes to an existing database. But as @chrisjlocke mentioned the other day (#92), even simple compression seems to work well with many sqlite files.

justinclift commented 7 years ago

As a data point, LZ4 seems to be a super light weight compression approach: https://en.wikipedia.org/wiki/LZ4_(compression_algorithm)

Faster than gzip for both compression/decompression. Haven't tried on a database yet though. If it sucks, then it's not a good choice. :wink:

justinclift commented 7 years ago

Found where the problem is with the DB4S end point not liking new branch names. The end point code isn't (yet) realising a non-existent branch name could be passed, so is just passing the given value through to the commit creation code which is barfing.

Shouldn't be hard to fix.

justinclift commented 7 years ago

... and it turns out to be fairly complex. My brain's not really up for it atm, so punting this to tomorrow. 😖

MKleusberg commented 7 years ago

No problem :smile: The DB4S code for this is already in place and committed. So it will "just work" whenever the server side is ready.

As for the compression: this shouldn't be too hard if we compress first and then send the file or receive the file first and then decompress it. It might be better though to try it on the HTTP level first as HTTP already supports compressing contents on the fly. I think DB4S already sends requests including the Accept-Encoding: gzip header (haven't checked though), so you should be able to send compressed databases using some sort of nginx configuration - I guess.

justinclift commented 7 years ago

Hmmm, we'll figure it out. It won't be an Nginx thing, as that's only used in one spot (listening on port 80 to bounce things to https). Nginx isn't in the connection path at all for either the webUI for DB4S end point.

With Go, there's all kinds of (de)compression stuff (of many different types) with it's built in libraries. Shouldn't be an issue to get something reasonable in place.

With the DB4S end point currently, it saves the whole uploaded file first before doing anything with it (eg checking if it's a SQLite database). So I'm thinking to add a decompression routine on the saved file before doing the database check. Just like adding one more stage in a pipeline. Should (!) be easy. :wink:

MKleusberg commented 7 years ago

Ah I see. It needs to be done in Go then. But when gzipping responses to DB4S requests on the HTTP layer it should still just magically work with current DB4S builds. This looks like an explanation on how to do that in Go but it obviously might work different for different libraries.

justinclift commented 7 years ago

Thanks @MKleusberg, will check it out. :smile:

With the push dialog, I'm thinking it'd be useful for DB4S to also send the "last modified" datestamp on the database file to the server. The on-disk format the server uses has room for it (currently just being set to now()). When pulling back info from the server, or downloading the file, the server can then include the info.

It's probably useful to set the timestamp of the downloaded databases to match that, as another way people can see what databases have been changed. eg when looking outside of DB4S.

At least, it was when I was doing stuff with the initial dio command line util. Planning on adding that "feature" to the new dio cli (when it gets done).

MKleusberg commented 7 years ago

Sure, that should be possible 😄 Just tell me whenever it's implemented on the server-side and I'll try to add it soon-ish.

justinclift commented 7 years ago

Cool. Might be a few days away. :smile:

justinclift commented 7 years ago

Back into this stuff now, after getting a few other annoying things taken care of.

Found something interesting. After getting the end point to accept branch names from DB4S for new uploads (eg branch="somebranch58"), DB4S's hard coded "master" entry now seems out of place:

For example, lets say a user uploads to a new database "foo.sqlite". Instead of going with the default branch name of "master", they go with a branch name of "nifty".

The DB4S end point receives the upload, creating the new database with the new "nifty" branch.

If the user goes to push again, the dialog still presents "master" to them as the default branch choice (with "nifty" in the drop down), even though "master" doesn't exist. Doesn't feel right. :wink:


On a related note, DB4S will probably want to know the default branch name, as set by the user in the webUI Settings page. Two potential ways to communicate that spring to mind:

Kind of thinking the 2nd one probably makes more sense. It'd be pretty easy to implement both though, if useful. :smile:

justinclift commented 7 years ago

Went the second option: 7d899fa09cd615bcd1f6c08679b2bec39043a81c

It does change the returned structure of the branch list data though. Hopefully not too hard to change the DB4S side.

$ curl -kE ~/my.cert.pem -D headers.out -G -d "username=test1" \
      -d "folder=/" -d "dbname=a.db" https://db4s-beta.dbhub.io:5550/branch/list
  {
    "default_branch": "master",
    "branches": {
      "branch1": {
        "commit": "c22d60fe2390a74006d3478c999855797f33c846ded801d8232c16ff9e224551",
        "commit_count": 2,
        "description": ""
      },
      "master": {
        "commit": "f76eb48a738c62420c34df8696978814d960ac08591e340b6e8f13dbf721deb0",
        "commit_count": 34,
        "description": "### Production branch\n\nOur latest running server data"
      },
      "somebranch": {
        "commit": "779498afa093d7d101e95b3bcb24dbc6d1966074056c27b45602ce65029e9fc6",
        "commit_count": 8,
        "description": "### Development branch\n\nTest stuff, pls ignore"
      }
    }
  }

Note, it's not running on the db4s-beta server yet. I'll push it there shortly though. :smile:

justinclift commented 7 years ago

It's now live on the db4s-beta server. So, the existing DB4S JSON branch list handling code is kind of broken with the new response format (just checked). 👼

MKleusberg commented 7 years ago

Should be fixed in the DB4S code now :smile: I'm also preselecting the default branch now. And it shouldn't show the "master" branch name anymore if there is no master branch.

justinclift commented 7 years ago

Yep, that's working now. Branch names are showing up properly, and master is no longer auto-added. Cool. :smile:

justinclift commented 7 years ago

While working through the various code paths (and potential errors encountered) server side for fork handling, collision handling (etc), I'm getting the strong feeling we'll need to figure out a good strategy for returning error condition info to DB4S.

My first thought is having http status codes indicate "there was an error", but returning specific info (useful to DB4S) in some kind of JSON structure. Maybe something like this (just a first guess though):

{
  "error_code": "COMMIT_ID_NOT_FOUND",
  "message": "The commit ID passed from DB4S to the server wasn't found in the history of the given branch.",
  "resolution": "Wrong branch name?"
}

With those fields, message is the kind of thing which could be shown to an end user. The above one probably wouldn't make sense though, as the commit id passed would likely be figured out by DB4S itself. Might be better to have DB4S use it's own messages for each error condition anyway, as that way they'd be translated. DBHub translation is probably looong way off (but would be good to have :wink:).

No idea if the resolution field would even be useful. Was more thinking it'd only be shown to DB4S developers, who'd probably receive it and have something to possibly check. :smile:

If some kind of JSON structure thing is the right approach, we'll need to figure out useful error codes. My preference there is something text based which us developers can look at and have a clue about just from the name itself, unlike with numeric codes (which need looking up).

Thoughts? :smile:

justinclift commented 7 years ago

@MKleusberg Just implemented the "last modified" date thing, so the DB4S end point accepts it for DB4S uploads, and returns it in the header fields for downloads. 7e5e7dc6993f69ca6ba61edb6ca43245a4fea42d

It's live on the db4s-beta server, so you have something to test against when you have time. :smile:

An example curl line showing the new "lastmodified" date field (date is ISO 8601 format):

$ curl -kE ~/my.cert.pem -D headers.out -F file=@someupload.sqlite \
    -F "public=true" -F "commitmsg=stuff" \
    -F "sourceurl=https://example.org" -F "licence=CC0" \
    -F "branch=master" \
    -F "lastmodified=2017-01-02T03:04:05Z" \
    https://db4s-beta.dbhub.io:5550/someuser

When downloading, the date is now included in the Content-Disposition header:

Content-Disposition: attachment; filename="someupload.sqlite"; modification-date="2017-01-02T03:04:05Z";
MKleusberg commented 7 years ago

Regarding the error checks: returning a JSON object with details should be fine. I'd probably just skip the resolution bit though because it's going to be hard to guess a resolution without context. If the error message is clear enough on the other hand, the user should be able to figure out the resolution.

justinclift commented 7 years ago

Good point. :smile:

MKleusberg commented 7 years ago

@justinclift Can you check what's wrong with the push request I just made (user mtest3, file test.sqlite)? I'm now including the lastmodified data but get a Bad request error.

MKleusberg commented 7 years ago

Ok nevermind, got it 😄

justinclift commented 7 years ago

Oops sorry. Was on another screen recording DB4S import bits. :smile:

MKleusberg commented 7 years ago

No worries, I figured it out anyway :smile:

MKleusberg commented 7 years ago

I'm sending the last modified date now to the server. Unfortunately Qt doesn't provide a way to set the last modified date of a file though, so I can't really use the date returned by the server. But I don't think it's super-important anyway. Are you able to double check from the server-side if the date is stored correctly?

If it works we can finally close this issue, I think :smile: The error message ideas could be moved to a new issue.

justinclift commented 7 years ago

Qt doesn't provide a way to set the last modified date of a file though

Wow. That's unexpected. For a "cross platform" toolkit to entirely not support basic fundamental cross-platform functionality is... bizarre... to say the least. Sounds like some kind of misguided design choice. 😦

And yeah, we can skip it for now. It'll just be implemented with our CLI only then.

MKleusberg commented 5 years ago

Almost forgot about this, but Qt seems to have added support for setting the last modified date in Qt 5.10. So for all builds using Qt >= 5.10 we now set the date properly when checking out a file :smile: The last modified date is also sent back to the server, so I guess this should be fixed now.

justinclift commented 5 years ago

Excellent! :man_dancing:

MKleusberg commented 4 years ago

Can we close this then? :smile:

justinclift commented 4 years ago

Yep. :smile: