smallstep / nosql

NoSQL is an abstraction layer for data persistency
Apache License 2.0
20 stars 23 forks source link

Upgrade badger dependency #12

Closed francislavoie closed 3 years ago

francislavoie commented 3 years ago

On https://github.com/caddyserver/caddy, we've had trouble in the past with having to explicitly set CGO_ENABLED=0 to build Caddy, since we've had DataDog/zstd as an indirect dependency through smallstep/nosql -> dgraph-io/badger/v2 -> DataDog/zstd, because that zstd implementation is not pure-Go.

Just today, badger has merged a PR to replace DataDog/zstd with a pure-Go zstd in klauspost/compress, so this should no longer be an issue. See the discussion on https://github.com/dgraph-io/badger/pull/1383, followed by the full replacement (merged today) in https://github.com/dgraph-io/badger/pull/1709.

So it would be great if smallstep/nosql could update its dependency on badger to the latest, so that projects like Caddy that depend on it can get rid of this last non-pure-Go dependency.

dopey commented 3 years ago

Hey @francislavoie in order to support this I think we'd need badgerv3 support in nosql. I just tried doing a simple upgrade and it seems that the API has changed in some breaking ways (which is to be expected). If someone who understands the new Badger options wanted to take a look to porting our codebase to work with the new API we'd be very grateful. Otherwise it will be some time until we have the cycles to explore ourselves.

Specific questions I have:

francislavoie commented 3 years ago

I reached out in https://discuss.dgraph.io/t/use-pure-go-zstd-implementation/8670/19 in case they're able to help.

dopey commented 3 years ago

Awesome! That would be great.

maraino commented 3 years ago

Implementing this would help on 3rd party projects https://github.com/smallstep/certificates/issues/358. They can select which nosql packages want to compile into the binary, or even implement their own, although this is already supported.

NamanJain8 commented 3 years ago

Hey @dopey, badger v3 is not compatible with v2 as mentioned in the release notes . There are changes in the on-disk format. Answering your queries:

dopey commented 3 years ago

Hey @NamanJain8 thanks for the quick reply and apologies for my radio silence.

  • ValueLogLoadingMode is no longer available. From v3 badger opens value log in memory-mapped mode.

I'm not sure if this will be an issue for us. We introduced a loadingmode attribute in ca.json to allow users to select FileIO (https://github.com/smallstep/nosql/blob/master/badger/v2/badger.go#L33-L42) because we found that RAM constrained environments were unable to run in memory-mapped mode. Is there a way to select a FileIO mode in badger v3?

That's the only attribute we explicitly map from our step-ca into the badger config, so once that's sorted we should be able to proceed.

With regards to a migration script - it's a nice to have, but not a blocker.

NamanJain8 commented 3 years ago

Hey @dopey.

Is there a way to select a FileIO mode in badger v3?

No, we can't use FileIO mode in badger v3. The value log loading mode was removed from badger in https://github.com/dgraph-io/badger/pull/1555. @jarifibrahim do you know why it was removed? I think it was because it was affecting the changed flow of how badger handles writes.

maraino commented 3 years ago

@dopey @NamanJain8 instead of supporting multiple versions of badger at the same time with all the problems that it has, we might want to only support one and create a program that migrates v1, v2 to the supported version, v3 for example.

For example starting with 0.16 we can say that we only support v3, and perhaps in 0.19 we can use a v4.

dopey commented 3 years ago

No, we can't use FileIO mode in badger v3. The value log loading mode was removed from badger in dgraph-io/badger#1555. @jarifibrahim do you know why it was removed? I think it was because it was affecting the changed flow of how badger handles writes.

@NamanJain8 Do you know what this would mean for users who currently require this mode due to memory constrained environments? If badgerv3 won't support those environments, then @maraino we will need to keep badgerv2 around to support those users.

manishrjain commented 3 years ago

I think we can just support an option in Badger to disable value log completely. Dgraph largely doesn't use value log anymore anyway, because the disk requirements with value log are unpredictable. That'd then allow people with resource constrained environments to run Badger.

Feel free to open an issue in Badger for this.

francislavoie commented 3 years ago

Not to be a bother, but would it be possible to move this along?

Some users of Caddy are seeing issues build Caddy with specific plugins because of strange issues like this:

../../../../../../../pkg/mod/github.com/dgraph-io/badger/v2@v2.0.1-rc1.0.20201003150343-5d1bab4fc658/table/table.go:860:17: not enough arguments in call to z.Calloc
        have (int)
        want (int, string)

The PR https://github.com/smallstep/nosql/pull/13 to upgrade to a more recent v2 would at minimum help move this in the right direction, fixing some of the issues we've seen.

Upgrading to badger v3 would be the ultimate fix, in that it would finally remove the CGO dependency.

dopey commented 3 years ago

I'm testing the V2 PR as we speak.

With regards to upgrading to V3, without the valuelogloading mode change in Badger this would be a backwards incompatible change for step-ca users. We'd prefer not to make such changes unless absolutely necessary.

francislavoie commented 3 years ago

Fair enough.

In that case, @manishrjain is there any chance that the zstd changes could be backported to v2 of badger? I don't envision that would be too difficult?

dopey commented 3 years ago

Making a new release of smallstep/certificates now.

edit: Unless you'd prefer I wait on the zstd changes you mentioned. lmk

francislavoie commented 3 years ago

Considering there's no PR for backporting zstd yet, I don't think we need to wait, should be okay to bump again if it does happen quickly, but if not this should at least move things along a bit.

dopey commented 3 years ago

v0.16.1 smallstep/certificates is building. As a dependency, you should already be able to reference it. Most of the release packages and binaries won't be available for another 15 minutes.

francislavoie commented 3 years ago

Thanks @dopey! We'll update Caddy shortly.

I wrote a quick backport PR https://github.com/dgraph-io/badger/pull/1736 since it looked pretty easy (combining the changes from two other PRs, the code hadn't diverged significantly between v2 and v3 when it came to zstd)

francislavoie commented 3 years ago

@dopey https://github.com/dgraph-io/badger/releases/tag/v2.2007.4 has been tagged, which backports the zstd change and a fix for building on Plan9. Could you bump the dependency in nosql/certificates again? Thank you!

dopey commented 3 years ago

@francislavoie Just tagged v0.16.3.

Edit 1: Actually, hold off. Forgot to bump the version of nosql when I did that :<. Gimme another minute.

Edit 2: Ok, v0.16.4 pushed.

francislavoie commented 3 years ago

Awesome!!

I'll update Caddy now!

francislavoie commented 3 years ago

For the purpose of Caddy, this is satisfied at this point! I'll close this I suppose.

Thanks for the help everyone!