vmware-tanzu / velero

Backup and migrate Kubernetes applications and their persistent volumes
https://velero.io
Apache License 2.0

Support non-object store backends (e.g. NFS) #1229

Open ipochi opened 5 years ago

ipochi commented 5 years ago

Hi,

I would like to know if the requirement for having an S3 compatible storage bucket for backup metadata is an inflexible hard constraint.

If it is a hard constraint, I'd very much like to understand the reasons behind it, so I can see whether my use case can be adapted to fit.

If it's not a hard constraint, I'd like to know how to remove the dependency on object storage for backup metadata. I'd like to understand the code and possibly submit a PR for this.

In my ideal use case I'd like to be able to take a backup using Restic without having any dependency on the object storage or any cloud storage provider for the VolumeSnapshot and Backup metadata.

I understand that Ark + Restic currently supports only file-level backup to S3-compatible object storage. Issue #1178 talks about supporting the full range of Restic backends. Are there any plans to implement it?

skriss commented 5 years ago

@ipochi what kind of storage backend are you interested in using?

ipochi commented 5 years ago

@skriss

NFS for example for both VolumeSnapshot and Backup metadata.

Ideally, I don't want to set up cloud provider object storage, or MinIO on-premises, for anything backup related.

Is this possible in Velero?

nrb commented 5 years ago

Currently, Velero is designed to send the backed up Kubernetes objects to object storage, and it is fairly hardcoded right now. That said, I don't think we'd be opposed to expanding it, but we'd likely have to revisit our plugin interfaces to accommodate that.

Another solution I can think of is to implement an ObjectStore plugin that saves to NFS instead of an HTTP endpoint. We currently have an example filesystem plugin that does something similar, though it's mostly an example, and not production grade. It can be found at https://github.com/heptio/velero-plugin-example/blob/master/velero-examples/file.go

ipochi commented 5 years ago

@nrb Thanks for replying.

I didn't quite understand the second paragraph. Could you please elaborate?

Is my understanding correct?

  1. The VolumeSnapshot location can be somewhere other than object storage (for example, via other plugins such as openebs-ark-plugin),
  2. but the backup metadata needs object storage?

Does the possible solution in your second paragraph address point 1 or point 2?

ipochi commented 5 years ago

@nrb @skriss any info on this ?

skriss commented 5 years ago

Velero expects to store both metadata backups and restic backups in "object storage" (more on this below).

As you mentioned, there's an open issue to allow additional restic backends. We're not actively working on this.

Velero has a plugin architecture to allow additional "Object Storage" backends to be implemented. @nrb's second paragraph was referring to the fact that you could probably create an implementation of this interface that didn't actually use an object-storage system behind the scenes. The interface you'd need to implement is:

// ObjectStore exposes basic object-storage operations required
// by Velero.
type ObjectStore interface {
    // Init prepares the ObjectStore for usage using the provided map of
    // configuration key-value pairs. It returns an error if the ObjectStore
    // cannot be initialized from the provided config.
    Init(config map[string]string) error

    // PutObject creates a new object using the data in body within the specified
    // object storage bucket with the given key.
    PutObject(bucket, key string, body io.Reader) error

    // GetObject retrieves the object with the given key from the specified
    // bucket in object storage.
    GetObject(bucket, key string) (io.ReadCloser, error)

    // ListCommonPrefixes gets a list of all object key prefixes that start with
    // the specified prefix and stop at the next instance of the provided delimiter.
    //
    // For example, if the bucket contains the following keys:
    //      a-prefix/foo-1/bar
    //      a-prefix/foo-1/baz
    //      a-prefix/foo-2/baz
    //      some-other-prefix/foo-3/bar
    // and the provided prefix arg is "a-prefix/", and the delimiter is "/",
    // this will return the slice {"a-prefix/foo-1/", "a-prefix/foo-2/"}.
    ListCommonPrefixes(bucket, prefix, delimiter string) ([]string, error)

    // ListObjects gets a list of all keys in the specified bucket
    // that have the given prefix.
    ListObjects(bucket, prefix string) ([]string, error)

    // DeleteObject removes the object with the specified key from the given
    // bucket.
    DeleteObject(bucket, key string) error

    // CreateSignedURL creates a pre-signed URL for the given bucket and key that expires after ttl.
    CreateSignedURL(bucket, key string, ttl time.Duration) (string, error)
}

Most of these functions are relatively easy to implement on top of a file system; @nrb provided a link to an example plugin that does just that. The CreateSignedURL function would be the trickiest one, I think. This is used for viewing backup/restore logs from a client, and downloading backup tarballs to a client. You might be able to get by without implementing this, depending on your requirements, or you could stand up some kind of web server front-end.

JimBugwadia commented 5 years ago

@nrb @skriss @ipochi Thanks! Having NFS support is very interesting!

Can the CreateSignedURL function return a 'file://' URL that points to a file system path?

skriss commented 5 years ago

@JimBugwadia I don't think it would work out of the box, since I believe support for the file scheme is disabled by default in the Go http package. The path would also need to be reachable by any client that ran velero logs. Possibly an option to explore, though.

SDBrett commented 4 years ago

+1

The use of NFS storage is widespread and adding support would be very helpful to many companies. Companies who are just starting to adopt K8S on-prem may not have an object store available, or budget to implement one.

carlisia commented 4 years ago

Here's an example of backing up to NFS that might be helpful. Might not be production quality:

http://www.rafaelbrito.com/2019/11/project-velero-12-on-openshift-311.html

SDBrett commented 4 years ago

> Here's an example of backing up to NFS that might be helpful. Might not be production quality:
>
> http://www.rafaelbrito.com/2019/11/project-velero-12-on-openshift-311.html

Thanks for the link. The lack of production quality is one of the issues with a solution such as this.

The customers that I work with (mainly service providers) will also raise an objection to yet another item to manage.

nrb commented 4 years ago

@SDBrett I think that's a fair point as Kubernetes becomes more prevalent in on-prem environments. Velero was originally designed for cloud environments, so it went with a more "cloudy" setup with the object storage.

I think there's room for the introduction of a filePath storage solution, but we need time to think through a design. My initial thoughts are that it would be a new type of storage plugin that might require mounting a volume into the Velero container, and it uploads the K8s resources there. But it will require design and probably some prototyping.

skriss commented 4 years ago

One idea for providing backup/restore logs to the user: as needed, we could run a pod that fetches the log file from the NFS PV, streams it to stdout, and completes. The user would then effectively be running a kubectl logs to get the backup/restore log. This avoids the need for a separate API/server to serve the logs to the user.

SDBrett commented 4 years ago

Before starting, I understand that the suggestion below is a major change and probably feature breaking.

I've been thinking about this over the weekend and I keep coming back to how Terraform decoupled modules from the core project (I think in v.12). Perhaps a similar approach would enable new options for plugin development and improve flexibility.

The idea is to decouple core backup logic from storage plugins entirely.

Velero Core: core backup logic, interaction with the API server, etc.

Storage Plugin: a plugin for communication with a storage provider. At a high level this is like a storage driver.

Plugins would be provided by the 'vendor' as binaries, which are mounted into the Velero pod to be used by Velero as needed. Plugins would need to adhere to a set of standards for mandatory inputs, but additional inputs could be specified with backup objects.

skriss commented 4 years ago

@SDBrett Velero actually already has a plugin architecture for both storage and custom backup/restore logic - see https://velero.io/docs/v1.2.0/overview-plugins/ and https://velero.io/docs/v1.2.0/custom-plugins/ for some basic info :)

For storage, we currently have ObjectStore and VolumeSnapshotter plugin types. As you can tell from the name, though, the ObjectStore plugin is coupled to the idea of using object storage as a backend, so it's not easily extensible to a generic file system, database, or anything else.

To support something like NFS as a backend, we'd need to either redefine the ObjectStore plugin type to be more generic, e.g. BackupStore, and remove the object store-specific assumptions baked into the interface definition, or add an alternate plugin type, e.g. FileSystemStore or something along those lines, that could be used instead of an ObjectStore as backend storage for backups.

Definitely interested in continuing to think through what this could look like with your input!

nrb commented 4 years ago

> add an alternate plugin type, e.g. FileSystemStore or something along those lines, that could be used instead of an ObjectStore as backend storage for backups.

This is what I was thinking of with my filePath suggestion: a new plugin type that is distinct from our ObjectStore plugins, not necessarily sharing a hierarchy with them. But I think a discussion around the design is worth having, because I'm sure there are trade-offs to each approach that I've not thought through.

lainosantos commented 3 years ago

Hi,

any solution for persistence in filesystem?

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

xeor commented 3 years ago

bump!

This is still very much needed! Use cases include testing Velero functionality, companies on a budget, those just starting out, home labs, and big companies that already run NFS but not S3. It would also be much nicer not to have to set up MinIO when re-initializing the test lab.

HeroCC commented 2 years ago

Is there a way to make Velero use the HTTP server backend of restic as a first-class provider? It can be served by something like rclone, which will bridge between the S3 / Restic API and any of the numerous backends rclone supports (though admittedly, that seems to include everything except NFS).

vrabbi commented 2 years ago

Any chance of getting NFS support into Velero? I'm very interested in using Velero, but the object storage requirement is a blocker, so I can't use it and instead need to use other Kubernetes backup solutions that do support NFS as a backend. In on-prem environments, object storage is not yet common enough, and an NFS backend would allow Velero to be used in nearly every environment, which is not the case today. I like the approach other backup solutions have taken: the user creates an NFS-backed PV and the backups are streamed into that PV. This allows for a simple architecture and a great UX.

rajivml commented 2 years ago

We deal with a lot of on-prem customers at my current org, and many of them don't have an object store deployed on their premises. Because Velero doesn't support an NFS server as a backend, we are not able to use it.

reasonerjt commented 2 years ago

When we finish the Kopia integration, there will be a repository service layer that provides a consistent API for different backend storage.

But there is additional work after that; for example, the backup persistent storage code needs to be able to write content to the repository. I'll move this issue to the backlog.

abbbi commented 1 year ago

> When we finish the kopia integration there will be a repository service layer provide a consistent API for different backend storage.

A storage backend API would be very interesting for integrating Velero into existing third-party backup applications. On-prem customers often have an existing backup solution, which usually still streams data to real tape devices.

sseago commented 1 year ago

Kopia doesn't really affect this, since kopia just uses the same object storage interface as we're using for restic. The way to do this would be to write an object store plugin (consistent with the ObjectStore plugin API) that works on a filesystem back end.

abbbi commented 1 year ago

> Another solution I can think of is to implement an ObjectStore plugin that saves to NFS instead of an HTTP endpoint. We currently have and example filesystem plugin that does something similar, though it's mostly an example, and not production grade. It can be found at https://github.com/heptio/velero-plugin-example/blob/master/velero-examples/file.go

just for the record, the mentioned example plugin is now:

https://github.com/vmware-tanzu/velero-plugin-example/blob/main/internal/plugin/objectstoreplugin.go

reasonerjt commented 11 months ago

We need to find a way to make sure the downloadrequest is handled properly when there's no object store, which is tracked via #6167