johannesboyne / gofakes3

A simple fake AWS S3 object storage (used for local test-runs against AWS S3 APIs)
MIT License
361 stars 84 forks source link

afero.Fs-based backends #21

Closed shabbyrobe closed 5 years ago

shabbyrobe commented 5 years ago

My colleague asked me if I could find a way to support a workflow where she points gofakes3 at an existing directory and makes it available over an S3-compatible interface. We're dealing with content repositories that are in the hundreds of gigabytes, which makes certain kinds of tests infeasible without using the real data, so the existing backends are impractical for her use cases.

This PR adds two backends, both filesystem-based: s3afero.MultiBucketBackend and s3afero.SingleBucketBackend. MultiBucketBackend allows you to point to an existing path and create multiple buckets under the /buckets subdirectory. Metadata is stored in the /metadata subdirectory by default, but any afero.Fs can be used. SingleBucketBackend is for the use case where you have a single existing filesystem that you just want to treat as a bucket. Metadata is stored in a separate path if passed, or in memory if not.

I opted for the afero.Fs filesystem abstraction layer instead of direct os calls because it broadens the support considerably for the same amount of effort, and is also useful for narrowing a filesystem's scope to a subdirectory (afero.NewBasePathFs(...)). Here's a brain melter for you: it is possible to implement an afero.Fs that abstracts S3 itself, which would mean that gofakes3 could use S3 as a backend... 🧠 :boom: !

I ended up removing the default value for -bucket as it does not apply to SingleBucketBackend; I couldn't find a way to retain the default value that wasn't a bit icky (which is not to say there isn't a way, just that I couldn't find it 😆 ). I've also replaced the name with -initialbucket as I think it better reflects the semantics, though it is true that it is a bit long winded. I'm not married to the name change, I'll change it straight back if you reckon it's too long. The existing option is retained and marked as deprecated.

It's not ideal, but I've punted on writing tests here with the intention of putting some work in to the gofakes3 test suite such that it can run all of the tests with all of the backends. I'll try and whip up a few basic tests now just to make sure it's not a complete tyre fire, but I think it would be a lot less work for a lot more benefit to concentrate effort on a tester for everything. Would you be comfortable with that or would you prefer I went for broader coverage of these backends up-front?

codecov-io commented 5 years ago

Codecov Report

Merging #21 into master will decrease coverage by 0.04%. The diff coverage is 62.06%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #21      +/-   ##
==========================================
- Coverage    76.9%   76.86%   -0.05%     
==========================================
  Files          14       14              
  Lines         996     1007      +11     
==========================================
+ Hits          766      774       +8     
- Misses        155      161       +6     
+ Partials       75       72       -3
Impacted Files Coverage Δ
messages.go 66.66% <0%> (-16.67%) :arrow_down:
option.go 71.42% <0%> (+4.76%) :arrow_up:
routing.go 83.75% <100%> (+1.08%) :arrow_up:
prefix.go 89.65% <100%> (+1.65%) :arrow_up:
gofakes3.go 69.96% <100%> (+0.02%) :arrow_up:
range.go 83.87% <0%> (+3.22%) :arrow_up:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 192b3b7...5bfb27c. Read the comment docs.

shabbyrobe commented 5 years ago

Marked as WIP until I can get enough tests in to keep codecov happy

johannesboyne commented 5 years ago

Here's a brain melter for you: it is possible to implement an afero.Fs that abstracts S3 itself, which would mean that gofakes3 could use S3 as a backend... 🧠 💥 !

:D like it!

Give me some hours as there are some parts I would like to understand better and where I just need to digest the code a bit more thorough.

shabbyrobe commented 5 years ago

Give me some hours as there are some parts I would like to understand better and where I just need to digest the code a bit more thorough.

Oh of course! No hurry at my end, I'm quite conscious of the fact I've been throwing 1,000+ line PRs at you 😆

If you come across anything that you think is poorly explained in the code, please let me know and I'll document it better.

shabbyrobe commented 5 years ago

I've brought this back up to date now, but the codecov tasks are failing. I was wondering if you'd mind if we just pretended that they were passing for this PR. I've added some tests for the core operations but I'd like to defer increasing the coverage until I get some time to rework the integration tester to run against all of the backends.

I figure it's a bit like the Language Server Protocol thing: I can write a ton of tests now for each of the backends at huge time and complexity cost, or I can get the structure right, write each test once, and have it work against every backend, reducing the number of required tests from (backends * tests) to (1 * tests).

shabbyrobe commented 5 years ago

Hi @johannesboyne, I was wondering if you might be able to spare a bit of time to take a look at this? No problem if not, just thought I'd give it a bump :)

johannesboyne commented 5 years ago

Hi @shabbyrobe sorry, hadn't had much time this week! I already looked at it from a style and complexity perspective and all of it looks fine ;) let's give it a GO I probably won't find time before mid next week to fully understand it! But all tests pass, all test cases look reasonable and the code looks pretty stable as well!