mountetna / magma

Data server with friendly data loaders
GNU General Public License v2.0
5 stars 2 forks source link

Issue51 files direct to metis #137

Closed coleshaw closed 4 years ago

coleshaw commented 4 years ago

@graft , here is what I have so far -- do you think I'm going in the right direction? From a high-level perspective, in addition to updating your 40-use-loader-for-update branch against master, I've added a method to MetisStorage class to generate an HMAC-signed "copy" URL and also inserted a new method in the workflow for update#action to do a POST against the copy URL on Metis.

I also have a couple of additional, specific questions:

Also random question: When testing via Timur, where would I see the file links in the Metis database? In files table?

Thanks!

graft commented 4 years ago

Looks along the way, I tried making some comments. I'm having a bit of trouble following because of all the rebasing going on. It seems kind of problematic that all of this stuff is happening on top of the update/loader changes, I'd really prefer to merge those - but those changes are pretty much impossible to merge without the subsequent Metis changes, so I guess rebasing is where we are at until this is ready to go. (on that note - can we work in a branch on this repo rather than in a fork?)

One general thing I am noticing: You are inserting a lot of case statements about the provider being Metis, because I wrote the storage class to appear as if we can switch seamlessly between AWS and Metis as our file storage. However, this is not really true - in the current storage we are sending files straight to magma, using Metis storage we are sending files straight to Metis. This is fairly different in concept and can't really be shifted as a config flag. We should probably just acknowledge in the code that Metis is Magma's storage provider and get rid of all of this case logic. In the future if we want to enable a direct AWS backend for Magma we can build that back in, but really we are breaking the current storage system with this update/loader stuff.

coleshaw commented 4 years ago

Thanks for the feedback, I'll give it another pass.

I agree it's confusing looking at this on top of the loader work ... it would definitely be cleaner if we could see just these changes. What are your thoughts about re-doing the PR but just against the loader branch, and we rebase against master as the final step? I can then also move the work to the mountetna org instead of a fork.

Regarding 100% commitment to Metis instead of AWS, that brought up a couple of questions:

1) Is there any way to tie that configuration change into the build and deploy process? It seems like an easy thing to forget, but it's also a breaking change in how Magma works, so a way to tie in the change to config.yml seems like it would be good.

2) Are there any migrations required for existing data, to get them into the new "Metis" format vs. old AWS format?

graft commented 4 years ago

I think you are right about not rebasing, I was trying to think about keeping up-to-date with that and it seems nightmarish. We can merge to 40-use-loader-for-update, I think that works, but we should probably do some review on those changes first, because they haven't really had any.

Regarding deploying this stuff, that is a good question. It involves a major movement of data between S3 and Metis, so there's little chance we can do it in one seamless deploy with no downtime. What I was imagining: 1) Copy the data from S3 into Metis. 2) Copy the values of file attributes from Magma - use the names here and the ones from S3 in (1) to determine which file on Metis should be linked to which record/file attribute. 3) Introduce the new changes. Zero out the existing file_attribute column on Magma. 4) Use the map in (2) to update magma with the file locations on Metis.

I think you're also right about the brittleness of using a fixed path for the copy route also, as that might change on metis. I realized however that Etna::Client has a private #route_path method that lets you construct the path using up-to-date routes from Metis. Still a little brittle in that the param names it uses might change, but better than just a fixed string template.

Regarding the bucket name 'magma', this is an interesting question we should guarantee somehow, I wrote it up here.

As for how to see stuff on Metis when making updates via Timur, yes, you should see a file appear in the appropriate Magma bucket - you can look in the Metis::File model for it in the console.

coleshaw commented 4 years ago

So I'm going to close this PR and move to using the 40-use-loader-for-update branch as the basis for this work. Will also file a separate PR on that branch to review those changes (though I think that will look messy due to the conflicts with the current work on migration API).