gvlproject / genomespace

Test suite & Issues for GenomeSpace Australia

Dropbox uploads broken - CORS issue. #45

Closed madisonkeene closed 7 years ago

madisonkeene commented 8 years ago

XMLHttpRequest cannot load https://genomespace-test-dev.s3.amazonaws.com/tmp/2016-09-09%2003%3A41%3A01%3A31_bitOps.java?uploads. Response to preflight request doesn't pass access control check: No 'Access-Control-Allow-Origin' header is present on the requested resource. Origin 'https://genomespace-dev.genome.edu.au' is therefore not allowed access. The response had HTTP status code 403.
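
For reference, the kind of CORS rule the bucket would need for that preflight to pass looks roughly like the sketch below. This is a hedged example only: the bucket name and origin are taken from the error message above, applied here with boto3, and the exact methods/headers GenomeSpace actually requires may differ.

```python
# Hedged sketch: allow the GenomeSpace UI origin to make cross-origin
# requests against the staging bucket named in the failing URL above.
import boto3

s3 = boto3.client("s3")
s3.put_bucket_cors(
    Bucket="genomespace-test-dev",  # bucket taken from the error message
    CORSConfiguration={
        "CORSRules": [
            {
                "AllowedOrigins": ["https://genomespace-dev.genome.edu.au"],
                "AllowedMethods": ["GET", "PUT", "POST", "HEAD"],
                "AllowedHeaders": ["*"],
                "ExposeHeaders": ["ETag"],
                "MaxAgeSeconds": 3000,
            }
        ]
    },
)
```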

madisonkeene commented 8 years ago

Same thing is happening for Google Drive.

madisonkeene commented 8 years ago

@ykowsar I think this is an AWS config issue (but I have no credentials to get in and check).

nuwang commented 8 years ago

@madiflannery @ykowsar Which account are you using for the new GenomeSpace? Andrew really wanted to consolidate our two accounts into one, and the plan was to migrate GenomeSpace to the 326**9 account. (We have another account that starts with 8 or 9 as I recall, which was used by GenomeSpace - and needs to be closed down). Madi, I've just sent you credentials for accessing the dashboard.

madisonkeene commented 8 years ago

OK no worries. Not sure which acct we're using but I'll make sure it all gets migrated over to the new one :)

ykowsar commented 8 years ago

The whole thing is broken. When you upload something to Dropbox, GenomeSpace assumes it is an S3 storage type, and since there is no S3 bucket associated with that name it returns the address of the home directory, which I have made read-only.

I can't find why it assumes it is an S3 account, but this is what we should fix. Here is a sample response for uploading to Dropbox: "uploadType":"S3","path":"\/storage\/System\/s3\/genomespace-dev\/tmp\/2016-09-13 08:52:33:413_BucketExplorer.dmg","s3BucketName":"genomespace-dev" ....

It is clear the uploadType is wrong.

madisonkeene commented 8 years ago

I have a funny feeling this is a merge-gone-wrong thing - will investigate! Thanks @ykowsar :)

nuwang commented 8 years ago

@ykowsar @madiflannery I ran into this issue on the U.S. GenomeSpace server - I think it's by design. That is, although the uploadType is S3, if you upload to the given URL, it will go into the Dropbox folder - so I'm guessing that GenomeSpace is now using S3 as a temporary staging area before uploading to Dropbox.
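
If that is what is happening, the flow would be roughly the following. This is a sketch only, with hypothetical names; the server-side copy step is internal to GenomeSpace and not shown.

```python
# Rough sketch of the staging flow described above; not GenomeSpace code.
import requests

def upload_via_staging(local_path, presigned_s3_url):
    """Client side: PUT the file to the temporary S3 location GS hands back."""
    with open(local_path, "rb") as f:
        resp = requests.put(presigned_s3_url, data=f)
    resp.raise_for_status()
    # From here on, GenomeSpace (server side) would be responsible for moving
    # the staged object out of the tmp/ prefix and into the user's Dropbox.
```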

madisonkeene commented 8 years ago

Hmmmm very interesting - we should investigate how much $$$ this will cost and what we need to do @ykowsar

madisonkeene commented 8 years ago

Yep, you're completely correct @nuwang

An upload to dropbox on gs.org: Screen Shot 2016-09-14 at 3.47.00 PM.png

madisonkeene commented 8 years ago

OK, so here's the deal:

I'm voting the way forward is to change Dropbox back to not go via S3, get S3 uploads working, and disable Google Drive for now. That way we can give the CDK to neil/wilson, they can get a sepsis demo working using Swift/Dropbox/S3, then worry about pushing out a Google Drive upload that doesn't use S3, and re-enabling Google Drive after that. Thoughts @ykowsar @nuwang @AndrewIsaac ?

madisonkeene commented 8 years ago

p.s- this is the worst thing EVER

ykowsar commented 8 years ago

That is a ridiculous way to manage an upload, I am very surprised 🤐

nuwang commented 8 years ago

But won't that mean diverging from the U.S. codebase? I think it makes sense to have the code be consistent with whatever the U.S. GenomeSpace server is doing. There must have been a reason for using a temp staging area - maybe Marco or Ted will be able to provide specifics. From an SDK point of view, it does make the implementation simpler because you don't have to have specialised handling for each type of storage (e.g. dropbox and google drive) - everything is an s3 upload.

ykowsar commented 8 years ago

No, the way they have handled this is too simplistic, which is unwanted here. Using S3 means a huge cost, and you still have to handle every upload on the server side, which is the same as having a client-side upload. And from what I have seen so far, every object store has a POST-mechanism upload.
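
For S3 at least, that direct-upload mechanism looks something like the sketch below (bucket and key names are made up; Swift and Google Drive have their own equivalents):

```python
# Illustrative only: issue short-lived credentials so the browser can POST the
# file straight to the object store, without staging it on the GS server.
import boto3

s3 = boto3.client("s3")
post = s3.generate_presigned_post(
    Bucket="genomespace-dev",        # assumed bucket name
    Key="tmp/example-upload.bin",    # hypothetical object key
    ExpiresIn=3600,
)
# post["url"] and post["fields"] are returned to the client, which then POSTs
# the file directly to S3.
```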

madisonkeene commented 8 years ago

It will mean diverging, yes, but the alternative is that it costs us money every time a user uploads to any storage type... This is probably a decision that needs to be made by the higher-ups imo, because it's not 100% clear what is happening to GS Australia going forward - if it's not going to exist in a few months' time then it doesn't really matter what we do to the code base.

ykowsar commented 8 years ago

There are more issues than just the cost. They are basically breaching the SLA.

ykowsar commented 8 years ago

Why do we need to provide Dropbox or Google Drive on the local server in the first place? They seem so out of scope for our project to me.

On Thu, Sep 15, 2016 at 11:17 AM, Madison Flannery wrote:

Yeah that too haha. Really need to get this out though...

madisonkeene commented 8 years ago

Because that's what was promised for sepsis - not all users will have Swift access.

nuwang commented 8 years ago

If we end up using the U.S. GenomeSpace (which we should imo), cost is a non-issue because it'll be handled by Broad. I believe the reason it has been done this way is so that you can add new types of storage without breaking existing clients. For example, the python-genomespaceclient didn't have support for Dropbox, but it works with Dropbox anyway.

If, on the other hand, they shifted the burden of handling storage types to the client, that would be a bad design decision because existing clients would keep breaking every time a new storage type is added - not exactly a good client-server design.

So I think what they've done is logical, and as long as the cost is a non-issue for them, it should eventually be a non-issue for us post-merge.

madisonkeene commented 8 years ago

This looks like it's only for small uploads though - if you have a client you'd still have to manage the large uploads for each storage type afaik.

I agree we should move to the US GS, but the problem is that neil/wilson need a new and updated GS Australia (with swift) to point the new CDK at - and this really needs to happen in the next day or two. All good though, I'll get clarification on what we should do from @AndrewIsaac and then go from there :)

nuwang commented 8 years ago

The most elegant thing would have been to have a storage-independent upload endpoint. Internally, GenomeSpace could potentially have done an async upload to the actual storage. Doing something like that can get pretty complicated, though. Using S3 is the simpler solution because all existing clients support S3.
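
Something along these lines, purely as a sketch (none of these routes or helpers exist in GenomeSpace):

```python
# Hypothetical storage-independent upload endpoint with an async backend push.
from threading import Thread
from flask import Flask, request

app = Flask(__name__)

@app.route("/upload/<path:target_path>", methods=["PUT"])
def upload(target_path):
    data = request.get_data()  # one endpoint regardless of backing storage
    Thread(target=push_to_backend, args=(target_path, data)).start()
    return "", 202             # accepted; the real upload completes asynchronously

def push_to_backend(target_path, data):
    """Hypothetical dispatcher: forward the bytes to Dropbox, Google Drive,
    Swift or S3 depending on where target_path is mounted."""
    raise NotImplementedError
```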

madisonkeene commented 8 years ago

Yeah that'd definitely be the most elegant solution! All good though it is what it is

nuwang commented 8 years ago

@madiflannery only small uploads? Then none of what I thought makes sense if you end up having to handle different storage types on the client anyway :-(

madisonkeene commented 8 years ago

I'm not 100% sure, but I am still seeing multipart upload code randomly throughout the code base; I haven't really paid attention to what data type it's for... will have to investigate!

ykowsar commented 8 years ago

That is exactly where you are breaking the SLA. Imagine I have patient data and I am only able to upload it to a specific place the patient agreed to, and I am trying to use GenomeSpace. Now GenomeSpace is uploading my data to a storage location without letting me know, breaching my agreement with the patient or government or many other organizations, which is a crappy mechanism :) I am not even sure whether my data remains in Australia or goes overseas now.

madisonkeene commented 8 years ago

OK, this whole thing is so weird. When I upload to Dropbox, it basically puts the file in a tmp folder in the default S3 bucket, and then it's not actually in my Dropbox, though I can see it as though it is in my GS.

ykowsar commented 8 years ago

I hope not. Maybe we should have a chat with Marco on this. This can be a big issue.

madisonkeene commented 8 years ago

Emailing Ted and Marco as we speak - the file I uploaded from their GS yesterday is in my dropbox so I feel like we're missing something. Will cc you.

ykowsar commented 8 years ago

I mean, I hope everything is handled on the client side. Yes, the file would eventually get to your Dropbox, but it is important how it gets there. Are they uploading it first to a temp folder on Amazon or not?

madisonkeene commented 8 years ago

Yeah, but it's not going to my Dropbox at the moment at all - it's staying inside the tmp folder. I can tell because all of my CDK test uploads from the last week are still in there....

madisonkeene commented 8 years ago

But yes, I can confirm it is most definitely going via an S3 temp folder - @AndrewIsaac has OK'd this for gs-dev so we can use it for sepsis demos.

ykowsar commented 8 years ago

If that's the case we should have a clear chat about this in a meeting, especially with Andrew L. Many people didn't even try GenomeSpace for these possible scenarios a couple of years ago (back then it was not even like this at all). Privacy is a real concern here. And we should make sure we are consistent with the deliverables of sepsis.

madisonkeene commented 8 years ago

Agreed - I've emailed broad now, cc'd you, feel free to elaborate on the privacy concerns!

madisonkeene commented 8 years ago

The CORS issue is now fixed.

nuwang commented 8 years ago

@ykowsar interesting point about the data going overseas. Can GenomeSpace provide that kind of SLA? Even if the actual data doesn't go overseas, the keys for accessing the data are stored in GS. One obvious difference is that keys can be revoked.

If storing keys remotely is OK, then the solution to that upload problem could be to proxy the network connection through GenomeSpace without hitting disk / going through tmp storage. This would still allow for a storage-independent mechanism without breaking clients and without data being stored overseas.
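
As a sketch of that proxying idea (endpoint and helper names are invented), the server could stream the request body straight through to the destination store without ever writing it to S3 or local disk:

```python
# Hedged sketch: pass-through proxy upload with no temporary storage.
import requests
from flask import Flask, request

app = Flask(__name__)

@app.route("/proxy-upload", methods=["PUT"])
def proxy_upload():
    dest_url = resolve_destination(request.args["path"])  # hypothetical lookup
    # request.stream yields the incoming body incrementally, and requests
    # forwards it chunk by chunk, so the file never touches disk or S3.
    resp = requests.put(dest_url, data=request.stream)
    return "", resp.status_code

def resolve_destination(path):
    """Hypothetical: map a GenomeSpace path to a signed URL on the real backend."""
    raise NotImplementedError
```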

ykowsar commented 8 years ago

Unfortunately, it has not been decided yet whether to store keys, so GenomeSpace is not storing the keys yet. For Dropbox it is using OAuth2, so it should not even try to touch the files. I am not sure even regular users who are aware of that would be OK with it.

madisonkeene commented 7 years ago

Going to disable gdrive & dropbox in the UI so we can get a new GS out the door, then decide on a fix later.

Disabled by commenting out the openMountGoogleDriveDialog(); and openMountDropboxDialog(); calls, replacing with a 'coming soon' alert(). Closing this issue.

madisonkeene commented 7 years ago

(Madi will also document this issue in the GS documentation)