scitran / core

RESTful API
https://scitran.github.io
MIT License
18 stars 18 forks source link

Add single file range read feature #1068

Closed davidfarkas closed 6 years ago

davidfarkas commented 6 years ago

Review Checklist

codecov-io commented 6 years ago

Codecov Report

Merging #1068 into master will increase coverage by 0.08%. The diff coverage is 98.61%.

@@            Coverage Diff             @@
##           master    #1068      +/-   ##
==========================================
+ Coverage    90.7%   90.78%   +0.08%     
==========================================
  Files          50       50              
  Lines        6925     6991      +66     
==========================================
+ Hits         6281     6347      +66     
  Misses        644      644
nagem commented 6 years ago

I also added @kofalt as a reviewer. For more info: this is functionality that we'll need sooner than the Object Storage PR can land, which is why I asked David to pull it into it's own PR.

kofalt commented 6 years ago

Given that this does not use a library, and instead hand-implements the range requests, my suggestion would be to toggle this behavior via a URL parameter that is intended only to be used for viewers.

if not 'http header, viewer-range=1':
    # old behavior that is currently deleted
else:
    # new behavior added as part of this PR

That way, the range-request stuff only fires for the viewer code that requires it, and does not fire for ordinary usage such as downloads. This ensures clients (browsers, CLI, python, etc) don't interact with the logic and lets us ship this as-is without having to extensively validate the implementation.

ambrussimon commented 6 years ago

@kofalt The goal was to implement this for object storage, specifically to provide the same (standard) interface as any storage provider does, even when using local storage. Effort went into testing it with both local and google storage, and verifying that it's working the same way (or better). It was separated into it's own PR as per @nagem's request to not have two implementations lurking around on different branches.

I don't think it should go into a URL param, as the goal is to solve it for good in a standard fashion. As you said, effort is needed for validation though.

PS: Not using a lib was a design choice (avoiding dep creep), but for one I'm easily convinced to use 3rd party implementation.

nagem commented 6 years ago

Thanks for making the requested changes. LGTM to merge in it's adjusted state but I'll let @ambrussimon finalize his review as well.

nagem commented 6 years ago

Note: After being confused by the code returning the entire file with a 200 when unable to parse the range request, @ambrussimon noted the Note in this section of the rcf: https://tools.ietf.org/html/rfc7233#section-4.4

Their testing with GCS showed that they also 200 and serve the entire file when encountering an improper range request.