miyucy / brotli

Writer implementation #32

Closed romanbsd closed 3 years ago

romanbsd commented 4 years ago

Is this something that you might consider adding? I haven't written tests or a Reader implementation yet, though.

miyucy commented 4 years ago

Thank you for the PR! I have no objection to adding it.

But I'm not familiar with BrotliEncoderCreateInstance. Is that the streaming compression interface?

Should Brotli::Writer have Zlib::GzipWriter-like methods?

romanbsd commented 4 years ago

Yes, precisely. It's suited for streaming. It works, but I'm facing a minor issue: the brotli encoder adds 1 byte after the finish operation is invoked. This makes it awkward to use in a Rack handler, since that 1 byte is emitted with its own chunked encoding header as the last part, which is a waste. In the nginx module this problem doesn't exist, since the handler there knows when the input is exhausted and is therefore able to issue the finish command with the last block. However, in the Rack domain the response body is iterated with Enumerable#each, and I don't think it's possible to know whether this is the last iteration. Perhaps you have some idea how to overcome this?
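
One possible workaround for the "last iteration" problem, sketched here purely as an illustration, is to hold each chunk back until the next one arrives, so the final chunk can be recognized once the body is exhausted. The helper name `each_with_last_flag` is hypothetical and not part of Rack or the brotli gem. The cost is that every chunk is delayed by one iteration, which may be unacceptable for latency-sensitive streaming.

```ruby
# Hypothetical helper (not part of Rack or the brotli gem).
# Iterates an each-only body and yields [part, last] pairs,
# where last is true only for the final part.
def each_with_last_flag(body)
  pending = nil
  have_pending = false

  body.each do |part|
    # The previously held part is now known not to be the last one.
    yield pending, false if have_pending
    pending = part
    have_pending = true
  end

  # The body is exhausted, so whatever is still held is the last part.
  yield pending, true if have_pending
end

seen = []
each_with_last_flag(%w[a b c]) { |part, last| seen << [part, last] }
p seen
```

With a flag like this, a middleware could in principle hand the last part to the encoder together with the finish operation, provided the encoder binding exposes such a call; whether it does is an assumption that would need checking against the actual Writer.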

miyucy commented 4 years ago

the last iteration

I guess it's impossible...

If we call the flush method each time, will the brotli encoder stop writing a final chunk? (e.g. https://github.com/rack/rack/blob/master/lib/rack/deflater.rb#L102)

miyucy commented 4 years ago

Ah, the brotli encoder's flush does not mean the same thing as Zlib::GzipWriter#flush. https://github.com/google/brotli/blob//go/cbrotli/writer.go#L125 So it can be called only one time...

romanbsd commented 4 years ago

Here's a working implementation of a streaming brotli encoder Rack middleware:

# frozen_string_literal: true

require 'brotli'
require 'rack/utils'

module Rack
  # This middleware enables content encoding of http responses,
  # usually for purposes of compression.
  #
  # This middleware automatically detects when encoding is supported
  # and allowed. For example no encoding is made when a cache
  # directive of 'no-transform' is present, when the response status
  # code is one that doesn't allow an entity body, or when the body
  # is empty.
  #
  class Brotli
    # Creates Rack::Brotli middleware. Options:
    #
    # :if :: a lambda enabling / disabling compression based on returned boolean value
    #        (e.g. <tt>use Rack::Brotli, :if => lambda { |*, body| sum=0; body.each { |i| sum += i.length }; sum > 512 }</tt>).
    #        However, be aware that calling `body.each` inside the block will break cases where `body.each` is not idempotent,
    #        such as when it is an +IO+ instance.
    # :include :: a list of content types that should be compressed. By default, all content types are compressed.
    # :sync :: determines if the stream is going to be flushed after every chunk.  Flushing after every chunk reduces
    #          latency for time-sensitive streaming applications, but hurts compression and throughput.
    #          Defaults to +true+.
    def initialize(app, options = {})
      @app = app
      @condition = options[:if]
      @compressible_types = options[:include]
      @sync = options.fetch(:sync, true)
    end

    def call(env)
      status, headers, body = @app.call(env)
      headers = Utils::HeaderHash[headers]

      unless should_modify?(env, status, headers, body)
        return [status, headers, body]
      end

      request = Request.new(env)

      encoding = Utils.select_best_encoding(%w[br identity],
                                            request.accept_encoding)

      # Set the Vary HTTP header.
      vary = headers['Vary'].to_s.split(',').map(&:strip)
      unless vary.include?('*') || vary.include?('Accept-Encoding')
        headers['Vary'] = vary.push('Accept-Encoding').join(',')
      end

      case encoding
      when 'br'
        headers['Content-Encoding'] = 'br'
        headers.delete(CONTENT_LENGTH)
        [status, headers, BrotliStream.new(body, @sync)]
      when 'identity'
        [status, headers, body]
      when nil
        message = "An acceptable encoding for the requested resource #{request.fullpath} could not be found."
        bp = Rack::BodyProxy.new([message]) do
          body.close if body.respond_to?(:close)
        end
        [406, { CONTENT_TYPE => 'text/plain', CONTENT_LENGTH => message.bytesize.to_s }, bp]
      end
    end

    # Body class used for brotli encoded responses.
    class BrotliStream
      # Initialize the brotli stream.  Arguments:
      # body :: Response body to compress with brotli
      # sync :: Whether to flush each brotli chunk as soon as it is ready.
      def initialize(body, sync)
        @body = body
        @sync = sync
      end

      # Yield brotli compressed strings to the given block.
      def each(&block)
        @writer = block
        brotli = ::Brotli::Writer.new(self, quality: 5)
        @body.each do |part|
          # Skip empty strings, as they would result in no output.
          next if part.empty?

          brotli.write(part)
          brotli.flush if @sync
        end
      ensure
        # brotli is nil if Writer.new raised, so guard the call.
        brotli&.close
      end

      # Call the block passed to #each with the compressed data.
      def write(data)
        @writer.call(data)
      end

      # Close the original body if possible.
      def close
        @body.close if @body.respond_to?(:close)
      end
    end

    private

    # Whether the body should be compressed.
    def should_modify?(env, status, headers, body)
      # Skip compressing empty entity body responses and responses with
      # no-transform set.
      if Utils::STATUS_WITH_NO_ENTITY_BODY.key?(status.to_i) ||
         /\bno-transform\b/.match?(headers['Cache-Control'].to_s) ||
         headers['Content-Encoding']&.!~(/\bidentity\b/)
        return false
      end

      # Skip if @compressible_types is given and does not include the response's content type
      if @compressible_types && !(headers.key?('Content-Type') && @compressible_types.include?(headers['Content-Type'][/[^;]*/]))
        return false
      end

      # Skip if @condition lambda is given and evaluates to false
      return false if @condition && !@condition.call(env, status, headers, body)

      # No point in compressing empty body, also handles usage with
      # Rack::Sendfile.
      return false if headers[CONTENT_LENGTH] == '0'

      true
    end
  end
end
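
The control flow of BrotliStream above is easy to miss: the stream passes itself to the writer as the output IO, so whatever the writer emits comes back through #write and is forwarded to the block given to #each. A minimal sketch of that pattern follows, using a stand-in encoder so it runs without any gems. `UpcaseWriter` and `EncodedBody` are hypothetical names, and "encoding" here is just upcasing; it is not how brotli behaves, only the same shape of calls.

```ruby
# Hypothetical stand-in for an encoder that is given an IO-like object
# and pushes its output through io.write. It buffers until flushed.
class UpcaseWriter
  def initialize(io)
    @io = io
    @buffer = +''
  end

  def write(data)
    @buffer << data.upcase
  end

  def flush
    return if @buffer.empty?

    @io.write(@buffer)
    @buffer = +''
  end

  def close
    flush
  end
end

# Same shape as BrotliStream: acts as the body and as the encoder's sink.
class EncodedBody
  def initialize(body, sync)
    @body = body
    @sync = sync
  end

  def each(&block)
    @sink = block
    encoder = UpcaseWriter.new(self)
    @body.each do |part|
      next if part.empty?

      encoder.write(part)
      encoder.flush if @sync
    end
  ensure
    encoder&.close
  end

  # Called by the encoder; forwards output to the block given to #each.
  def write(data)
    @sink.call(data)
  end
end

synced = []
EncodedBody.new(%w[ab cd], true).each { |chunk| synced << chunk }

buffered = []
EncodedBody.new(%w[ab cd], false).each { |chunk| buffered << chunk }

p synced
p buffered
```

With sync enabled each input part produces its own output chunk, while without it the stand-in encoder emits everything only on close, which mirrors the latency versus compression trade-off described in the :sync option.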

romanbsd commented 4 years ago

The meaning of flush is very similar. If flush is not called on the brotli encoder, though, it's most likely that the streaming won't happen, as the brotli encoder will buffer the data in order to achieve better compression. I think that the same goes for Gzip. In both cases the default is sync=true.
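
The Gzip side of this claim can be checked with Ruby's standard library alone. The sketch below writes a small payload to a Zlib::GzipWriter, records how many bytes have reached the underlying IO before and after #flush, and then decodes what has arrived so far. It says nothing about brotli itself; the payload and variable names are made up for illustration.

```ruby
require 'zlib'
require 'stringio'

payload = 'hello from a streaming response'
io = StringIO.new(String.new) # binary, mutable backing string
gz = Zlib::GzipWriter.new(io)

gz.write(payload)
size_after_write = io.string.bytesize # small input usually stays buffered

gz.flush
size_after_flush = io.string.bytesize # flush pushes the pending data out

# Decode only what has reached the IO so far (the stream is not finished).
# MAX_WBITS + 16 tells Zlib::Inflate to expect a gzip wrapper.
decoded = Zlib::Inflate.new(Zlib::MAX_WBITS + 16).inflate(io.string)

gz.finish

puts "bytes after write: #{size_after_write}"
puts "bytes after flush: #{size_after_flush}"
puts "decodable before finish: #{decoded == payload}"
```

A client reading the response incrementally is in the same position as the decoder here: it can only act on data that has actually been flushed to the wire.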

miyucy commented 4 years ago

@romanbsd Sorry for the late reply.

most likely that the streaming won't happen

Thank you for explaining the brotli encoder's flush mechanism, and for writing a working Rack middleware. I would like to merge and publish this feature together with the Rack middleware if I can. Could you add the Rack middleware to this PR?

andrew-aladev commented 3 years ago

Hello, I spent a long time working on GVL wrappers around all functionality and have finally released ruby-brs v1.2.0 and ruby-zstds v1.1.0.

You may be interested in these implementations. Please review the code: it includes GVL wrappers and a complete suite of tests for all functionality. You may copy any amount of code into the miyucy/brotli project; it may help you finish this pull request. Thank you.

miyucy commented 3 years ago

@romanbsd I'm so sorry for the late reply, and thank you for the implementation. I'll release the next version with it.