rzane / file_store

🗄️ A unified interface for file storage backends

Add a `File.stream!` like interface #33

Open warmwaffles opened 2 months ago

warmwaffles commented 2 months ago

I have been thinking about this for a while, since I need to be able to stream a file from an S3-like resource or local disk. I wanted to know your thoughts on this, @rzane.

Something like this, added to the protocol:

defprotocol FileStore do
  @doc """
  Stream the file from the store

  ## Options
  - `:bytes` - The number of bytes to chunk the data in.
  - `:line` - If `true` the data will be chunked out by line.

  ## Example
      # Stream in chunks
      MyStorage.stream!("/the/path.txt", bytes: 2048) #=> Enumerable.t()
      # Stream in line by line
      MyStorage.stream!("/the/other.txt", line: true) #=> Enumerable.t()
  """
  @spec stream!(t, key, stream_opts) :: Enumerable.t()
  def stream!(store, key, opts \\ [])
end

For the null adapter, the implementation is dead simple:

def stream!(_store, _key, _opts), do: Stream.into([], [])
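
As a quick sanity check of that one-liner, here is a minimal sketch using only the standard library. It shows that `Stream.into([], [])` enumerates as an empty stream, which is what a caller of the null adapter would observe. The variable names are only for illustration.

```elixir
# Stream.into/2 is lazy; enumerating it over an empty source yields nothing.
empty = Stream.into([], [])

chunks = Enum.to_list(empty)
count = Enum.count(empty)

IO.inspect(chunks)
IO.inspect(count)
```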

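For the Disk adapter, it's just as trivial:

def stream!(store, key, opts \\ []) do
  path = Disk.join(store, key)

  # The two-argument calls below rely on Elixir 1.16+, where File.stream!/2
  # accepts :line or a byte count as its second argument.
  if opts[:line] do
    File.stream!(path, :line)
  else
    # Fall back to 2048-byte chunks when :bytes is not given.
    bytes = opts[:bytes] || 2048
    File.stream!(path, bytes)
  end
end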
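
To make the two modes concrete, here is a small standalone sketch of what `File.stream!/2` yields in each case. It assumes Elixir 1.16 or later, and the temp file name and contents are made up for the example; it does not go through the Disk adapter itself.

```elixir
# Assumes Elixir 1.16+, where File.stream!/2 takes :line or a byte count
# as its second argument. The temp file name is made up for this sketch.
path = Path.join(System.tmp_dir!(), "file_store_stream_demo.txt")
File.write!(path, "alpha\nbeta\ngamma\n")

# Line mode: one element per line, trailing newline included.
lines = path |> File.stream!(:line) |> Enum.to_list()

# Byte mode: fixed-size binary chunks; the last one may be shorter.
chunks = path |> File.stream!(4) |> Enum.to_list()

File.rm!(path)

IO.inspect(lines)
IO.inspect(chunks)
```

Both calls return a lazy stream, so nothing is read from disk until the stream is enumerated.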

For S3 it's a little trickier, but the implementation is essentially spelled out in the `download_file/4` documentation: https://hexdocs.pm/ex_aws_s3/ExAws.S3.html#download_file/4
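
One part that is likely to need extra work for S3 is the `:line` option, since a download arrives in chunks that do not line up with newlines. Below is a rough sketch of re-chunking an arbitrary stream of binaries into lines. The `Rechunk` module is hypothetical (it is not part of file_store or ex_aws), and it assumes the S3 side can be turned into a stream of binaries, for example via the `:memory` destination described in the linked docs.

```elixir
defmodule Rechunk do
  # Hypothetical helper for illustration; not part of file_store or ex_aws.
  # Turns a stream of arbitrarily sized binaries into a stream of lines,
  # keeping the trailing "\n" on each line like File.stream!/2 does.
  def lines(chunks) do
    chunks
    # Append a sentinel so the last partial line can be flushed.
    |> Stream.concat([:eof])
    |> Stream.transform("", fn
      :eof, "" ->
        {[], ""}

      :eof, buffer ->
        {[buffer], ""}

      chunk, buffer ->
        # Everything before the last newline forms complete lines;
        # the remainder is carried over to the next chunk.
        {complete, [rest]} =
          (buffer <> chunk) |> String.split("\n") |> Enum.split(-1)

        {Enum.map(complete, &(&1 <> "\n")), rest}
    end)
  end
end

# Stand-in for chunks coming from S3; the boundaries are deliberately awkward.
fake_download = ["al", "pha\nbe", "ta\ngam", "ma"]

rechunked = fake_download |> Rechunk.lines() |> Enum.to_list()
IO.inspect(rechunked)
```

The same buffering idea could be adapted for the `:bytes` option by splitting on a byte count instead of on newlines.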

rzane commented 2 months ago

I would love to have this functionality. Would you be willing to take a stab at this?

warmwaffles commented 2 months ago

Cool, just wanted to make sure it is desired. I have some of this done already. I just need to finish working it out and get tests written for it.

warmwaffles commented 2 months ago

@rzane here's the current implementation I have cooked up.

It's built on top of #32. The problem with Elixir 1.16 is that it soft-deprecates the old argument order of `File.stream!/3`. Once that one is merged, I'll open a PR with the stream interface added and we can work through it.

I am running into issues at the moment with minio locally and I need to play with my configurations first. Here's the error:

     ** (exit) exited in: GenServer.call(ExAws.Config.AuthCache, {:refresh_auth, %{port: 443, scheme: "https://", host: "s3.amazonaws.com", http_client: ExAws.Request.Hackney, region: "us-east-1", access_key_id: [{:system, "AWS_ACCESS_KEY_ID"}, :instance_role], secret_access_key: [{:system, "AWS_SECRET_ACCESS_KEY"}, :instance_role], json_codec: Jason, retries: [max_attempts: 10, base_backoff_in_ms: 10, max_backoff_in_ms: 10000], normalize_path: true, require_imds_v2: false}}, 30000)
         ** (EXIT) an exception was raised:
             ** (RuntimeError) Instance Meta Error: {:error, %{reason: :connect_timeout}}

     You tried to access the AWS EC2 instance meta, but it could not be reached.
     This happens most often when trying to access it from your local computer,
     which happens when environment variables are not set correctly prompting
     ExAws to fallback to the Instance Meta.

     Please check your key config and make sure they're configured correctly:

     For Example:
       ExAws.Config.new(:s3)
       ExAws.Config.new(:dynamodb)

             (ex_aws 2.5.3) lib/ex_aws/instance_meta.ex:27: ExAws.InstanceMeta.request/3
             (ex_aws 2.5.3) lib/ex_aws/instance_meta.ex:84: ExAws.InstanceMeta.instance_role_credentials/1
             (ex_aws 2.5.3) lib/ex_aws/instance_meta.ex:92: ExAws.InstanceMeta.security_credentials/1
             (ex_aws 2.5.3) lib/ex_aws/config/auth_cache.ex:132: ExAws.Config.AuthCache.refresh_auth_now/2
             (ex_aws 2.5.3) lib/ex_aws/config/auth_cache.ex:45: ExAws.Config.AuthCache.handle_call/3
             (stdlib 5.2) gen_server.erl:1131: :gen_server.try_handle_call/4
             (stdlib 5.2) gen_server.erl:1160: :gen_server.handle_msg/6
             (stdlib 5.2) proc_lib.erl:241: :proc_lib.init_p_do_apply/3
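
The config dump in that error shows the host as `s3.amazonaws.com` and the credentials falling through to `:instance_role`, which suggests that neither the environment variables nor a local endpoint were picked up. A possible starting point for a local minio setup is sketched below. The host, port, and credentials are assumptions (minio's usual local defaults) and would need to match the actual container or process.

```elixir
# config/test.exs (or wherever the local config lives)
import Config

# Assumed credentials: minio's usual local defaults, not real AWS keys.
config :ex_aws,
  access_key_id: "minioadmin",
  secret_access_key: "minioadmin",
  region: "us-east-1"

# Point the S3 client at the local minio endpoint instead of AWS.
# Host and port are assumptions for a default local install.
config :ex_aws, :s3,
  scheme: "http://",
  host: "localhost",
  port: 9000
```

With explicit keys set, ExAws should no longer try the EC2 instance metadata fallback that produces the `:connect_timeout` above.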