hummingbird-project / hummingbird

Lightweight, flexible HTTP server framework written in Swift
https://hummingbird.codes/
Apache License 2.0
1.23k stars 54 forks source link

Problem implementing FileMiddleware #520

Open Bamakoo opened 3 months ago

Bamakoo commented 3 months ago

πŸ‘‹

My name is Emma and I'm an iOS Developer with a keen interest in Server Side Swift development with some Vapor personal projects under my belt. Blown away by what you've done with Hummingbird. Love it.

I'm currently attempting to use the ETag in order to perform some client-side caching with Swift Data for a personal project that I'm working on. Running into some issues getting it to work.

What I've attempted

So far, I've attempted adding ...

let fileProvider = FileProviderManager() 
router.middlewares.add(FileMiddleware(fileProvider: fileProvider))

... to my buildApplication function to no avail.

This is what a WIP version of FileProviderManager looks like:

//
//  FileProviderManager.swift

import Foundation
import Hummingbird

struct FileProviderManager: FileProvider {
    func getAttributes(id: UUID) async throws -> CustomFileAttributes? {
// Get a file's attributes(size, date, ...) from a UUID 
        print(#function)
        return .init(isFolder: true, size: 3, modificationDate: Date.now)
    }

// function used to get the UUID referencing a particular file from it's path
    func getFileIdentifier(_ path: String) -> UUID? {
// I guess I'd switch on patch here and depending on the path I'd return a specific UUID?
        print(#function)
        return UUID()
    }

    func loadFile(id: UUID, context: some Hummingbird.RequestContext) async throws -> ResponseBody {
// use a UUID to load a specific file 
        print(#function)
        return .init()
    }

    func loadFile(id: UUID, range: ClosedRange<Int>, context: some RequestContext) async throws -> ResponseBody {
        print(#function)
        return .init()
    }
    // These two type aliases have no reason for being I'm going to remove them as soon as I found a solution/workaround to my issue. 
    typealias FileAttributes = CustomFileAttributes

    typealias FileIdentifier = UUID
}

struct CustomFileAttributes: FileMiddlewareFileAttributes {
    var isFolder: Bool

    var size: Int

    var modificationDate: Date
}

What's weird is that I've added breakpoints to these function and they're never called, ran my server, called an endpoint ...

I've tried a workaround which I don't like which would entail me adding it to specific routes - keep in mind that ideally I'd like to have the Etag on all my objects -.

 .add(middleware: FileMiddleware("/Sources"))

This hasn't worked either πŸ™ƒ

I'm at my wits end and would love some help. I reached out to @Joannis who was kind enough to try to set me on the right path. He mentioned some documentation, specifically for ResponseBody I've checked it out and can't see how that'd help me. There's no mention of Etag.

He also mentioned that there were some sample projects available but I've had a look at most of the projects and I haven't found a single one that had a working implementation of a FileManager instance.

I'd really appreciate it if someone could walk me through where I'm going wrong. I've also added comments on what the role of every function was, feel free to correct me if I'm wrong.

Thank you so much for your time and patience.

CC @adam-fowler

adam-fowler commented 3 months ago

It looks like you are returning everything as a folder, so it never finds any files.

    func getAttributes(id: UUID) async throws -> CustomFileAttributes? {
// Get a file's attributes(size, date, ...) from a UUID 
        print(#function)
        return .init(isFolder: true, size: 3, modificationDate: Date.now)
    }
Bamakoo commented 3 months ago

Hi @adam-fowler,

Thanks for the reply. Right now my issue is that this function doesn't even get called. I've added breakpoints to it and called my endpoint. At no point does it transit through this Middleware which is really weird since I've added it globally to the router ...

Do you know of any reason why these function wouldn't get called? Is there any doc on how to implement them properly?

Thanks again for the help.

adam-fowler commented 3 months ago

There is a test showing how to setup a FileProvider. Search for testCustomFileProvider in the Tests folder.

Though from your description of what you want to do I'm not sure why you need a custom FileProvider. The standard one used when you point the FileMiddleware at a folder returns unique ETags for every file.

Joannis commented 3 months ago

@adam-fowler I was in the understanding that the data to be served was stored externally, through a DB.

Bamakoo commented 3 months ago

Yeah I'm storing all my data in PostgreSQL DB. Don't know how much difference that makes?

adam-fowler commented 3 months ago

Ah sorry missed that bit. Yes then you'd need a FileProvider implementation. I would use the implementation from the test I pointed to you as a starting point.

Bamakoo commented 3 months ago

@Joannis asked for a table schema so here it goes.

struct Woman {
    let id: UUID
    let firstName: String
    let lastName: String
    var lifeDescription: String
    let born: Int
    let died: Int?
    let urlString: String
}

extension Woman: ResponseEncodable, Codable, Equatable {}

This is the only thing I'm storing in my DB. A rapid disclaimer that it's called Woman cause I'm storing the personal bios and info of great women throughout history (Ada Lovelace, ...).

Joannis commented 3 months ago

What file are you trying to serve from this?

Bamakoo commented 3 months ago

I have a WomanController.swift file that calls a WomanPostgresRepository in order to connect to the DB.

Joannis commented 3 months ago

I get that part, but what file is being served? I don't see any files mentioned in here. Are you referring to JSON data?

Bamakoo commented 3 months ago

I'm still confused about what you call a file. I'm sorry to be the newb here but for me the file is a .swift file for example. The only thing I'm serving depending on the endpoint is either a single instance of the model I showed you above (Woman) or an array of women.

I'm going to try and implement it the same way the tests do it, see what happens.

Bamakoo commented 3 months ago

Update I've copy/pasted the code from the test:

struct MemoryFileProvider: FileProvider {
    struct FileAttributes: FileMiddlewareFileAttributes {
        var isFolder: Bool { false }
        var modificationDate: Date { .now }
        let size: Int
    }

    var files: [String: ByteBuffer]

    init() {
        self.files = [:]
    }

    func getFileIdentifier(_ path: String) -> String? {
        return path
    }

    func getAttributes(id path: String) async throws -> FileAttributes? {
        guard let file = files[path] else { return nil }
        return .init(size: file.readableBytes)
    }

    func loadFile(id path: String, context: some RequestContext) async throws -> ResponseBody {
        guard let file = files[path] else { throw HTTPError(.notFound) }
        return .init(byteBuffer: file)
    }

    func loadFile(id path: String, range: ClosedRange<Int>, context: some RequestContext) async throws -> ResponseBody {
        guard let file = files[path] else { throw HTTPError(.notFound) }
        guard let slice = file.getSlice(
            at: range.lowerBound,
            length: range.count
        ) else {
            throw HTTPError(.rangeNotSatisfiable)
        }
        return .init(byteBuffer: slice)
    }
}

I also still call

    let fileProvider = MemoryFileProvider()
    router.middlewares.add(FileMiddleware(fileProvider: fileProvider))

I've added breakpoints to the functions and they're never called which is really weird ... I'd expect them to be called. I put a breakpoint right after the router and the middleware is added to it.

Also I think this isn't going to work since the files are stored in memory which would mean that every time I restart the server it'll break the whole implementation.

Bamakoo commented 3 months ago

Talking with @Joannis in parallel I realise there's a bit of a misunderstanding.

Current state

Currently, when I run my server locally on my Mac and perform

curl -v 127.0.0.1:8080/my-path

I get the following result:

*   Trying 127.0.0.1:8080...
* Connected to 127.0.0.1 (127.0.0.1) port 8080
> GET /my-path HTTP/1.1
> Host: 127.0.0.1:8080
> User-Agent: curl/8.6.0
> Accept: */*
> 
< HTTP/1.1 200 OK
< Content-Type: application/json; charset=utf-8
< Content-Length: 715
< Date: Mon, 12 Aug 2024 15:38:18 GMT
< 
* Connection #0 to host 127.0.0.1 left intact
Array<MyObject>()

Expected outcome

What I'd want to achieve is to have a Etag header which would look something like this:

*   Trying 127.0.0.1:8080...
* Connected to 127.0.0.1 (127.0.0.1) port 8080
> GET /my-path HTTP/1.1
> Host: 127.0.0.1:8080
> User-Agent: curl/8.6.0
> Accept: */*
> 
< HTTP/1.1 200 OK
< Content-Type: application/json; charset=utf-8
< Content-Length: 715
< Etag: "hashValue"
< Date: Mon, 12 Aug 2024 15:38:18 GMT
< 
* Connection #0 to host 127.0.0.1 left intact
Array<MyObject>()

Why I need this

I'd need the Etag in order to be able to add to the "If None Match" header when performing network calls from the client. The server would then return a 200 response with the new body if there have been changes since the last call or a 304 response if there haven't been any changes.

adam-fowler commented 3 months ago

Let me get this right you want to convert a request path into a database request. The response from this request should return the contents of a database row (in JSON?), and an eTag header.

If your request contains an "if none match" header it should compare this with what will be the eTag of the response and if they are the same return a NotModified response.

Bamakoo commented 3 months ago

That's 50% right @adam-fowler. It returns an array of DB rows/entries basically an array of Women objects.

The comparison of Etags is already implemented properly in FileManagerMiddleware from HB from what I've seen so it don't need to do anything there ...

With the caveat that some of my routes like /my-path/{id} actually return a single DB entry versus an array of DB entries. In that case, I'd want to get an Etag for that particular entry meaning has my woman "changed" in any way since last query (ie wether a PATCH or PUT requests has been performed on that particular entry).

Am I making any sense? I'm a very Junior dev' so I may wildly off mark here.

Is this how Etag is supposed to be used?

Thanks again for the support. Much appreciated.

adam-fowler commented 3 months ago

There are two different reasons the FileMiddleware might not run. 1) The router has already matched an endpoint. If the response returned from the router is a 404 ie NotFound, the middleware runs, otherwise it skips the FileMiddleware. 2) The FileMiddleware has been added to a RouterGroup. Middleware in a group only runs if an endpoint in that group is matched.

Am I making any sense? I'm a very Junior dev' so I may wildly off mark here.

Yes you are

Bamakoo commented 3 months ago

OK so let me see if I can get this straight.

1/ The FileMiddleware ONLY runs if a an endpoint returns a 404? πŸ˜• Does that mean my endpoints have to return 404s for the FileMiddleware to run?

I think there are several things I'm not understanding ...

2/ As described previously, all I ever do it add the Middleware to the Router in my buildApplication function. I don't add it to particular routes or endpoints since ideally I'd need this to work "globally" on all routes/endpoints ...

I guess my question is what is the right way to get the FileMiddleware to run all the time ie for every incoming request?

adam-fowler commented 3 months ago

Yes the FileMiddleware is meant to only run if no other endpoint is matched. There is no point having the FileMiddleware return a new response if an endpoint has already generated one. The main use of the FileMiddleware is to return files that are stored on the server. The FileProvider generic parameter was added to make it a little more flexible so it could return files stored elsewhere say S3 or in a database. FileMiddleware is not used to augment already built responses.

If you are only looking to provide Etag generation and comparison for your responses I would either do that in the actual endpoint or in a separate middleware.

Bamakoo commented 3 months ago

Am I to understand that I'd have to generate the Etag myself ?

And handle the comparison myself and add the header to an EditedResponse?

adam-fowler commented 3 months ago

Am I to understand that I'd have to generate the Etag myself ?

And handle the comparison myself and add the header to an EditedResponse?

Yes. The FileMiddleware is designed to serve files. The fact that it generates ETag's is a very small part of it and linked to the fact that it is serving files.

Bamakoo commented 3 months ago

OK so I think this is on me. I wrongly understood that I could generate Etags. Legally speaking how allowed am I to use your implementation to add my Etags to my headers?

I wouldn't want to steal your IP.

adam-fowler commented 3 months ago

OK so I think this is on me. I wrongly understood that I could generate Etags. Legally speaking how allowed am I to use your implementation to add my Etags to my headers?

I wouldn't want to steal your IP.

I'm happy for you to copy it with attribution

Bamakoo commented 3 months ago

πŸ†— thanks so much. I think we've solved the issue in the sense that I thought that FileManager would work for me but it's not built for what I'm trying to do version some JSON essentially.

Thank you so very much for your time and effort in trying to help me understand what's what. Goes a long way and means a lot.

I hope you have a great day and will 1000% share any progress I make on implementing it myself.

Should I let you close the ticket or would you like me to do it?

Bamakoo commented 3 months ago

Also and keeping in mind I'm a very Junior Dev' with little actual experience I was wondering wether you'd be interested if I opened a pull request on this repository to add my implementation to the official HB repository in case other devs want to add Etags to their headers.

I recognise that it's going to be super hard but I'd like to try.

adam-fowler commented 3 months ago

How you generate an ETag is dependent on situation and source of your data. Some people think it should be the md5 of the file contents, some people think it should a combination of file modification data, size and inode. In the FileMiddleware I went for file modification data and size.

If you can come up with some generic solution I'm happy to have a look

Bamakoo commented 3 months ago

@adam-fowler @Joannis

I've begun implementing my EtagHelper component.

Reusing the FileAttributes type. What do you think about using the .count property of an array as a "substitute" for file size?

I was thinking of using the date the last element of the array was modified as a date.

Does this seem smart to you?

adam-fowler commented 3 months ago

@Bamakoo I think you should do what works for your project first. Then we can discuss whether it'll be suitable for Hummingbird.