luabagg / orcgen

Web pages and HTML content conversion
https://pkg.go.dev/github.com/luabagg/orcgen
MIT License
9 stars 4 forks source link

Orcgen New Version - simpler and more configurable package #5

Closed luabagg closed 2 months ago

luabagg commented 2 months ago

Makes the implementation of the package much simpler, and enables configurations using rod.

Changes:

New flow: The user must first initialize a rod's Page instance. It can be done using the webdriver struct. The user can also configure the timeouts or other Browser options. Then the user must instantiate a handler, and can optionally change the default configuration. After this, the user can use the handler directly, calling GenerateFile. It returns a Fileinfo struct, that can be used to access the content and / or save a file.

Todos:

magandrez commented 2 months ago

Hey @luabagg thank you for the prompt reaction. I played around with this changes in my machine. This is what I got:

First of all, the package continues to feel about the same size, extendable and more configurable. In that sense, I think the change is positive. So for those reasons, I think such a change is actually valuable 👍🏻

On the minus side, as an engineer that looks after the libraries I use, I think this change adds functionality but at the expense of internal complexity, luckily, this is not complexity that users of the library have to deal with. For this reason, the below is just an idea for the re-development, but no need to take it in if you don't feel like it or not see it valuable, I will totally understand.

In my opinion, adopting generics here might be a bit of an overkill (at least in the way its been adopted). The most clear reason can be seen in the complexity of NewHandler or on the signature of Generate.

NewHandler as it is in this change, is relying on type assertions to figure out the config type. This ties behavior to conditional types, scattering or encapsulating behavior without being clear to what they do (at least not according to its signature. Instead, I would suggest something like the following:

  1. Use a tagged interface for configurations:
package handlers

type Config interface {
    GetHandler() FileHandler[Config]
}
  1. Implement make all the handlers implement the Config interface:

The signature of such would look something like:

func (p *PDFHandler) GetHandler() handlers.FileHandler[handlers.Config]
  1. Simplify NewHandler like:

func NewHandler(config Config) FileHandler[Config] {
    handler := config.GetHandler()
    return handler.SetConfig(config)
}

This way, the refactored NewHandler could be used like:

config := proto.PagePrintToPDF{}
handler := handlers.NewHandler(config)
// handler is now correctly initialized as a PDF handler with the appropriate configuration.

And thus the interface with the library would remaining pretty clear and easy to use, and at the same time the complexity of the internals would be lowered.

Similarly, the interfaces below seem to be mixed:

type Config interface {
    proto.PageCaptureScreenshot | proto.PagePrintToPDF
}

// FileHandler interface contains the methods used for page conversion.
type FileHandler[T Config] interface {
    // SetConfig adds the config to the instance.
    SetConfig(config T) FileHandler[T]
    // SetFullPage sets the pages to be converted. If false, only the first page is selected.
    // Default is false.
    SetFullPage(fullPage bool) FileHandler[T]
    // GenerateFile converts a rod Page instance to a file.
    GenerateFile(page *rod.Page) (*fileinfo.Fileinfo, error)
}

Since the interfaces are meant to describe behavior, the signature of the methods in this two are mixed; FileHandler is both describing the method for a handler to generate a file as well as describing how it is configured. It is well known that composition is preferred to inheritance to achieve the desired (polymorphic) behavior.

Similarly, in order to reduce complexity from the Generate function, I would rely on further leveraging interfaces:

  1. Define type-specific interfaces:
// Define an interface for HTML conversion.
type HTMLConvertible interface {
    Convert(handler handlers.FileHandler[handlers.Config]) (*fileinfo.Fileinfo, error)
}

// Define an interface for Byte conversion.
type ByteConvertible interface {
    Convert(handler handlers.FileHandler[handlers.Config]) (*fileinfo.Fileinfo, error)
}

// Here corresponding implementations of `Convert`
// ...
  1. Refactor Generate to make use of the implementations of the above interfaces
func Generate[T HTMLConvertible | ByteConvertible](html T, config handlers.Config, output string) error {
    handler := handlers.NewHandler(config)

    // Directly call the Convert method on the interface without type assertions.
    fileinfo, err := html.Convert(handler)
    if err != nil {
        return err
    }
    return fileinfo.Output(output)
}

I hope this makes sense. In any case, it is just an idea 😃

magandrez commented 2 months ago

Some sources related to the above:

luabagg commented 2 months ago

@magandrez I think you're totally right. I didn't like the generics implementation either, but I had to finish the lib, so I did that way haha.

I will take a look at this soon. Thanks for the review.

luabagg commented 2 months ago

@magandrez I couldn't make the tagged interface work, I think I'm missing something. I will merge this PR and I'd be glad if you could submit a working PR later.

magandrez commented 2 months ago

No problem. At the moment I am busy, but I will come back to this later this fall, this already helps the most pressing need for configuration.

Thanks for the help and the prompt response!