google / go-tika

Go package for using Apache Tika
Apache License 2.0

Expose Tika http status code in errors returned by client methods #24

Closed tomyl closed 4 years ago

tomyl commented 4 years ago

For users of tika.Client, it can be useful to differentiate between intermittent errors (HTTP status code 500) and content-related errors (e.g. 415 and 422). Currently, however, the client methods just return an opaque error string.

In my fork, https://github.com/tomyl/go-tika, I'm experimenting with exposing the HTTP status code in the error. Basically:

diff --git a/tika/tika.go b/tika/tika.go
index a6ffdab..8a0cd39 100644
--- a/tika/tika.go
+++ b/tika/tika.go
@@ -29,6 +29,16 @@ import (
        "golang.org/x/net/context/ctxhttp"
 )

+// ClientError represents an error response from the Tika server.
+type ClientError struct {
+       // StatusCode is the http status code returned by the Tika server.
+       StatusCode int
+}
+
+func (e ClientError) Error() string {
+       return fmt.Sprintf("response code %d", e.StatusCode)
+}
+
 // Client represents a connection to a Tika Server.
 type Client struct {
        // url is the URL of the Tika Server, including the port (if necessary), but
@@ -107,7 +117,7 @@ func (c *Client) call(ctx context.Context, input io.Reader, method, path string,
        }
        defer resp.Body.Close()
        if resp.StatusCode != http.StatusOK {
-               return nil, fmt.Errorf("response code %v", resp.StatusCode)
+               return nil, ClientError{resp.StatusCode}
        }
        return ioutil.ReadAll(resp.Body)
 }

The calling code can then do something like:

func doStuff(input io.Reader, tikaURL string) error {
    client := tika.NewClient(nil, tikaURL)
    s, err := client.Parse(context.Background(), input)
    if isUnsupportedFileFormat(err) {
        return nil
    }
    if err != nil {
        return err
    }
    ...
}

func isUnsupportedFileFormat(err error) bool {
    var tikaErr tika.ClientError

    if errors.As(err, &tikaErr) {
        switch tikaErr.StatusCode {
        // Password protected documents yield StatusUnprocessableEntity
        case http.StatusUnsupportedMediaType, http.StatusUnprocessableEntity:
            return true
        default:
            return false
        }
    }

    return false
}

Thoughts? I'm happy to submit a PR if a change like this would be accepted.

tbpg commented 4 years ago

This looks good to me! Thanks for filing an issue and offering to send a PR.

tomyl commented 4 years ago

Cool, I submitted PR #25.

tbpg commented 4 years ago

Closing this as the PR has been merged. Thanks!