google / magika

Detect file content types with deep learning
https://google.github.io/magika/
Apache License 2.0
7.76k stars 412 forks

Bindings for Golang #96

Open reyammer opened 7 months ago

reyammer commented 7 months ago

Make magika available to Go applications.

LeslieLeung commented 7 months ago

I think I can help with this.

reyammer commented 7 months ago

Awesome! Should we assign this to you, then, so that people know someone is already on it? :-)

reyammer commented 7 months ago

If you actually want to pick this up, great! Let's briefly discuss how you would approach it. There are several ways to do this, so let's make sure we align on the approach first :-)

LeslieLeung commented 7 months ago

Here are some of my ideas:

weebney commented 7 months ago

Would love to participate in this.

reyammer commented 7 months ago

Thanks for the interest!

Here are my thoughts:

Feedback is welcome :-)

LeslieLeung commented 7 months ago

I am not very keen on using a shared library to create the bindings for Go (just Go, not other languages). Using a shared library means using cgo (I am not entirely sure about this, so correct me if I am wrong), and cgo has at least these drawbacks compared to native Go:

LeslieLeung commented 7 months ago

> Note for external contributors: this may become the future de-facto magika client, so its design will need extra care. We are reaching out internally as well, so please let us know if you are starting working on this so that we can coordinate better and/or avoid duplicated work. Thank you!

I wasn't aware of this earlier. A Rust implementation would be enough; apologies for any inconvenience caused.

reyammer commented 7 months ago

Hey, no worries, and sorry for the delay in following up! Interesting that you mentioned the Rust impl would be enough... how would a Go app call Magika inference without shelling out to the CLI?

LeslieLeung commented 7 months ago

> Hey, no worries! and sorry for delay in following up. Interesting you mentioned the rust impl would be enough... how would a golang app call out magika inference without shelling out to the CLI?

I mean that a Go CLI would not be necessary if you already have plans for a Rust one.

For the Go package, though, there are a few different approaches:

  1. implement both feature extraction and inference in native Go
  2. call the Rust CLI from Go by executing CLI commands (is this what you meant by "shelling out to the CLI"?)

Personally, I prefer the first one because a) the feature extraction logic is not too complicated to implement, b) dependency management would be easier, and c) performance would be better.
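To give a rough idea of what feature extraction in native Go could look like, here is a minimal sketch. It assumes, purely for illustration, that the features are a fixed-size window of bytes from the start and from the end of the whitespace-trimmed content, with short inputs filled by a padding value outside the byte range. The names `extractFeatures` and `paddingToken` and the window sizes are hypothetical, not Magika's actual feature definition.

```go
package main

import "bytes"

// paddingToken is a hypothetical sentinel outside the byte range (0-255),
// used to fill feature slots when the content is shorter than the window.
const paddingToken = 256

// extractFeatures returns begSize values taken from the start of the content
// and endSize values taken from its end, after trimming surrounding
// whitespace. Slots with no corresponding byte are set to paddingToken.
// This layout is an assumption for illustration only.
func extractFeatures(content []byte, begSize, endSize int) (beg, end []int) {
	trimmed := bytes.TrimSpace(content)

	// Left-aligned window: the first byte lands in slot 0.
	beg = make([]int, begSize)
	for i := range beg {
		if i < len(trimmed) {
			beg[i] = int(trimmed[i])
		} else {
			beg[i] = paddingToken
		}
	}

	// Right-aligned window: the last byte lands in the last slot.
	end = make([]int, endSize)
	for i := range end {
		src := len(trimmed) - endSize + i
		if src >= 0 {
			end[i] = int(trimmed[src])
		} else {
			end[i] = paddingToken
		}
	}
	return beg, end
}
```

The resulting integer slices would then be fed to whatever inference runtime the package ends up using; that part is left out here because it depends on the model format.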

The second one is easier to implement and less sensitive to upstream changes, though. But the overhead of spawning a process for every call is not ideal, since a package like this is more commonly used in online processing than in offline CLI tools or scripts.
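For comparison, a sketch of the second approach using only the standard library's `os/exec`. The helper name `runDetector` and the fixed timeout are hypothetical; the binary name and its flags are passed in by the caller, so nothing here depends on a particular CLI interface.

```go
package main

import (
	"bytes"
	"context"
	"fmt"
	"os/exec"
	"strings"
	"time"
)

// detectorTimeout bounds how long a single invocation may run.
// The value is an arbitrary choice for this sketch.
const detectorTimeout = 10 * time.Second

// runDetector runs the external binary bin with args followed by path and
// returns its trimmed standard output. Each call spawns a new process,
// which is where the per-call overhead comes from.
func runDetector(bin string, args []string, path string) (string, error) {
	ctx, cancel := context.WithTimeout(context.Background(), detectorTimeout)
	defer cancel()

	// Copy args so the caller's slice is never modified by append.
	argv := append(append([]string{}, args...), path)

	cmd := exec.CommandContext(ctx, bin, argv...)
	var stdout, stderr bytes.Buffer
	cmd.Stdout = &stdout
	cmd.Stderr = &stderr

	if err := cmd.Run(); err != nil {
		return "", fmt.Errorf("running %s: %w (stderr: %s)",
			bin, err, strings.TrimSpace(stderr.String()))
	}
	return strings.TrimSpace(stdout.String()), nil
}
```

A caller might use something like `runDetector("magika", []string{"--json"}, "report.pdf")`, assuming the installed CLI accepts that flag, and then parse the output itself. The output format is deliberately left unparsed here, since it could change between CLI releases.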

reyammer commented 4 months ago

Hello @LeslieLeung, sorry for the long delay in replying, and thanks for your input! We are preparing a new release, and we are now actively thinking about the Go bindings as well; your input is very useful. We will update this issue once we have wrapped up our thoughts on the topic. Thanks again!