yoeo / guesslang

Detect the programming language of source code
https://guesslang.readthedocs.io
MIT License

Guesslang without Tensorflow #26

Open geert56 opened 4 years ago

geert56 commented 4 years ago

Is it possible to run guesslang inference without needing TensorFlow? I see that a pre-trained model is offered as part of the GitHub repository, but users will still need to install the complete TensorFlow package to run guesslang. Just a thought.

yoeo commented 4 years ago

Hello @geert56,

That's an interesting question.

Indeed, there is a pre-trained model shipped with Guesslang https://github.com/yoeo/guesslang/tree/master/guesslang/data/model but as far as I know, we still need TensorFlow to load and interact with the model. (Info about saved models here: https://www.tensorflow.org/guide/saved_model )

TensorFlow offers tools to convert the saved model into another TensorFlow format such as TensorFlow Lite or TensorFlow.js. That can help you run the model on a mobile device or in a website, with the appropriate TensorFlow runtime.
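For illustration, here is a minimal sketch of the TensorFlow Lite conversion path, assuming the SavedModel directory from the repository layout; as later comments in this thread show, the conversion is not guaranteed to succeed for this particular model:

```python
# Minimal sketch: convert the bundled SavedModel to TensorFlow Lite.
# The model path is assumed from the repository layout; this conversion
# may fail if the graph contains ops that TensorFlow Lite does not support.
import tensorflow as tf

saved_model_dir = "guesslang/data/model"
converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
tflite_model = converter.convert()

with open("guesslang.tflite", "wb") as f:
    f.write(tflite_model)
```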

Another option is to use third-party tools to convert the TensorFlow saved model to the Open Neural Network Exchange (ONNX) format. This open format is supported by various machine learning platforms, including PyTorch and Caffe2: https://onnx.ai/supported-tools.html
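As a rough sketch, the tf2onnx command-line converter can be driven like this (the SavedModel path is assumed from the repository layout and the output file name is illustrative; the conversion is not guaranteed to work for this model):

```python
# Rough sketch: convert the SavedModel to ONNX with the tf2onnx CLI.
# Assumes `pip install tf2onnx`; paths and the output name are illustrative.
import subprocess

subprocess.run(
    [
        "python", "-m", "tf2onnx.convert",
        "--saved-model", "guesslang/data/model",
        "--output", "guesslang.onnx",
    ],
    check=True,
)
```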

geert56 commented 4 years ago

Dear Y. Somda, thanks for the extensive answer. I am very familiar with TF and have built many models for it and also converted them to TF-Lite. It is just that for a computationally undemanding application such as language classification, it would be nice to have a compact standalone tool that runs on minimal hardware. I fully understand your choice. The problem with most frameworks (TF, Caffe, PyTorch, Chainer) is that it is almost impossible to exchange pre-trained models between them. Anyway, nice job.

albertopoljak commented 3 years ago

Hello, I tried to convert the model to TensorFlow Lite with the recommended approach, but I get ValueError: This converter can only convert a single ConcreteFunction. Converting multiple functions is under development.

I have absolutely no experience with this; I just wanted to use the lite version instead of the full-blown package (I don't need model training).

yoeo commented 3 years ago

Hi @albertopoljak, I tried the TensorFlow Lite and TensorFlow.js converters too... with some hacking (trying everything I found on Stack Overflow), I was able to generate the models, but they were barely usable.

I hope that Google has improved the converters since then...
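For reference, the usual workaround for the "single ConcreteFunction" error is to convert only one signature of the SavedModel. A minimal sketch, assuming the model path from the repository layout and the default serving signature (the exact signature key may differ, and the conversion can still fail on ops that TensorFlow Lite does not support):

```python
# Sketch of the single-signature workaround for the ConcreteFunction error.
# Inspect loaded.signatures to see which signature keys are actually available.
import tensorflow as tf

loaded = tf.saved_model.load("guesslang/data/model")
serving_fn = loaded.signatures["serving_default"]

# Newer TensorFlow versions may also expect the loaded object as a second
# argument (trackable_obj); the single-list form works with a warning.
converter = tf.lite.TFLiteConverter.from_concrete_functions([serving_fn])
tflite_model = converter.convert()

with open("guesslang.tflite", "wb") as f:
    f.write(tflite_model)
```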

AndydeCleyre commented 3 years ago

I just want to add a use case here: I'd like to deploy in an Alpine Linux environment/container, and Tensorflow does not play nicely with Alpine/musl.

yoeo commented 3 years ago

Hello @AndydeCleyre,

As you said, TensorFlow doesn't play nicely there: I just tried to install it on Alpine and I'm getting an error

/ # pip install tensorflow
ERROR: Could not find a version that satisfies the requirement tensorflow (from versions: none)
ERROR: No matching distribution found for tensorflow

If that's possible, I'd recommend that you use a dedicated container for Guesslang, based on tensorflow/tensorflow. See https://www.tensorflow.org/install/docker

I tried it and it worked fine.

AndydeCleyre commented 3 years ago

I'd recommend that you use a dedicated container for Guesslang

Are you saying that my Python program using Guesslang can be in an Alpine container, while using TensorFlow from another container?

Or what? I don't see how I could split Guesslang from my app's container . . .

yoeo commented 3 years ago

Or what? I don't see how I could split Guesslang from my app's container . . .

Yes, I was thinking about that. You would call Guesslang through a REST API or something like that.

I have example code here that runs Guesslang in a Flask web application: https://github.com/yoeo/chameledit/blob/a87a9a4914952fe76bd2c7457ff4632c42b6142c/chameledit/chameledit.py#L17
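For the general idea, here is a minimal sketch of that approach, assuming Guesslang's Guess().language_name() API; the endpoint name and port are illustrative and not taken from chameledit:

```python
# Minimal sketch: expose Guesslang behind a small REST endpoint so another
# container (e.g. an Alpine-based app) can call it over HTTP.
from flask import Flask, jsonify, request
from guesslang import Guess

app = Flask(__name__)
guess = Guess()

@app.route("/detect", methods=["POST"])
def detect():
    # The request body is the raw source code to classify.
    source = request.get_data(as_text=True)
    return jsonify({"language": guess.language_name(source)})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```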

This solution can be overkill depending on your needs, but the only other solutions that I can think about are:

AndydeCleyre commented 3 years ago

Thanks! I may play around with that.

yjmm10 commented 3 years ago


Hello @geert56, recently I also wanted to convert Guesslang's TensorFlow model into an ONNX model or a TensorFlow Lite model, but both attempts failed. Have you converted it successfully? Could you share your experience here?

dandavison commented 3 years ago

Hi @yoeo and everyone else, I'm also interested in doing this. I tried, and ran into tf/tf-lite errors. Just in case it helps / inspires anyone to have another go, here is a script with the steps that I've tried so far, and the tracebacks I obtained: https://github.com/dandavison/misc-python/blob/main/tensorflow_lite_create_model.py

It would be great if we could figure this out! (I was going to look into calling it from Rust using https://github.com/boncheolgu/tflite-rs)

dandavison commented 3 years ago

@geert56, it sounds like you have a lot of experience with TF. Would you be able to help point the way forward regarding the errors we are encountering with TF-Lite conversion? (See e.g. https://github.com/dandavison/misc-python/blob/main/tensorflow_lite_create_model.py)