markusressel / KodeHighlighter

Simple, extendable code highlighting for Spannables on Android.
MIT License

Automatic language detection #14

Open markusressel opened 5 years ago

markusressel commented 5 years ago

Is your feature request related to a problem? Please describe.
Currently, the developer has to know which syntax highlighter to use for a given text.

Describe the solution you'd like
The KodeEditor (or a layer in between) should be able to detect which language is most likely used and apply syntax highlighting automatically. This behaviour should be optional, so that the developer can still force a specific language if desired.
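
To illustrate the "optional" part, here is a rough, language-agnostic sketch in Python of how a forced language could take precedence over detection. The names `resolve_language`, `forced_language` and `detector` are hypothetical and not part of KodeHighlighter or KodeEditor.

```python
from typing import Callable, Optional


def resolve_language(
    text: str,
    forced_language: Optional[str] = None,
    detector: Optional[Callable[[str], Optional[str]]] = None,
) -> Optional[str]:
    """Pick the language to highlight with (hypothetical helper).

    A language forced by the developer always wins. Otherwise the
    optional detector is asked, and None means "no highlighting".
    """
    if forced_language is not None:
        return forced_language
    if detector is not None:
        return detector(text)
    return None
```

With this shape, automatic detection stays opt-in: a caller that passes neither argument gets the current behaviour.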

markusressel commented 5 years ago

Using something like this would be an option, although the trained models are pretty big (approx. 150 MB): https://github.com/aliostad/deep-learning-lang-detection

Integrating this seems to be relatively easy: https://medium.com/capital-one-tech/using-a-pre-trained-tensorflow-model-on-android-part-2-153ebdd4c465

markusressel commented 4 years ago

A more naive approach would be to count the number of rule matches for each available rule book and use the rule book with the highest count.
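
A minimal sketch of that counting idea, written in Python only for illustration. The `RULE_BOOKS` table and its regex patterns are made-up stand-ins for the real rule books, and the tie-breaking (first rule book wins) is an assumption.

```python
import re
from typing import Dict, List, Optional

# Hypothetical rule books: language name -> regex patterns.
# Real rule books would hold the library's own highlighting rules.
RULE_BOOKS: Dict[str, List[str]] = {
    "kotlin": [r"\bfun\b", r"\bval\b", r"\bvar\b", r"\bwhen\b"],
    "python": [r"\bdef\b", r"\bimport\b", r"\bself\b", r":\s*$"],
    "java": [r"\bpublic\b", r"\bvoid\b", r"\bclass\b", r";\s*$"],
}


def count_matches(text: str, patterns: List[str]) -> int:
    """Total number of matches of all patterns in the text."""
    return sum(len(re.findall(p, text, flags=re.MULTILINE)) for p in patterns)


def detect_by_rule_matches(
    text: str, rule_books: Dict[str, List[str]] = RULE_BOOKS
) -> Optional[str]:
    """Return the language whose rule book matches most often.

    Returns None if there are no rule books or nothing matches.
    On a tie, the rule book listed first wins (an arbitrary choice).
    """
    scores = {lang: count_matches(text, pats) for lang, pats in rule_books.items()}
    best = max(scores, key=scores.get, default=None)
    if best is None or scores[best] == 0:
        return None
    return best
```

Raw counts favour rule books with many broad rules, so a real implementation would probably need some normalisation, for example dividing by the number of rules per book.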

markusressel commented 4 years ago

It would also be nice to include common file extensions in the rule book, so that the language can be detected from the file name alone.

Both detection variants should be usable independently.
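
The file-name variant could look roughly like this, again as a Python sketch. The `EXTENSIONS` table is an assumption about how a rule book might declare its extensions, not an existing API.

```python
import os
from typing import Dict, Optional, Tuple

# Hypothetical mapping: language name -> file extensions its rule book declares.
EXTENSIONS: Dict[str, Tuple[str, ...]] = {
    "kotlin": ("kt", "kts"),
    "python": ("py",),
    "java": ("java",),
    "markdown": ("md", "markdown"),
}


def detect_by_file_name(
    file_name: str, extensions: Dict[str, Tuple[str, ...]] = EXTENSIONS
) -> Optional[str]:
    """Return the language registered for the file's extension, or None."""
    _, ext = os.path.splitext(file_name)
    ext = ext.lstrip(".").lower()  # compare case-insensitively, without the dot
    if not ext:
        return None
    for lang, exts in extensions.items():
        if ext in exts:
            return lang
    return None
```

Because this function only looks at the name and never at the content, it can be used without the content-based detection, and the other way round.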

markusressel commented 3 years ago

Also interesting: https://github.com/dlaststark/machine-learning-projects/tree/master/Programming%20Language%20Detection

https://medium.com/swlh/detecting-programming-languages-from-code-snippets-d758589bddb0