wakatime / wakatime-cli

Command line interface used by all WakaTime text editor plugins
https://wakatime.com/plugins
BSD 3-Clause "New" or "Revised" License

Language detection for files with no extension #821

Closed IgnisDa closed 10 months ago

IgnisDa commented 1 year ago

As of the current version of wakatime-cli, files that do not have an extension are tagged as "Unknown language". But these files often contain a shebang (e.g. #!/usr/bin/env bash or #!/usr/bin/python) that could be used to detect the language of the file.

Would you consider a feature where the CLI additionally checks the shebang to determine the language, and tags the file as "Unknown language" only if nothing is detected?
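A minimal sketch of the shebang check being requested, in Python for illustration only. It is not how wakatime-cli or Pygments implements it; the names `INTERPRETER_LANGUAGES`, `language_from_shebang`, and `language_for_extensionless_file` are hypothetical, and the interpreter table is a small assumed sample.

```python
import posixpath
import re

# Hypothetical interpreter-to-language table; a real one would be much larger.
INTERPRETER_LANGUAGES = {
    "bash": "Bash",
    "sh": "Shell",
    "zsh": "Zsh",
    "python": "Python",
    "node": "JavaScript",
    "ruby": "Ruby",
    "perl": "Perl",
}


def language_from_shebang(first_line):
    """Return a language name for a shebang line, or None if nothing matches."""
    if not first_line.startswith("#!"):
        return None
    parts = first_line[2:].strip().split()
    if not parts:
        return None
    # For "#!/usr/bin/env bash" the interpreter is the argument after env;
    # env options such as -S are skipped.
    if posixpath.basename(parts[0]) == "env":
        parts = [p for p in parts[1:] if not p.startswith("-")]
        if not parts:
            return None
    interpreter = posixpath.basename(parts[0])
    # Drop a trailing version suffix: "python3.11" -> "python".
    interpreter = re.sub(r"[\d.]+$", "", interpreter)
    return INTERPRETER_LANGUAGES.get(interpreter)


def language_for_extensionless_file(path):
    """Read only the first line and fall back to "Unknown language"."""
    try:
        with open(path, "r", encoding="utf-8", errors="replace") as f:
            first_line = f.readline()
    except OSError:
        return "Unknown language"
    return language_from_shebang(first_line) or "Unknown language"
```

Only the first line is read, so the check stays cheap even for large files, and anything without a recognized shebang still ends up as "Unknown language", as it does today.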

alanhamlett commented 1 year ago

Yes, we used to do that with the legacy Python wakatime-cli via the Pygments library:

https://github.com/wakatime/legacy-python-cli/blob/e8deb156f1c2d26e5cf874da97f7b4354b3f5d20/wakatime/packages/py27/pygments/util.py#L125

We can add that to wakatime-cli too.

IgnisDa commented 1 year ago

@alanhamlett Can I make a PR for this?

gandarez commented 1 year ago

> @alanhamlett Can I make a PR for this?

Sure, go ahead. If it's your first time contributing, please read our guidelines.

muety commented 1 year ago

In addition, it might be helpful to have a way of doing such mappings "manually", either in ~/.wakatime.cfg or .wakatime-project, for files that don't even have a shebang. For example, all files matching a configured regex could get mapped to a specific language. Could also be used to "override" the standard language detection (for whatever reason you might want to do that...).

alanhamlett commented 1 year ago

> In addition, it might be helpful to have a way of doing such mappings "manually", either in ~/.wakatime.cfg or .wakatime-project, for files that don't even have a shebang. For example, all files matching a configured regex could get mapped to a specific language. Could also be used to "override" the standard language detection (for whatever reason you might want to do that...).

We already have a [projectmap] section for regex patterns, so we could add a [language_map] section. The regex should run against the full file path, but a pattern can of course be written to match only the file name or extension.
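A rough sketch of how such a section could behave, in Python for illustration only. The [language_map] section is just a proposal at this point, so the `regex = language` layout (modeled on [projectmap]), the sample patterns, and the helpers `load_language_map` and `language_override` are all assumptions.

```python
import configparser
import re

# Hypothetical config text; [language_map] does not exist in wakatime-cli yet.
EXAMPLE_CONFIG = r"""
[language_map]
Jenkinsfile$ = Groovy
/scripts/[^/.]+$ = Bash
\.tpl$ = HTML
"""


def load_language_map(config_text):
    """Parse the assumed [language_map] section into (regex, language) rules."""
    # Only "=" separates key and value, so ":" may appear in a pattern.
    # A pattern containing "=" would still need a different format.
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.optionxform = str  # keep patterns case-sensitive
    parser.read_string(config_text)
    if not parser.has_section("language_map"):
        return []
    return [(re.compile(pattern), language)
            for pattern, language in parser.items("language_map")]


def language_override(path, rules):
    """Return the language of the first rule matching the full path, else None."""
    for pattern, language in rules:
        # search() runs against the full path, so a pattern can still
        # target only the file name or the extension.
        if pattern.search(path):
            return language
    return None


rules = load_language_map(EXAMPLE_CONFIG)
print(language_override("/home/me/app/scripts/deploy", rules))
```

Rules are tried in file order and the first match wins; returning None leaves the decision to the standard detection, which keeps the override strictly opt-in.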

gandarez commented 1 year ago

Maybe relying on Chroma would do the dirty work for us. Chroma has a function that analyzes the content of a file and can detect the language even if there's no filename. I've already started working on a possible solution.