src-d / enry

A faster file programming language detector
https://blog.sourced.tech/post/enry/
Apache License 2.0

enry treats Protobuf as PureBasic #100

Closed: vmarkovtsev closed this issue 7 years ago

vmarkovtsev commented 7 years ago

Any *.pb file in protobuf format is detected as PureBasic in the most recent release, 1.4.

abeaumont commented 7 years ago

So does linguist, if I'm not mistaken.

vmarkovtsev commented 7 years ago

Nopes.

$ linguist .
100.00% Python

The pb file is under version control.

abeaumont commented 7 years ago

Weird, considering that libuast, for example, is reported as 86.9% PureBasic... and I can reproduce that with the linguist CLI:

$ linguist .
86.87%  PureBasic
6.68%   Python
5.47%   Java
0.48%   C
0.35%   C++
0.12%   CMake
0.02%   Go
0.01%   Shell

Which repo and pb files are you testing with?

vmarkovtsev commented 7 years ago

I can reproduce it on the libuast repo too. I was using a local copy of src-d/ast2vec with a committed pb file.

vmarkovtsev commented 7 years ago

https://github.com/github/linguist/issues/3816

vmarkovtsev commented 7 years ago

@abeaumont As noted in the linked issue, there are override files. Does enry support them?

abeaumont commented 7 years ago

@vmarkovtsev No, it doesn't. There's an issue and a PR for that feature, but it requires some non-trivial changes and it's not expected to be done anytime soon.
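For readers unfamiliar with the idea, the following is a minimal sketch of what an override step could look like: user-supplied patterns are checked before the extension-based guess. It is an illustration only and is not enry's or linguist's actual code. The `override` type, the `detect` function, and the toy `byExtension` table are all hypothetical names invented for this example, and mapping `*.pb` to Protocol Buffer is just an assumed user choice.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// override pairs a glob pattern with the language it forces.
// Hypothetical type for illustration; not part of enry or linguist.
type override struct {
	pattern  string
	language string
}

// byExtension is a toy stand-in for an extension table in which
// ".pb" is claimed by PureBasic, as reported in this issue.
var byExtension = map[string]string{
	".pb":    "PureBasic",
	".proto": "Protocol Buffer",
	".py":    "Python",
}

// detect returns the language for path. It consults the overrides first
// and falls back to the extension table. Patterns are matched against
// the base name of the path only.
func detect(path string, overrides []override) string {
	base := filepath.Base(path)
	for _, o := range overrides {
		if ok, err := filepath.Match(o.pattern, base); err == nil && ok {
			return o.language
		}
	}
	if lang, ok := byExtension[filepath.Ext(base)]; ok {
		return lang
	}
	return "Unknown"
}

func main() {
	// Assumed user override: treat every *.pb file as Protocol Buffer.
	overrides := []override{{pattern: "*.pb", language: "Protocol Buffer"}}

	fmt.Println(detect("model.pb", nil))       // no override: extension table wins
	fmt.Println(detect("model.pb", overrides)) // override wins
}
```

Without overrides the sketch reports `PureBasic` for `model.pb`; with the assumed override it reports `Protocol Buffer`. Checking overrides first keeps the extension table untouched, so a per-repository correction does not affect other projects.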

abeaumont commented 7 years ago

It seems to be working properly now:

> enry
79.31%  Python
10.34%  Protocol Buffer
3.45%   C
3.45%   Text
3.45%   Makefile

vmarkovtsev commented 7 years ago

Cool! But what has changed?

abeaumont commented 7 years ago

I guess linguist got fixed: https://github.com/src-d/enry/pull/118/files#diff-2dd48c23213b4dde1b5c99ba45fc9086L321