vifactor / repostat

Inspired by gitstats project: git repository desktop analyzer
GNU General Public License v3.0

Provide file statistics per programming language #171

Open pulkomandy opened 4 years ago

pulkomandy commented 4 years ago

I think the max length for extensions was meant to avoid silly results when you have files without extensions but with dots in the filename? I would handle this by collecting all extensions (anything after a '.', and maybe the full name for files with no '.' at all, but that should probably be optional). Then, if one of these extensions is used only once or only a few times, put it in an "other" group instead.
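A minimal sketch of that grouping idea, not repostat's actual code. The function names, the `min_count` threshold, and the `full_name_fallback` switch are all hypothetical, and I am assuming "extension" means whatever follows the last '.' while a leading dot (as in `.gitignore`) does not count as a separator:

```python
from collections import Counter
from pathlib import PurePosixPath


def extension_of(path, full_name_fallback=False):
    """Return the text after the last '.' in the file name (hypothetical helper).

    Files without a usable extension yield "" or, optionally, their full name.
    """
    name = PurePosixPath(path).name
    # Ignore leading dots so ".gitignore" counts as having no extension.
    _, dot, ext = name.lstrip(".").rpartition(".")
    if dot and ext:
        return ext
    return name if full_name_fallback else ""


def group_rare_extensions(paths, min_count=2, full_name_fallback=False):
    """Count files per extension, folding rarely used ones into "other"."""
    counts = Counter(extension_of(p, full_name_fallback) for p in paths)
    grouped = Counter()
    for ext, n in counts.items():
        # Assumption: "rare" means fewer than min_count files.
        grouped[ext if n >= min_count else "other"] += n
    return dict(grouped)


paths = ["src/a.c", "src/b.c", "src/c.h", "src/d.h",
         "Makefile", ".gitignore", "README.v2.final"]
stats = group_rare_extensions(paths)
stats_with_names = group_rare_extensions(paths, full_name_fallback=True)
```

With this approach no maximum extension length is needed: a one-off suffix such as `final` simply lands in "other".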

The idea about Makefiles also raises the question: are extensions what really matters here? I think the interesting data would be stats per programming language or something like that. That means grouping .c and .h for C, but maybe differentiating C from C++ in .h files. That can get heavier to compute, however, if we need to look into the file contents.
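To illustrate what I mean, here is a rough sketch, assuming a hand-written extension table and a very crude content check for `.h` files. The table, the regular expression, and `guess_language` are made up for the example; a real detector would need far more cases:

```python
import re

# Hypothetical, deliberately tiny extension-to-language table.
EXT_TO_LANG = {"c": "C", "cc": "C++", "cpp": "C++", "hpp": "C++", "py": "Python"}

# Crude heuristic: lines starting with these constructs suggest C++ rather than C.
CPP_HINT = re.compile(r"^\s*(class\s+\w+|namespace\s+\w+|template\s*<)", re.MULTILINE)


def guess_language(ext, content=""):
    """Map an extension to a language, peeking at the content only for '.h'."""
    if ext == "h":
        # Ambiguous header: this is where reading the file becomes necessary.
        return "C++" if CPP_HINT.search(content) else "C"
    return EXT_TO_LANG.get(ext, "other")


c_header = "struct point { int x; int y; };\n"
cpp_header = "namespace app {\nclass Point {};\n}\n"
languages = [guess_language("h", c_header), guess_language("h", cpp_header)]
```

Only the ambiguous extensions require reading the file, so the extra cost stays limited to headers in this sketch.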

There is ohcount which can even handle mixed languages in a single file (for example php/html/javascript/css): https://github.com/blackducksoftware/ohcount

vifactor commented 4 years ago

For the first part of the issue, I'll look at the simplest thing I could do to have it soonish. Perhaps a fix of the max_ext_length functionality.

> interesting data would be stats per programming language or something like that

This is indeed interesting. I know that pydriller uses lizard for something like that, and lizard, in contrast to ohcount, is a Python package. It would also be useful for counting LOC instead of total lines. But so far I do not see when it can be implemented.

vifactor commented 4 years ago

@pulkomandy, there is a PR which fixes the first part, I suppose. Please take a look if you have time.

pulkomandy commented 4 years ago

Yes, that solves my main problem I think, thanks :)