roskakori / pygount

count lines of code for hundreds of languages using pygments
https://pygount.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

Take possible .gitignore into account for skipping files #20

Open roskakori opened 5 years ago

roskakori commented 5 years ago

Goals

Implementation notes

The approach currently used by isort.settings is to first collect the paths of all possible files in the project and then pass them through git check-ignore -z --stdin.

This would be relatively easy to implement and would also simplify the implementation of #59 and #71, at least with a naive approach where the whole list of files is kept in memory, including the ones that are later ignored.
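A minimal sketch of that naive approach, assuming git is on the PATH and the scanned folder is inside a git work tree. The function names are hypothetical and are neither isort's nor pygount's actual code; only the git check-ignore -z --stdin call is taken from the notes above.

```python
import os
import subprocess
from typing import Iterable, List, Set


def parse_check_ignore_output(output: bytes) -> Set[str]:
    """Split the NUL-separated output of `git check-ignore -z` into paths."""
    return {os.fsdecode(part) for part in output.split(b"\0") if part}


def git_ignored_paths(repo_dir: str, paths: Iterable[str]) -> Set[str]:
    """Return the subset of `paths` that git considers ignored.

    Hypothetical helper: the whole path list is held in memory and sent
    to a single `git check-ignore` process via stdin.
    """
    path_list = list(paths)
    if not path_list:
        return set()  # nothing to check, so do not start git at all
    stdin_data = b"\0".join(os.fsencode(path) for path in path_list)
    result = subprocess.run(
        ["git", "check-ignore", "-z", "--stdin"],
        cwd=repo_dir,
        input=stdin_data,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        check=False,
    )
    # Exit status 0: at least one path is ignored; 1: none are ignored.
    # Anything else (for example 128) is treated as a fatal error.
    if result.returncode not in (0, 1):
        raise RuntimeError(os.fsdecode(result.stderr).strip())
    return parse_check_ignore_output(result.stdout)


def without_ignored(repo_dir: str, paths: Iterable[str]) -> List[str]:
    """Keep only the paths that git does not ignore, preserving order."""
    path_list = list(paths)
    ignored = git_ignored_paths(repo_dir, path_list)
    return [path for path in path_list if path not in ignored]
```

The cost mentioned above is visible here: every candidate path, including the ones in ignored build folders, has to be collected before git can filter any of them.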

With projects that have a large number of ignored files (e.g. game development after baking assets into a build folder, or complex projects with compiled languages and files like *.o or *.class), the scanning step would be much slower than before.

It's probably still worth it, as the recommendation still stands to run pygount before the build, when fewer ignored files are lying around. At a later point, someone else can step in and make a more ambitious implementation based on a recursive generator function that evaluates each file and folder to be scanned one by one, using something like gitignore-parser.
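The more ambitious variant could look roughly like the sketch below. It assumes a caller-supplied is_ignored predicate that takes a path and returns True for ignored entries; the function name and the predicate are hypothetical. As far as I can tell, gitignore-parser's parse_gitignore() returns a callable of this shape, but that is an assumption and not verified here.

```python
import os
from typing import Callable, Iterator


def iter_unignored_files(
    folder: str, is_ignored: Callable[[str], bool]
) -> Iterator[str]:
    """Yield files below `folder`, checking each entry one by one.

    An ignored folder is skipped before descending into it, so its
    contents are never listed or kept in memory.
    """
    with os.scandir(folder) as entries:
        # Sort by name so the traversal order is deterministic.
        for entry in sorted(entries, key=lambda e: e.name):
            if entry.name == ".git" or is_ignored(entry.path):
                continue
            if entry.is_dir(follow_symlinks=False):
                yield from iter_unignored_files(entry.path, is_ignored)
            elif entry.is_file(follow_symlinks=False):
                yield entry.path
```

Because ignored folders are pruned early, a large build folder costs a single predicate call instead of one per contained file. This sketch only consults one predicate, so nested .gitignore files in subfolders would need extra handling.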

adam-moss commented 4 years ago

This would be a useful feature, and would avoid the need to specify --folders-to-skip and --files-to-skip, which can be a challenge to get right when processing lots of repos.

roskakori commented 4 years ago

@adam-moss In practice, I've used pygount on a clean checkout where nothing has been built yet. Also, I have not yet found a compelling gitignore Python library with a function that simply returns all non-ignored files from a folder and its subfolders.

So currently there are no specific plans for this feature.

adam-moss commented 4 years ago

Agree that is a simpler approach more generally 👍

roskakori commented 2 years ago

Status update: I looked into how isort gitignores files and added some implementation notes to the original description. It's not pretty but might just work well enough.

damif94 commented 2 years ago

Sorry, I just found this thread; I was also thinking this could be a useful feature.

So it would just be to replicate the gitignore-based path exclusion logic from isort inside the SourceScanner and use it in folders_to_skip?

Maybe I can handle that

roskakori commented 2 years ago

> So it would just be to replicate the gitignore-based path exclusion logic from isort inside the SourceScanner and use it in folders_to_skip?

I'm currently leaning towards reworking the whole SourceScanner without concerns about backwards compatibility for the semi-working and mostly confusing --folders-to-skip. That would probably require bumping the semantic version to 2.0.0.