Optimize format time

bdw429s commented 3 years ago

On one of my local projects, we've only got 192 CF files that get updated when we run cfformat run --overwrite, but it takes nearly 2 and a half minutes. That's about 750 ms per file.

I looked at the FusionReactor profile for the command, but nothing really jumps out. Almost all of the time is spent in /models/Delimited.cfc and models/CFScript.cfc, but it is all just low-level stuff like underlying hashmap access and calling UDFs. I don't know how much the code can be improved, but one idea I had was to have CFFormat store the date/time of the last time it ran the format for a given path, compare that against the timestamps on the files, and skip files that haven't been touched since the last run. That way, if I open a repo, edit 2 files, and go to run my format before I commit, I only need to wait for 2 files to be formatted, not hundreds.
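To make the timestamp idea concrete, here is a minimal sketch in Python rather than CFML. It is not how CFFormat works today: the state file, the function names, and the extension list are all assumptions made for illustration.

```python
import json
from pathlib import Path

# Assumption: the file extensions the formatter cares about.
CF_EXTENSIONS = {".cfc", ".cfm"}


def load_last_run(state_file: Path, target: Path) -> float:
    """Return the recorded epoch time of the last run for `target`, or 0.0 if none."""
    if not state_file.exists():
        return 0.0
    state = json.loads(state_file.read_text())
    return float(state.get(str(target.resolve()), 0.0))


def save_last_run(state_file: Path, target: Path, when: float) -> None:
    """Record `when` (epoch seconds) as the last run time for `target`."""
    state = json.loads(state_file.read_text()) if state_file.exists() else {}
    state[str(target.resolve())] = when
    state_file.write_text(json.dumps(state, indent=2))


def files_to_format(target: Path, last_run: float) -> list[Path]:
    """List CF files under `target` modified after `last_run`, sorted by path."""
    return sorted(
        p
        for p in target.rglob("*")
        if p.is_file()
        and p.suffix.lower() in CF_EXTENSIONS
        and p.stat().st_mtime > last_run
    )
```

One design point: the run time would need to be saved after formatting finishes. Otherwise the files the formatter just rewrote would look modified on the next run.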

bdw429s commented 3 years ago

Another question is whether you are using threading to process the files. In Codechecker CLI, for example, I gather an array of files to process and then use the parallel array.each() to process them, and it's much faster than doing one file at a time.
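The gather-then-process-in-parallel pattern can be sketched as follows. This is a Python illustration of the general approach, not the Codechecker CLI or CFFormat code. `format_source` is a hypothetical stand-in for the real formatter (it only strips trailing whitespace), and the worker count is an arbitrary assumption.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def format_source(source: str) -> str:
    """Hypothetical stand-in for a real formatter: strips trailing whitespace."""
    return "\n".join(line.rstrip() for line in source.splitlines()) + "\n"


def format_file(path: Path) -> tuple[Path, bool]:
    """Format one file in place; return the path and whether it changed."""
    original = path.read_text()
    formatted = format_source(original)
    changed = formatted != original
    if changed:
        path.write_text(formatted)
    return path, changed


def format_all(paths: list[Path], workers: int = 4) -> list[tuple[Path, bool]]:
    """Format all files using a thread pool; results come back in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(format_file, paths))
```

Because `pool.map` returns results in input order, the caller can print a report from a single thread once the work is done, which avoids interleaved output. In CPython, threads mostly help with the file I/O portion. CPU-bound formatting would need a process pool to see a real speedup.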

jcberquist commented 3 years ago

This is really interesting and disappointing. I am sorry nothing jumped out at you when profiling. I can run cfformat over large directories here and it only takes a few seconds: I have a 272 file directory structure and it runs over the whole thing in ~13 seconds. Overall there is a fair amount of file IO going on, and I wonder if that is slowing everything down on your system? My local system has a fairly fast drive, so my test might not be representative of what you are experiencing.

I would love to know if it is down to overall lines of code, or if there are some particular file(s) giving it trouble. If you could run cfformat on subdirectories (if that is possible) and use the --timeit flag and see if there are particular files giving it trouble, it would be great to know that.
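For illustration, the kind of per-file measurement being asked for looks roughly like this in Python. It is a generic sketch and says nothing about how the --timeit flag is implemented. `time_each` and its parameters are hypothetical names.

```python
import time
from typing import Callable, Iterable


def time_each(
    items: Iterable[str], work: Callable[[str], object]
) -> list[tuple[str, float]]:
    """Run `work` on each item and return (item, seconds) pairs, slowest first."""
    timings = []
    for item in items:
        start = time.perf_counter()
        work(item)
        timings.append((item, time.perf_counter() - start))
    return sorted(timings, key=lambda pair: pair[1], reverse=True)
```

Passing a list of file paths and a formatting function would show whether a few outlier files dominate the total or whether the cost is spread evenly across the project.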

I have tried using parallelization in the past, but haven't found it to improve performance enough to warrant it, and it seems to garble the job output in CommandBox as well. The overhead of starting the executable over and over outweighs a lot of the gains in my testing, and if I just parallelize the CFML portion of the format run, it doesn't seem to improve the time all that much locally for me. That said, if you want to try a quick test, you could parallelize this arrayEach() call on your local instance: models/CFFormat.cfc#L131-L165.

I really would like to avoid taking ~750ms on an average file, so it would be nice if this could be sorted out, but if nothing else, I will look into your suggestion of only processing recently edited files.

jcberquist / commandbox-cfformat

Optimize format time #95