smontanari / code-forensics

A toolset for code analysis and report visualisation

Error: spawn git EMFILE #20

Closed: Husterknupp closed this issue 5 years ago

Husterknupp commented 6 years ago

Hey @smontanari , thanks for providing code-forensics.

I want to perform some analysis on my JS repo, which is built using npm and gulp. Unfortunately, I repeatedly run into this EMFILE error. I guess it happens because more files are being opened at once than my operating system allows, so I increased the limit to 10000 with $ ulimit -n 10000, but that did not solve the problem.
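Note that ulimit -n only changes the limit for the shell it runs in and for processes started from that shell afterwards, so it only helps if gulp is launched from that same session. As a rough check, assuming a POSIX system, a hypothetical helper like the one below (openFileLimit is not part of code-forensics) can be dropped into the gulpfile to print the limit the Node process actually inherited:

```javascript
const { execFileSync } = require('child_process');

// Hypothetical helper: report the soft open-file limit inherited by this Node process.
// The child shell inherits the limit from Node, so `ulimit -n` inside it reflects
// what gulp/code-forensics would actually see. POSIX systems only.
function openFileLimit() {
  const out = execFileSync('/bin/sh', ['-c', 'ulimit -n'], { encoding: 'utf8' }).trim();
  return out === 'unlimited' ? Infinity : Number(out);
}

console.log(`open file limit: ${openFileLimit()}`);
```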

Am I doing something wrong here?


$ gulp hotspot-analysis --dateFrom=2017-12-18 --dateTo=2017-12-18

events.js:163
      throw er; // Unhandled 'error' event
      ^

Error: spawn git EMFILE
    at exports._errnoException (util.js:1050:11)
    at Process.ChildProcess._handle.onexit (internal/child_process.js:193:32)
    at onErrorNT (internal/child_process.js:367:16)
    at _combinedTickCallback (internal/process/next_tick.js:80:11)
    at process._tickCallback (internal/process/next_tick.js:104:9)
    at Module.runMain (module.js:607:11)
    at run (bootstrap_node.js:423:7)
    at startup (bootstrap_node.js:147:9)
    at bootstrap_node.js:538:3

Environment: OS X 10.12.6, npm 4.2.0, node 7.9.0
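For context on the trace above: the crash comes from Node itself. child_process.spawn reports failures such as EMFILE by emitting an 'error' event on the returned ChildProcess (the onErrorNT frame), and an EventEmitter that emits 'error' with no listener attached throws it, which is the events.js "Unhandled 'error' event" line. A minimal sketch of a wrapper that turns such failures into a promise rejection instead; runCommand is a hypothetical helper, not code-forensics API:

```javascript
const { spawn } = require('child_process');

// Hypothetical helper: run a command and resolve with its stdout.
// Rejects if the process cannot be spawned (EMFILE, ENOENT, ...) or exits non-zero,
// instead of crashing the whole process with an unhandled 'error' event.
function runCommand(command, args = [], options = {}) {
  return new Promise((resolve, reject) => {
    let settled = false;
    // 'close' can still fire after 'error', so only settle the promise once.
    const finish = (fn, value) => {
      if (!settled) {
        settled = true;
        fn(value);
      }
    };

    const child = spawn(command, args, options);
    let stdout = '';

    // On EMFILE/ENFILE Node may not set up the stdio pipes, so stdout can be null.
    if (child.stdout) {
      child.stdout.setEncoding('utf8');
      child.stdout.on('data', (chunk) => {
        stdout += chunk;
      });
    }

    // Without this listener a spawn failure is thrown as an unhandled 'error' event.
    child.on('error', (err) => finish(reject, err));

    child.on('close', (code) => {
      if (code === 0) {
        finish(resolve, stdout);
      } else {
        finish(reject, new Error(`${command} exited with code ${code}`));
      }
    });
  });
}

runCommand('git', ['--version'])
  .then((out) => console.log(out.trim()))
  .catch((err) => console.error(`git failed: ${err.code || err.message}`));
```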

TrentonZero commented 6 years ago

I had this problem in a monolithic repo at work that contained vast amounts of business-process cruft. I was able to work around it by updating my gulpfile to target only the actual source directories with 'includePaths'. The underlying problem should still be fixed, but that let me get on with my analysis.

Husterknupp commented 6 years ago

@TrentonZero the folder structure of my app looks something like this:

$ tree -L 1
.
├── Gulpfile.js
├── README.md
├── backend (backend Java code)
├── bower.json
├── client (frontend JavaScript/HTML/CSS code)
├── node_modules
├── package.json
├── pom.xml
├── target

The error is thrown when the node_modules folder is analyzed, even though that folder doesn't need to be analyzed at all.

When I change the gulpfile as follows, the error does go away. But then I no longer get any git information, I guess because client is not the git root?

repository: {
  rootPath: "client"
}

I'd appreciate any pointers 🙃

jdevoo commented 6 years ago

You can write a more fine-grained gulpfile that excludes some paths and includes others. Below is an example for clipper.ai:

// clipper.ai
require('code-forensics').configure({
    repository: {
        rootPath: 'repos/clipper',
        excludePaths: [
            'bench',
            'bin',
            'examples',
            'images',
            'src/benchmarks',
            'integration-tests/data'
        ],
        includePaths: [
            '**/+(*.java|*.scala|*.xml|*.cpp|*.hpp|*.py)'
        ],
    },
    layerGroups: {
        'test': [
            { name: 'containers', paths: ['containers/test'] },
            { name: 'integration', paths: ['integration-tests'] }
        ],
        'containers': [
            { name: 'common', paths: ['src/container'] },
            { name: 'R', paths: ['containers/R'] },
            { name: 'Java', paths: ['containers/jvm'] },
            { name: 'Docker', paths: ['dockerfiles'] }
        ],
        'main': [
            { name: 'frontend', paths: ['src/frontends', 'src/management']},
            { name: 'libclipper', paths: ['src/libclipper'] },
            { name: 'libs', paths: ['src/libs'] }
        ]
    },
    contributors: [
        ['David Crankshaw', 'dcrankshaw'], ['Corey-Zumar'], ['Nishad Singh', 'nishadsingh1'], ['Giulio Zhou', 'giulio-zhou'], ['Lucas Moura', 'lucasmoura'], ['Santosh Addanki', 'santi81'], ['Simon Mo', 'simon-mo'], ['Patrick Rodriguez', 'stratospark'], ['Vinay M', 'rmdort'], ['Abhishek Dubey', 'dubeyabhi07']
    ]
})
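Applied to the layout from the tree earlier in this thread, a sketch along the same lines keeps rootPath at the git root (where Gulpfile.js lives) and filters out the folders that are not worth analysing. The folder names come from that tree; the glob patterns and the use of __dirname are illustrative assumptions, in the same style as the includePaths above:

```javascript
// Gulpfile.js at the git root (folder names taken from the tree above; patterns are illustrative).
require('code-forensics').configure({
  repository: {
    // Stay at the git root so the git history is still available to the analysis.
    rootPath: __dirname,
    // Leave out installed dependencies and build output.
    excludePaths: ['node_modules', 'target'],
    // Alternatively (or additionally), list only the source folders to analyse.
    includePaths: ['client/**/*', 'backend/**/*']
  }
});
```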
smontanari commented 6 years ago

As @jdevoo points out, there are many options you can use to better filter the files you're interested in analysing. However, as @TrentonZero says, it would be good to fix this problem at its root rather than working around it every time. Can anyone point me to a public repo where this issue is reproducible so I can investigate? There are Node.js libraries that can help limit the number of open file descriptors, but before rushing to implement a patch I'd like to better understand which part of the analysis process is the bottleneck. I also welcome any help if you have some time to spare.

Also, @Husterknupp, could you try running your analysis with the COMMAND_DEBUG=1 option and attach a zipped copy of the entire log?
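On the descriptor-limiting libraries mentioned above: graceful-fs, for example, queues file-system calls that fail with EMFILE and retries them once another descriptor is closed. A simpler variant of that idea, assuming the failing operation already returns a promise (for instance a spawn call wrapped so its 'error' event becomes a rejection), retries with a growing delay instead. retryOnTooManyFiles and its options are hypothetical, not code-forensics or graceful-fs API:

```javascript
// Hypothetical helper: retry an async operation while it fails with EMFILE/ENFILE
// ("too many open files"), waiting a little longer before each new attempt.
async function retryOnTooManyFiles(operation, { retries = 5, delayMs = 50 } = {}) {
  for (let attempt = 0; ; attempt += 1) {
    try {
      return await operation();
    } catch (err) {
      const tooManyFiles = Boolean(err) && (err.code === 'EMFILE' || err.code === 'ENFILE');
      // Any other error, or running out of retries, is passed straight to the caller.
      if (!tooManyFiles || attempt >= retries) throw err;
      // Linear backoff: gives other processes time to exit and release descriptors.
      await new Promise((resolve) => setTimeout(resolve, delayMs * (attempt + 1)));
    }
  }
}
```

Retrying only helps if other work releases descriptors in the meantime, which is why it is usually paired with a cap on how many processes run at once.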

smontanari commented 5 years ago

I'm doing some housekeeping of old/stale issues. @Husterknupp, would you be able to confirm whether this is still a problem? A recent version of code-forensics better limits the number of external commands/processes running in parallel at any given time, so it may have helped in your case.
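The general technique described here, capping how many external commands run in parallel, can be sketched as a small promise-based limiter. This is a generic illustration, not code-forensics' actual implementation; createLimiter is a hypothetical name, and in practice each task would wrap one git invocation:

```javascript
// Hypothetical helper: run async tasks with at most `maxActive` of them in flight at once.
// Each task is a function returning a promise (e.g. one that spawns a git command).
function createLimiter(maxActive) {
  let active = 0;
  const queue = [];

  // Start queued tasks while there is spare capacity.
  const next = () => {
    while (active < maxActive && queue.length > 0) {
      const { task, resolve, reject } = queue.shift();
      active += 1;
      Promise.resolve()
        .then(task)
        .then(resolve, reject)
        .finally(() => {
          // Free the slot and let the next queued task start.
          active -= 1;
          next();
        });
    }
  };

  // Schedules `task` and returns a promise for its result.
  return (task) =>
    new Promise((resolve, reject) => {
      queue.push({ task, resolve, reject });
      next();
    });
}

// Example: five simulated commands, never more than three in flight at once.
const limit = createLimiter(3);
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
Promise.all(
  [1, 2, 3, 4, 5].map((n) => limit(async () => {
    await sleep(10);
    return n * n;
  }))
).then((squares) => console.log(squares)); // [ 1, 4, 9, 16, 25 ]
```

Queuing rather than rejecting keeps every command eventually running, while the number of child processes, and therefore of open pipes, stays bounded.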

Husterknupp commented 5 years ago

@smontanari I just retried and the hotspot-analysis worked without any problem. Thanks for coming back to this 👍