skohub-io / skohub-vocabs

A lightweight tool to publish SKOS Vocabularies
https://skohub.io/
Apache License 2.0

gatsby-source-filesystem fails with "EMFILE: too many open files" #104

Closed sroertgen closed 3 years ago

sroertgen commented 3 years ago

When working with larger files (e.g. https://raw.githubusercontent.com/sroertgen/oeh-framework-bb/master/data/curriculum_bb_competences_skos.ttl) skohub-vocabs failed with the error: EMFILE: too many open files.

This gatsby issue seems to be related: https://github.com/gatsbyjs/gatsby/issues/12011

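I did a quick test: I installed graceful-fs and added the following to gatsby-node.js and webHookServer.js, as suggested in that issue:

// fs must be in scope before it is patched (skip this line if the file already requires it)
const fs = require('fs');
const gracefulFs = require('graceful-fs');
// Patch the core fs module so that calls hitting EMFILE are queued and retried
gracefulFs.gracefulify(fs);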

Before making a pull request I would like to ask whether you get the same error or whether it might be something specific to my machine.

acka47 commented 3 years ago

I assume this has something to do with https://github.com/skohub-io/skohub-webhook/issues/6? When I move the vocabs into the root folder it works. See https://skohub.io/acka47/oeh-framework-bb/heads/master/opencurricula/berlin-brandenburg/competences/index.html and https://github.com/acka47/oeh-framework-bb

acka47 commented 3 years ago

Did you maybe resolve skohub-io/skohub-webhook#6 with graceful-fs? Sounds like it to me. Then I propose you open a PR for this.

sroertgen commented 3 years ago

ah, sorry, I should have searched the issues before opening this... I will prepare a PR

sroertgen commented 3 years ago

unfortunately I just noticed that it fixes the build of https://raw.githubusercontent.com/sroertgen/oeh-framework-bb/master/data/curriculum_bb_competences_skos.ttl (about 10k vocabs), but it breaks with an even bigger file with 45,960 vocabs (e.g. https://raw.githubusercontent.com/openeduhub/oeh-metadata-eaf-schlagwortverzeichnis/main/data/eaf-graph-by-subject-all.ttl):

success building schema - 1.715s
Built index {
  id: 0,
  items: 45960,
  cache: false,
  matcher: 0,
  worker: undefined,
  threshold: 0,
  depth: 0,
  resolution: 9,
  contextual: 0
}
success createPages - 107.064s

 ERROR 

glob error [Error: EMFILE: too many open files, scandir '.../skohub-vocabs/node_modules/gatsby/dist/internal-plugins/prod-404/src/pages'] {
  errno: -24,
  code: 'EMFILE',
  syscall: 'scandir',
  path: '.../skohub-vocabs/node_modules/gatsby/dist/internal-plugins/prod-404/src/pages'
}

 ERROR 

UNHANDLED REJECTION EMFILE: too many open files, scandir '.../skohub-vocabs/node_modules/gatsby/dist/internal-plugins/prod-404/src/pages'

  Error: EMFILE: too many open files, scandir '.../skohub-vocabs/node_modules/gatsby/dist/internal-plugins/prod-404/src/pages'

not finished createPagesStatefully - 72.808s

Though it just breaks running npm run build. It works when running develop mode with npm start.

literarymachine commented 3 years ago

> Though it just breaks running npm run build. It works when running develop mode with npm start.

It does not work for me in either case with eaf-graph-by-subject-all.ttl, but it works with curriculum_bb_competences_skos.ttl. I am guessing this is related to the number of files being generated, not read, so I am not sure that this is a problem with gatsby-source-filesystem.

Anyways, using graceful-fs it works, although it takes a while:

Done building in 1594.909334812 sec
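
If the guess above is right and the error comes from too many files being written at once, the general idea of keeping the number of simultaneous writes bounded can be sketched as follows. This is only an illustration under assumptions: writeAllLimited, demo, the file names and the limit of 16 are all hypothetical, and this is not how Gatsby writes its pages. graceful-fs takes a different route (it queues and retries calls that hit EMFILE), but the effect is similar in that fewer descriptors are open at any one time.

```javascript
const fs = require('fs').promises;
const os = require('os');
const path = require('path');

// Hypothetical helper: write many files while keeping at most `limit`
// writes in flight, so the process never holds more than `limit`
// file descriptors for these writes.
async function writeAllLimited(files, limit) {
  let next = 0;
  // Each worker pulls the next pending file until none are left.
  async function worker() {
    while (next < files.length) {
      const { name, content } = files[next++];
      await fs.writeFile(name, content);
    }
  }
  const workerCount = Math.min(limit, files.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
}

// Demo with made-up data: 200 small files written into a temp directory.
async function demo() {
  const dir = await fs.mkdtemp(path.join(os.tmpdir(), 'emfile-demo-'));
  const files = Array.from({ length: 200 }, (_, i) => ({
    name: path.join(dir, `${i}.html`),
    content: `page ${i}`,
  }));
  await writeAllLimited(files, 16);
  return (await fs.readdir(dir)).length;
}

demo().then((count) => console.log(`wrote ${count} files`));
```

Because the workers share one counter and JavaScript runs them on a single thread, no file is written twice and no locking is needed.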

dr0i commented 3 years ago

Could also be at the OS level. On Linux, check the allowed number of open files with ulimit -n and increase it with e.g. ulimit -n 10240 (or more; my machine allows >1M, this depends on the machine's RAM). npm run build works fine on that machine with eaf-graph-by-subject-all.ttl in 680 sec. See https://stackoverflow.com/questions/8965606/node-and-error-emfile-too-many-open-files.
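
A minimal sketch of how the limit could be checked from Node before starting a build, assuming a POSIX shell is available at /bin/sh. The helper name getOpenFileLimit and the MIN_RECOMMENDED threshold are hypothetical; 10240 is simply the value mentioned above, not an official requirement of skohub-vocabs or Gatsby.

```javascript
const { execSync } = require('child_process');

// Hypothetical helper: ask a POSIX shell for the soft limit on open files.
function getOpenFileLimit() {
  const out = execSync('ulimit -n', { shell: '/bin/sh', encoding: 'utf8' }).trim();
  // The shell prints "unlimited" when no limit is set.
  return out === 'unlimited' ? Infinity : Number.parseInt(out, 10);
}

const limit = getOpenFileLimit();
const MIN_RECOMMENDED = 10240; // assumption: the value suggested in this thread

if (limit < MIN_RECOMMENDED) {
  console.warn(
    `Open file limit is ${limit}; consider running "ulimit -n ${MIN_RECOMMENDED}" before building.`
  );
}
```

Note that ulimit -n only affects the current shell session and the processes started from it, so the limit has to be raised in the same shell that runs npm run build.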

sroertgen commented 3 years ago

That did the trick for me, thank you! Done building in 2842.377295418 sec

dr0i commented 3 years ago

We should consider noting this in the README.