soyuka / dat-daemon

Dat as a daemon
MIT License

Cannot add many dat files #9

Open · mitar opened this issue 6 years ago

mitar commented 6 years ago

I get the following exception:

Error: EMFILE: too many open files, uv_interface_addresses
    at Object.networkInterfaces (os.js:126:30)
    at allInterfaces (.../lib/node_modules/dat-daemon/node_modules/multicast-dns/index.js:163:21)
    at Timeout.that.update [as _onTimeout] (.../lib/node_modules/dat-daemon/node_modules/multicast-dns/index.js:123:63)
    at ontimeout (timers.js:475:11)
    at tryOnTimeout (timers.js:310:5)
    at Timer.listOnTimeout (timers.js:270:5)
soyuka commented 6 years ago

Can you give me the steps to reproduce this issue?

mitar commented 6 years ago

I have a directory with 1000 dat repositories. I ran the following script:

#!/usr/bin/env python3

import json
import os
import subprocess
import sys

for dirpath, dirnames, filenames in os.walk('.', followlinks=True):
    if 'dat.json' not in filenames:
        continue

    directory_path = os.path.abspath(dirpath)
    dat_json_path = os.path.join(directory_path, 'dat.json')

    print("Adding '{dat_json_path}'.".format(dat_json_path=dat_json_path))

    with open(dat_json_path, 'r') as dat_json_file:
        dat_json = json.load(dat_json_file)

    subprocess.run(['datdaemon', 'add', dat_json['url'], directory_path], encoding='utf8', stdout=sys.stdout, stderr=sys.stderr)
    sys.stdout.flush()
    sys.stderr.flush()

And after running a few of those, I got this error. The dats had just been created, but no files had been imported yet.

soyuka commented 6 years ago

Really interesting use case.

I won't have the time to dig into this right now, but at first glance it looks like we're spamming the daemon client with DNS requests, which leads to this issue.

For now, try waiting for each datdaemon add command to respond before issuing a new request. Some sort of "bulk insert" in the client might be a good feature to add, though! Maybe in the future!
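
Something like this should keep the pressure down (an untested sketch of the same walk as your script; the batch size and delay are arbitrary):

#!/usr/bin/env python3

import json
import os
import subprocess
import time

BATCH_SIZE = 10   # arbitrary; tune to stay well under the open-file limit
BATCH_DELAY = 5   # seconds to pause between batches

added = 0
for dirpath, dirnames, filenames in os.walk('.', followlinks=True):
    if 'dat.json' not in filenames:
        continue

    directory_path = os.path.abspath(dirpath)
    with open(os.path.join(directory_path, 'dat.json')) as dat_json_file:
        dat_json = json.load(dat_json_file)

    # subprocess.run blocks until the datdaemon add process exits
    subprocess.run(['datdaemon', 'add', dat_json['url'], directory_path])

    added += 1
    if added % BATCH_SIZE == 0:
        time.sleep(BATCH_DELAY)  # give the daemon time to settle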

mitar commented 6 years ago

Yes, it seems that if I add files a few at a time it works. (It does not die.)

I have some other problems, but I will open separate issues for those.

mitar commented 6 years ago

Even with one second between adds it still dies. (And I am waiting for the datdaemon add process to finish before adding another file.)

mitar commented 6 years ago

Now I have added some slowly, but I cannot run datdaemond anymore. It dies on startup with:

os.js:126
  const interfaceAddresses = getInterfaceAddresses();
                             ^

Error: EMFILE: too many open files, uv_interface_addresses
    at Object.networkInterfaces (os.js:126:30)
    at allInterfaces (.../lib/node_modules/dat-daemon/node_modules/multicast-dns/index.js:163:21)
    at Timeout.that.update [as _onTimeout] (.../lib/node_modules/dat-daemon/node_modules/multicast-dns/index.js:123:63)
    at ontimeout (timers.js:475:11)
    at tryOnTimeout (timers.js:310:5)
    at Timer.listOnTimeout (timers.js:270:5)
mitar commented 6 years ago

I think the issue is simply that for every dataset, 8 files are opened (4 for metadata, 4 for content), and this piles up: with 1000 datasets that is on the order of 8000 open descriptors, well past the default soft limit of 1024 on many Linux systems.
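
A quick sketch for checking (and, up to the hard limit, raising) that ceiling from Python. Note this is Unix-only and only affects the calling process, so the daemon itself would need the same treatment, e.g. ulimit -n in the shell that launches it:

#!/usr/bin/env python3

import resource

# current per-process limit on open file descriptors
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print('soft={}, hard={}'.format(soft, hard))

# raise the soft limit to the hard limit, for this process only
resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))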

soyuka commented 6 years ago

I need to investigate this, will do when I have some spare time ;)

mitar commented 6 years ago

I was looking around a bit at how to address this; maybe node cluster could help. It would also let the daemon use all the cores on the system. So you could raise the CPU ceiling and also shard the archives so that each worker opens only a subset of the files and does not hit the limit.
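
Just to illustrate the sharding idea in Python (the daemon itself would use Node's cluster module; serve_shard here is a made-up placeholder): splitting the archives across N worker processes gives each worker its own descriptor budget:

#!/usr/bin/env python3

import multiprocessing
import os

def serve_shard(shard):
    # placeholder: a real worker would open and serve only the archives
    # in its shard, so each process stays under the per-process
    # open-file limit
    return os.getpid(), len(shard)

if __name__ == '__main__':
    directories = ['dat-{:04d}'.format(i) for i in range(1000)]  # hypothetical paths
    workers = multiprocessing.cpu_count()
    shards = [directories[i::workers] for i in range(workers)]
    with multiprocessing.Pool(workers) as pool:
        for pid, count in pool.map(serve_shard, shards):
            print('worker {} handles {} archives'.format(pid, count))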

soyuka commented 6 years ago

With only 2 dats the daemon was taking 1 GB of memory on my server. That might be because of the open file descriptors; I'm not sure. Anyway, this definitely needs to be improved, but I'm not sure it fits my scope.