yuanchuan / node-watch

A wrapper and enhancements for fs.watch
https://npm.im/node-watch
MIT License

Is there a way, or a best practice, to scan this large amount of files for processing? #131

Closed: NguyenHoangMinhkkkk closed this issue 1 month ago

NguyenHoangMinhkkkk commented 2 months ago

This is the code I use to collect all existing files in the folder when my application first starts. It works well.

But there is a problem with large numbers of files: if the folder contains 1M, 2M or more files, performance drops and RAM usage climbs, because this.processFile cannot keep up with that many paths.

Is there a way, or a best practice, to scan this many files for processing?

this.processFile is a reader function that reads the contents of each file.


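import { glob } from 'glob';
import watch from 'node-watch';

// Runs inside a class method: this.FOLDER, this.logger and this.processFile
// belong to the application class (not shown). The watcher setup is assumed.
const watcher = watch(this.FOLDER, { recursive: true });

watcher.on('ready', async () => {
  this.logger.log('[Watch] Ready, run initial scan');

  // glob expects forward slashes, so Windows separators are converted.
  const patterns = [
    `${this.FOLDER}/**/*.txt`.replace(/\\/g, '/'),
    `${this.FOLDER}/**/*.csv`.replace(/\\/g, '/'),
    `${this.FOLDER}/**/*.TXT`.replace(/\\/g, '/'),
    `${this.FOLDER}/**/*.CSV`.replace(/\\/g, '/'),
  ];

  const stream = glob.stream(patterns);

  let fileCount = 0; // counting files

  stream.on('data', (filePath) => {
    fileCount = fileCount + 1;
    this.logger.log('[Watch] scan: ' + fileCount);

    // Called for every path as fast as glob emits them. Nothing waits for
    // processFile to finish, so if it is async, pending calls pile up in memory.
    this.processFile(filePath);
  });
});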
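The memory growth described above comes from starting processFile for every path without waiting for earlier calls to finish. A minimal sketch of one common fix, assuming processFile returns a Promise: pull paths from an async iterable and keep at most `concurrency` files in flight, logging every `logEvery` files instead of every single one. processAll and its options are hypothetical names, not part of glob or node-watch.

```javascript
// Hypothetical helper: consume an async iterable of file paths with bounded
// concurrency. `processFile` is assumed to return a Promise.
async function processAll(paths, processFile, { concurrency = 8, logEvery = 10000 } = {}) {
  const inFlight = new Set();
  let count = 0;

  for await (const filePath of paths) {
    // Start one file; errors are logged so a single bad file does not stop the scan.
    const p = Promise.resolve()
      .then(() => processFile(filePath))
      .catch((err) => console.error('[Watch] failed:', filePath, err))
      .finally(() => inFlight.delete(p));
    inFlight.add(p);

    count += 1;
    if (count % logEvery === 0) console.log('[Watch] scanned: ' + count);

    // Stop pulling new paths until at least one slot frees up.
    if (inFlight.size >= concurrency) await Promise.race(inFlight);
  }

  // Wait for the last files still in progress.
  await Promise.all(inFlight);
  return count;
}
```

Inside the ready handler from the question this might be called as `await processAll(glob.stream(patterns), (f) => this.processFile(f))`. glob's stream is a Minipass stream, which should support `for await`, so the loop only reads the next path when a slot is free instead of buffering millions of pending calls.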
yuanchuan commented 1 month ago

Hi @NguyenHoangMinhkkkk ,

Why do you need to scan all the files in the first place? Does it work if you just watch the directory instead?

The ready event is emitted once the watcher has been set up. If you traverse every file yourself, the scan will naturally slow down as the number of files grows, but that seems unrelated to the watcher.
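To illustrate that the scan's cost comes from the traversal rather than from node-watch, here is a minimal sketch of a lazy recursive walk using only Node's built-in fs.promises.opendir (swapped in for glob purely for illustration; walk is a hypothetical helper). It yields one matching path at a time, so memory is bounded by directory depth rather than total file count, although total time still grows linearly with the number of files.

```javascript
import { opendir } from 'node:fs/promises';
import { join, extname } from 'node:path';

// Hypothetical helper: lazily walk `root`, yielding matching file paths one
// at a time. Extensions are compared case-insensitively, so .txt/.TXT/.csv/.CSV
// all match (and so would mixed case such as .Txt).
async function* walk(root, extensions = ['.txt', '.csv']) {
  const wanted = new Set(extensions.map((e) => e.toLowerCase()));
  const dir = await opendir(root);
  // for await closes the directory handle automatically when the loop ends.
  for await (const entry of dir) {
    const full = join(root, entry.name);
    if (entry.isDirectory()) {
      // Symlinked directories report isDirectory() === false here, so they are not followed.
      yield* walk(full, extensions);
    } else if (entry.isFile() && wanted.has(extname(entry.name).toLowerCase())) {
      yield full;
    }
  }
}
```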

NguyenHoangMinhkkkk commented 1 month ago (May 26, 2024)

Thank you!

My task is to watch the storage folder for newly added files. But if the app stops unexpectedly, files keep piling up in the folder: the longer it stays down, the more files accumulate. I need a way to handle these accumulated files when the app starts again.

Normally about 20 files are added to the folder per second, i.e. 1,728,000 per day.
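The rate above also gives a rough way to size the backlog after an outage. A minimal sketch, assuming a constant arrival rate of 20 files/s; backlogAfter, catchUpSeconds and any throughput figure passed in are hypothetical. The key point is that the backlog only shrinks if the app processes files faster than new ones arrive.

```javascript
// Arrival rate taken from the comment above (files per second).
const ARRIVAL_RATE = 20;

// Files accumulated while the app was down.
function backlogAfter(downtimeSeconds, arrivalRate = ARRIVAL_RATE) {
  return arrivalRate * downtimeSeconds;
}

// Seconds needed to drain `backlog` while new files keep arriving.
// `throughput` is a hypothetical measured processing rate (files per second).
function catchUpSeconds(backlog, throughput, arrivalRate = ARRIVAL_RATE) {
  if (throughput <= arrivalRate) return Infinity; // the backlog never shrinks
  return backlog / (throughput - arrivalRate);
}

console.log(backlogAfter(24 * 60 * 60)); // 1728000
```

For example, with a hypothetical throughput of 100 files/s, a one-day backlog of 1,728,000 files drains in 1,728,000 / (100 - 20) = 21,600 s, about 6 hours.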

yuanchuan commented 1 month ago

An in-memory database might help?
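One possible reading of this suggestion (an assumption; the thread does not name a store): keep a record of processed paths that both the startup scan and the watcher consult, so a file seen by both is handled only once. In this sketch a plain Map stands in for the database; it is lost on restart, so skipping already-processed files across restarts would need a persistent store. createDeduper is a hypothetical helper.

```javascript
// Hypothetical dedupe layer. `store` is a plain Map standing in for an
// in-memory database (e.g. Redis); this is an assumption, not from the thread.
function createDeduper(processFile, store = new Map()) {
  return async function handle(filePath) {
    if (store.has(filePath)) return false; // already queued or done
    store.set(filePath, 'pending');
    try {
      await processFile(filePath);
      store.set(filePath, 'done');
      return true;
    } catch (err) {
      store.delete(filePath); // forget the failure so a later event can retry
      throw err;
    }
  };
}
```

The same handle function could be passed to the startup scan and to the node-watch callback, e.g. `watch(folder, { recursive: true }, (evt, name) => { if (evt === 'update') handle(name); })`, so repeated update events for one file do not trigger a second read.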

NguyenHoangMinhkkkk commented 1 month ago

An in-memory database might help?

I'm using a workaround for now: splitting the accumulated files into smaller batches and processing them one batch at a time.
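The batch workaround can be sketched as below, assuming processFile returns a Promise and the paths come from any async iterable (for example the glob stream or a directory walk). inBatches and processInBatches are hypothetical helpers; each batch must finish before the next one is read, so at most `size` files are in progress at once.

```javascript
// Hypothetical helper: group an async iterable into arrays of at most `size` items.
async function* inBatches(iterable, size) {
  let batch = [];
  for await (const item of iterable) {
    batch.push(item);
    if (batch.length === size) {
      yield batch;
      batch = [];
    }
  }
  if (batch.length > 0) yield batch; // leftover partial batch
}

// Process batches one after another; files inside a batch run concurrently.
async function processInBatches(paths, processFile, size = 1000) {
  let total = 0;
  for await (const batch of inBatches(paths, size)) {
    await Promise.all(batch.map((p) => processFile(p)));
    total += batch.length;
  }
  return total;
}
```

Each batch waits for its slowest file before the next one starts, which trades some throughput for simplicity; a batch size of 1 gives fully sequential processing.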