pixelb / fslint

Linux file system lint checker/cleaner

FSlint with database support #152

Open dipietro-salvatore opened 6 years ago

dipietro-salvatore commented 6 years ago

FSlint with SQLite support, to avoid hashing the same file twice during the findup process.

MarcinOrlowski commented 5 years ago

Do you have any metrics how much this would speed things up?

dipietro-salvatore commented 5 years ago

Well, in my case it made a difference. I wanted to find all the duplicate files on an entire HDD. This allowed me to:

I found it very useful and it saved me a lot of time. It does not affect the program's performance in any way, but it reduces the time needed to re-scan the same folder/disk in the future.
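A minimal sketch of the caching idea, assuming the cache is keyed by path and a file is treated as unchanged when its size and modification time (in nanoseconds) match the stored row. The table layout, the function names (`open_cache`, `file_digest`, `cached_digest`), and the choice of SHA-1 are illustrative assumptions, not the schema used in the fork. It targets Python 3 (`st_mtime_ns`).

```python
import hashlib
import os
import sqlite3


def open_cache(db_path):
    """Open (or create) a hypothetical hash-cache database."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        " path TEXT PRIMARY KEY,"
        " size INTEGER NOT NULL,"
        " mtime_ns INTEGER NOT NULL,"
        " digest TEXT NOT NULL)"
    )
    return conn


def file_digest(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large files are never fully in memory."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def cached_digest(conn, path):
    """Return (digest, was_cached); rehash only if size or mtime changed."""
    st = os.stat(path)
    row = conn.execute(
        "SELECT size, mtime_ns, digest FROM files WHERE path = ?", (path,)
    ).fetchone()
    if row is not None and row[0] == st.st_size and row[1] == st.st_mtime_ns:
        return row[2], True  # unchanged since last scan: reuse stored hash
    digest = file_digest(path)
    conn.execute(
        "INSERT OR REPLACE INTO files (path, size, mtime_ns, digest)"
        " VALUES (?, ?, ?, ?)",
        (path, st.st_size, st.st_mtime_ns, digest),
    )
    conn.commit()  # committing per file keeps the sketch simple, not fast
    return digest, False
```

On the first scan every file is hashed; on a re-scan of an unchanged disk, each lookup is a single indexed `SELECT`, which is where the time saving comes from. The size/mtime check is a heuristic: a file rewritten with identical size and timestamp would not be rehashed.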

perfect7gentleman commented 4 years ago
Traceback (most recent call last):
  File "/usr/lib/python-exec/python2.7/database", line 36, in <module>
    os.makedirs(directory)
  File "/usr/lib64/python2.7/os.py", line 157, in makedirs
    mkdir(name, mode)
OSError: [Errno 13] Permission denied: '/usr/share/fslint/fslint/../databases'

Should it only be run as root?
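The traceback shows the script trying to create `databases` next to the installed code under `/usr/share`, which a normal user cannot write to. One way to avoid needing root, sketched here as an assumption and not as the fix that went into the repo, is to keep the database in a per-user cache directory following the XDG Base Directory convention (`$XDG_CACHE_HOME`, falling back to `~/.cache`). The function name and the `fslint` subdirectory are hypothetical. `exist_ok` requires Python 3.2+.

```python
import os


def default_db_dir(app_name="fslint"):
    """Return a per-user, writable directory for the hash database."""
    base = os.environ.get("XDG_CACHE_HOME", "")
    # The XDG Base Directory spec says to ignore unset, empty, or relative values.
    if not os.path.isabs(base):
        base = os.path.join(os.path.expanduser("~"), ".cache")
    path = os.path.join(base, app_name)
    os.makedirs(path, exist_ok=True)  # no error if it already exists
    return path
```

Since each user gets their own database, running as root is not required, and the installed files under `/usr/share` stay read-only.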

dipietro-salvatore commented 4 years ago

Hi @perfect7gentleman, I just updated the git repo to fix the problem. Please re-clone it. Here are the commands to run in the terminal:

cd /usr/share/
sudo mv fslint fslint-orig
sudo git clone https://github.com/dipietro-salvatore/fslint.git fslint

Let me know if this fixes the issue. Thanks!

perfect7gentleman commented 4 years ago

The permission problem is fixed, but:

Traceback (most recent call last):
  File "/usr/lib/python-exec/python2.7/database", line 48, in <module>
    cursor.execute('''SELECT * FROM files WHERE path IN ({seq})'''.format(seq=','.join(['?']*len(files_list))), ([f for f in files_list]))
sqlite3.OperationalError: too many SQL variables

Also, it cannot find any dupes at all; in other words, it doesn't work.
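The `too many SQL variables` error comes from building one `IN (?, ?, ...)` list with a `?` for every file: SQLite caps the number of parameters per statement (`SQLITE_MAX_VARIABLE_NUMBER`, 999 by default in versions before 3.32.0). A common workaround, not necessarily what the repo ended up doing, is to run the same query in batches that stay under the cap. The function name and batch size below are illustrative assumptions.

```python
import sqlite3

# Stay below SQLite's default parameter cap of 999 (pre-3.32.0 builds).
BATCH_SIZE = 900


def fetch_rows_for_paths(cursor, paths, batch_size=BATCH_SIZE):
    """Run SELECT ... WHERE path IN (...) in batches and collect all rows."""
    rows = []
    for start in range(0, len(paths), batch_size):
        batch = paths[start:start + batch_size]
        placeholders = ",".join(["?"] * len(batch))  # "?,?,...,?"
        cursor.execute(
            "SELECT * FROM files WHERE path IN ({})".format(placeholders),
            batch,
        )
        rows.extend(cursor.fetchall())
    return rows
```

With the 8,943 files reported later in this thread, a single query would need 8,943 parameters; batching at 900 turns that into 10 queries of at most 900 parameters each.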

dipietro-salvatore commented 4 years ago

Can you please provide more information about your system and the error? I am not able to reproduce it.

perfect7gentleman commented 4 years ago

System: Gentoo. The error is gone, but it still doesn't find any dupes.

perfect7gentleman commented 4 years ago

Nope. It's back.

Traceback (most recent call last):
  File "/usr/lib/python-exec/python2.7/database", line 48, in <module>
    cursor.execute('''SELECT * FROM files WHERE path IN ({seq})'''.format(seq=','.join(['?']*len(files_list))), ([f for f in files_list]))
sqlite3.OperationalError: too many SQL variables
perfect7gentleman commented 4 years ago

What info is needed?

dipietro-salvatore commented 4 years ago

I made some changes to the repo. Can you please clone it and run it again?

perfect7gentleman commented 4 years ago

The same error

Traceback (most recent call last):
  File "/home/MZ7WD240HAFV/Temporary/fslint/fslint/supprt/database", line 49, in <module>
    cursor.execute('''SELECT * FROM files WHERE path IN ({seq})'''.format(seq=','.join(['?']*len(files_list))), ([f for f in files_list]))
sqlite3.OperationalError: too many SQL variables
dipietro-salvatore commented 4 years ago

To help me replicate the error, can you please tell me how many files you have in the scanned folder(s)?

perfect7gentleman commented 4 years ago
129.5 MiB (135,783,482 bytes)
8,943 files, 1,870 sub-folders
dipietro-salvatore commented 4 years ago

I changed the way the script retrieves data from the DB. Can you please try again now?
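The comment does not say how the retrieval changed. For illustration only, another way to avoid the parameter cap entirely is to load the wanted paths into a temporary table with `executemany` (one parameter per row) and let SQLite do the filtering with a `JOIN`. The table name `wanted` and the function name are hypothetical.

```python
import sqlite3


def fetch_rows_via_temp_table(conn, paths):
    """Look up many paths without building a giant IN (...) list."""
    cur = conn.cursor()
    cur.execute("CREATE TEMP TABLE IF NOT EXISTS wanted (path TEXT PRIMARY KEY)")
    cur.execute("DELETE FROM wanted")  # clear leftovers from a previous call
    # executemany binds one parameter per row, so the cap never applies.
    cur.executemany(
        "INSERT OR IGNORE INTO wanted (path) VALUES (?)", ((p,) for p in paths)
    )
    cur.execute(
        "SELECT files.* FROM files JOIN wanted ON files.path = wanted.path"
    )
    return cur.fetchall()
```

Compared with batching, this issues one lookup query regardless of how many files are scanned, at the cost of writing the path list into the temporary table first.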

perfect7gentleman commented 4 years ago

Now it works.

54.1MB wasted in 2547 files (in 993 groups)
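A summary like the one above can be produced by grouping files first by size, hashing only files whose size collides, and counting every copy beyond the first in each group as wasted space. This is a simplified sketch under those assumptions; whether FSlint's "files" figure counts only the redundant copies or every group member is not stated here, and unlike a real duplicate finder the sketch does not exclude hard links or byte-compare files after hashing. All names are hypothetical.

```python
import hashlib
import os
from collections import defaultdict


def sha1_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def duplicate_report(paths):
    """Return (wasted_bytes, redundant_files, group_count) for the given paths."""
    by_size = defaultdict(list)
    for p in paths:
        by_size[os.path.getsize(p)].append(p)

    groups = []
    for size, same_size in by_size.items():
        if len(same_size) < 2:
            continue  # a unique size cannot have a duplicate, so skip hashing
        by_hash = defaultdict(list)
        for p in same_size:
            by_hash[sha1_of(p)].append(p)
        groups.extend((size, g) for g in by_hash.values() if len(g) > 1)

    wasted = sum(size * (len(g) - 1) for size, g in groups)
    redundant = sum(len(g) - 1 for _, g in groups)
    return wasted, redundant, len(groups)
```

The size pre-filter is what makes caching worthwhile: only files that share a size are hashed at all, and those are exactly the hashes a database can reuse on the next scan.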
dipietro-salvatore commented 4 years ago

Happy to hear that! Out of curiosity, do you know how long the first run took compared with the second run using the DB (assuming you didn't delete the files right away)?

perfect7gentleman commented 4 years ago

Rather fast.