Closed spock closed 9 months ago
I can try putting
# calc the new hashes
index.update()
# compare
index.check_fix(self.context.force)
under
# save if update is set
if self.update:
but I need advice on how to install locally, pip install --user .
from the repository folder no longer works for me (it creates chkbit
and cli
as two separate modules and then imports fail).
I was able to move cli
under chkbit
(using relative imports) and thus install from the repo. Let me know if I should make a pull request out of that import reshuffling. It mostly uses from . import ...
, and in one file in cli
it uses from .. import ...
.
Applying the "fix" from my comment above results in no new files being reported at all, but yes it becomes exactly as fast as it was before (32s with 6 workers or 1m 52s with 1 worker).
I now wonder if previous version also wasn't reporting new files, if there were no .chkbit index files?
I've added update
argument to index.update()
, so now it can skip checking mtime
and calculating hashes for new files. And made a few more small changes. Now with 6 workers I get expected output, very fast:
Processed 48409 files in readonly mode.
No changes were made (specify -u to update):
- 48409 files would have been added and
- 0 files would have been updated.
real 0m33.298s
I think I have it now, but haven't yet figured out if I need to somehow fake context.hit
without calculating a hash.
For now I'm just increasing cfiles
count.
Let me know if I should make this into a PR; it is independent of the "imports reshuffling" commit, and is rather simple.
For now I've been editing and committing directly in a cloned repository, without even making a new branch 🤦🏻♂️
I hope there is still an easy way to somehow cherry-pick these two commits as PRs to your repository.
Sorry for discovering this bug and thank you for looking into it ;) I wanted to simplify the cli options and my test cases did not include this situation.
I'm curious about the use case though. Are you running it on a backup medium that's not fully indexed?
About installing and running chkbit from source - I'll add it to the readme.
My use-case is a "live archive", a photo collection where I'm sometimes reorganizing, editing, and adding new photos (normally in batches of dozens to hundreds, not more).
Under normal usage conditions "verifying unindexed location" shouldn't be an issue, of course; but it's also very nice if software is as fast as possible - when possible :)
I'm planning to run chkbit
just before auto-copying this collection to a secondary encrypted copy.
As one option, I'm considering to check chkbit
exit status to decide if backing up can proceed or should be halted for investigation.
OK, in any case, thanks for discovering and fixing it! I pushed the new version to pypi.
Previously, running
--verify-index
on my 50k files - without any existing index - was taking 1m 50s with 1 worker, and 35s with 6 workers. It was simply listing all the files, and that was very fast.Now, with version 3, verification takes about 24m with a single worker, and about 6m 40s with 6 workers; which are about the same times as for updating the index.
Looking at line 39 here https://github.com/laktak/chkbit-py/commit/a03d5b4267b1e8de23b499ea317960bd907424f1#diff-0c955c1fb53488dd1d4f7f30bec88d9339a66af65b566f0fa1375d7dc926b22bR39 , it looks like even in the read-only verification mode hashes are calculated and modified times are checked for all new files.
I think in the previous version new files during verification were simply reported: https://github.com/laktak/chkbit-py/commit/a03d5b4267b1e8de23b499ea317960bd907424f1#diff-5b1e4f1fcc838580d1fd73bbf4d2b6a8ad87d3a526fd32c45d9743ddd84ae7aaL43 (doesn't expand from the URL - indexthread.py, line 43)