jotta / jotta-cli-issues

jotta-cli does not seem to properly enumerate the contents of a directory on a deduplicated volume (Windows Server) #76

Open vikingtiger opened 5 years ago

vikingtiger commented 5 years ago

Make sure you are running the latest version of jotta-cli before reporting an issue.

jotta-cli release (jotta-cli version): 0.5.14050

Description of problem: When backing up a directory from a deduplicated volume on Windows Server, it seems to find only a small portion of the data files when scanning the directory.

Expected: jotta-cli should see the contents of the file system like any other software would. The actual size of my "Install" directory is >220 GiB.

jotta-cli status (jotta-cli status):

------------------------------------------------------------------------------
 Account   : __snip__
 Usage     : 4.62TiB / ( Unlimited )
 Device    : __snip__
 Backups   :
------------------------------------------------------------------------------
   Path      : __snip__
   Files     : 2 files / 44.49MiB
   Status    : Up to date!

------------------------------------------------------------------------------
   Path      : E:/Install
   Files     : 66910 files / 342.93MiB
   Status    : Up to date!

------------------------------------------------------------------------------
   Path      : __snip__
   Files     : 11486 files / 1.55TiB
   Status    : Up to date!

------------------------------------------------------------------------------
OK

I have snipped away the paths of all backups but the one with issues. It's the only one where the source volume has deduplication enabled in Windows.

Relevant logs for the issue (cat ~/.jottad/jottabackup.log)

pid:2704 2019/04/09 07:22:40 Running scan of E:/Install
pid:2704 2019/04/09 07:23:28 Scan of E:/Install completed in 48.1802205s. Found 66910 files 342.93MiB 
pid:2704 2019/04/09 07:23:28 Scan completed of [E:/Install] completed in 48.1962237s [66910 files 342.93MiB]

Traceback

Additional info: No errors of any kind. Jotta-cli seems to think all is good. Yet it only scans and backs up a tiny fraction of the directory residing on the deduplicated volume. I have not observed any problems on other volumes which do not have deduplication enabled.
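To put "a tiny fraction" in numbers: the status output reports 342.93 MiB for E:/Install, against more than 220 GiB on disk. A quick check in Python, using binary units (1 GiB = 1024 MiB); the helper name is just for illustration:

```python
def scanned_fraction(found_mib: float, actual_gib: float) -> float:
    """Fraction of the data the scan found, using 1 GiB = 1024 MiB."""
    return found_mib / (actual_gib * 1024)

# 342.93 MiB from `jotta-cli status`; 220 GiB is the lower bound on the real size.
fraction = scanned_fraction(342.93, 220)
print(f"{fraction:.2%}")  # prints 0.15%
```

Because 220 GiB is only a lower bound, the scan picked up at most about 0.15% of the data in the directory.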

Kimbsen commented 5 years ago

That is very interesting. I haven't used a deduplicated volume before, so I'm not familiar with it, but I'm going to look into it.

Is this a network volume or local to the server?

If you enable logging of ignored files (jotta-cli config set logscanignores true), does that produce any more output in the log?

vikingtiger commented 5 years ago

Network volume or local: It's a local volume, an NTFS partition on a local disk.

Logging of scan ignores has now been enabled. Will post back in a few hours with results.

vikingtiger commented 5 years ago

Well, as suspected, the log grew quite large. 😄 For each skipped entry it says "not regular file". Here's an excerpt:

pid:2704 2019/04/09 10:27:12 Ignored E:/Install/Software/#opensource/The.Sleuth.Kit/Autopsy/4.10.x/autopsy-4.10.0-64bit.msi not regular file
pid:2704 2019/04/09 10:27:12 Ignored E:/Install/Software/#opensource/TestDisk/7.x/testdisk-7.0.win.zip not regular file
pid:2704 2019/04/09 10:27:12 Ignored E:/Install/Software/#opensource/TestDisk/7.x/testdisk-7.1-WIP.win.zip not regular file
pid:2704 2019/04/09 10:27:12 Ignored E:/Install/Software/#opensource/Tesseract/tesseract-3.05.01.zip not regular file
pid:2704 2019/04/09 10:27:12 Ignored E:/Install/Software/#opensource/Tesseract/tesseract-ocr-setup-3.02.02.exe not regular file
pid:2704 2019/04/09 10:27:12 Ignored E:/Install/Software/#opensource/Tesseract/dictionaries/tesseract-ocr-3.02.dan.tar.gz not regular file
pid:2704 2019/04/09 10:27:12 Ignored E:/Install/Software/#opensource/Tesseract/dictionaries/tesseract-ocr-3.02.eng.tar.gz not regular file
pid:2704 2019/04/09 10:27:12 Ignored E:/Install/Software/#opensource/Tesseract/dictionaries/tesseract-ocr-3.02.nor.tar.gz not regular file
pid:2704 2019/04/09 10:27:12 Ignored E:/Install/Software/#opensource/Tesseract/dictionaries/tesseract-ocr-3.02.swe.tar.gz not regular file
pid:2704 2019/04/09 10:27:12 Ignored E:/Install/Software/#opensource/Telegram/tsetup.0.9.56.exe not regular file
pid:2704 2019/04/09 10:27:12 Ignored E:/Install/Software/#opensource/Tar/tar-1.13-1-bin.zip not regular file
pid:2704 2019/04/09 10:27:12 Ignored E:/Install/Software/#opensource/Tar/tar-1.13-1-dep.zip not regular file
pid:2704 2019/04/09 10:27:12 Ignored E:/Install/Software/#opensource/Tar/tar-1.13-1-doc.zip not regular file
pid:2704 2019/04/09 10:27:12 Ignored E:/Install/Software/#opensource/Tar/tar-1.13-1-src.zip not regular file
pid:2704 2019/04/09 10:27:13 Ignored E:/Install/Software/#opensource/SubtitleEdit/3.5.6/SE356.zip not regular file
pid:2704 2019/04/09 10:27:13 Ignored E:/Install/Software/#opensource/SubtitleEdit/3.5.4/SE354.zip not regular file
pid:2704 2019/04/09 10:27:13 Ignored E:/Install/Software/#opensource/SubExtractor/SubExtractor1032d.zip not regular file
pid:2704 2019/04/09 10:27:13 Ignored E:/Install/Software/#opensource/SubExtractor/sourceCode.zip not regular file
pid:2704 2019/04/09 10:27:13 Ignored E:/Install/Software/#opensource/SubExtractor/subextractor_archive_codeplex.zip not regular file
pid:2704 2019/04/09 10:27:13 Ignored E:/Install/Software/#opensource/SteghideUI/Steghide UI v3.0 - Source.zip not regular file
pid:2704 2019/04/09 10:27:13 Ignored E:/Install/Software/#opensource/SteghideUI/Steghide UI v3.0.zip not regular file
pid:2704 2019/04/09 10:27:13 Ignored E:/Install/Software/#opensource/Steghide/steghide-0.5.1-win32.zip not regular file
pid:2704 2019/04/09 10:27:13 Ignored E:/Install/Software/#opensource/Steghide/steghide-0.5.1.zip not regular file

Kimbsen commented 5 years ago

Thanks. That was roughly what I expected. I'll have to do some testing to figure out what's going on and whether we can fix it.

vikingtiger commented 5 years ago

Technically, the deduplicated files are NTFS reparse points (much like symlinks or Microsoft's "junction points"). It seems very likely that this is indeed the cause of the problem.
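One plausible reading of the "not regular file" log lines, offered as an assumption rather than a description of jotta-cli's internals: if a scanner skips every entry that carries the NTFS reparse-point attribute, it will also skip deduplicated files, even though their content is ordinary file data. The minimal Python sketch below compares that behaviour with a policy that makes an exception for the Data Deduplication reparse tag. The policy functions and sample entries are hypothetical; the numeric constants are the values defined in the Windows SDK header winnt.h. On Windows, Python exposes the real values for a path as os.lstat(path).st_file_attributes and os.lstat(path).st_reparse_tag.

```python
# Hypothetical scan policies, NOT jotta-cli's actual code.
# Attribute and reparse tag values as defined in the Windows SDK (winnt.h).
FILE_ATTRIBUTE_DIRECTORY = 0x10
FILE_ATTRIBUTE_ARCHIVE = 0x20
FILE_ATTRIBUTE_REPARSE_POINT = 0x400

IO_REPARSE_TAG_MOUNT_POINT = 0xA0000003  # junction or volume mount point
IO_REPARSE_TAG_SYMLINK = 0xA000000C      # NTFS symbolic link
IO_REPARSE_TAG_DEDUP = 0x80000013        # file optimized by Data Deduplication


def naive_policy(attributes: int, tag: int) -> str:
    """Skip every reparse point, as if it were a link or device."""
    if attributes & FILE_ATTRIBUTE_REPARSE_POINT:
        return "skip: not regular file"
    if attributes & FILE_ATTRIBUTE_DIRECTORY:
        return "descend"
    return "back up"


def dedup_aware_policy(attributes: int, tag: int) -> str:
    """Skip reparse points except dedup files, whose data is read normally."""
    if attributes & FILE_ATTRIBUTE_REPARSE_POINT and tag != IO_REPARSE_TAG_DEDUP:
        return "skip: not regular file"
    if attributes & FILE_ATTRIBUTE_DIRECTORY:
        return "descend"
    return "back up"


# Illustrative (attributes, reparse tag) pairs, not read from a real volume.
samples = {
    "plain.zip": (FILE_ATTRIBUTE_ARCHIVE, 0),
    "deduped.msi": (FILE_ATTRIBUTE_ARCHIVE | FILE_ATTRIBUTE_REPARSE_POINT,
                    IO_REPARSE_TAG_DEDUP),
    "link.exe": (FILE_ATTRIBUTE_ARCHIVE | FILE_ATTRIBUTE_REPARSE_POINT,
                 IO_REPARSE_TAG_SYMLINK),
    "Junction": (FILE_ATTRIBUTE_DIRECTORY | FILE_ATTRIBUTE_REPARSE_POINT,
                 IO_REPARSE_TAG_MOUNT_POINT),
}

for name, (attrs, tag) in samples.items():
    print(f"{name:12} naive={naive_policy(attrs, tag)!r:26} "
          f"dedup-aware={dedup_aware_policy(attrs, tag)!r}")
```

With these made-up entries, only the deduplicated file changes outcome: the naive policy produces the same "not regular file" result seen in the log excerpt, while the dedup-aware one backs it up and still skips the symlink and the junction. Allow-listing the dedup tag, rather than reading every non-link reparse point, avoids accidentally pulling in other kinds of reparse points whose semantics the scanner does not know.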

An overview of how Windows' deduplication works can be found here. I've used several dedup schemes over the years, on different platforms, and in my opinion Microsoft's post-processing approach has its merits.

I'd be really happy if jotta-cli supported files on Windows dedup volumes at some point in the future. In the meantime, I could use something like rclone.

I think you've done an awesome job creating jotta-cli. I really enjoy it. Just the fact that it has the webhook feature built in is fantastic. Keep up the good work!