lavinir / hassio-onedrive-backup

MIT License
250 stars 7 forks source link

File Sync not working #171

Open andyp05 opened 7 months ago

andyp05 commented 7 months ago

Daily backups are working fine. File Sync is not picking up on new files. If I restart the addon, the files upload to onedrive without error.

config:

local_backup_num_to_keep: 2
onedrive_backup_num_to_keep: 10
backup_interval_days: 1
backup_name: ha_backup
monitor_all_local_backups: false
ignore_upgrade_backups: false
hass_api_timeout_minutes: 30
notify_on_error: true
exclude_media_folder: true
exclude_ssl_folder: false
exclude_share_folder: false
exclude_local_addons_folder: false
sync_paths:
  - /media/**/*.mp4
  - /media/**/*.png
  - /media/**/*.jpg
file_sync_remove_deleted: true
excluded_addons: []
log_level: info
recovery_mode: false

Upon restart, the any new file is uploaded.

02/06/2024 20:48:45 [1] Info: Backup interval configured to every 24 hours
02/06/2024 20:48:45 [1] Info: Generational backups enabled
02/06/2024 20:48:46 [7] Info: File /media/dr_06 2047.mp4 out of sync. Starting Upload
02/06/2024 20:48:46 [7] Info: Starting file upload. (Size:1232615 bytes. Attempt: 1/3)
02/06/2024 20:48:48 [13] Info: Upload completed successfully

After restart, I created a new file in the same subdirectory (/media) as the file in the log. This file is called dr_06 2048.mp4 That file has not been uploaded and I waited over 10 minutes. The refreshed log has not changed.

Is the File Sync on a polling loop of > 10 minutes? Is this a configuration issue? Or something else?

Core 2024.1.6
Supervisor 2024.01.1
Operating System 11.4
Frontend 20240104.0
Addon: OneDrive Backup 2.2.4 

Thanks for your help.

andyp05 commented 7 months ago

New info after I put logging to Verbose.

I noticed that ion restart, the addon evaluates the 2400 files in the media folder on startup. A small section of the log:

02/06/2024 21:18:37 [24] Verbose: Evaluating file: /media/frigate/clips/backyard-1706898910.82805-l8176v.jpg
02/06/2024 21:18:37 [24] Verbose: File: /media/frigate/clips/backyard-1706898910.82805-l8176v.jpg up to date. No sync required
02/06/2024 21:18:37 [24] Verbose: Evaluating file: /media/frigate/clips/garage-1707253635.057636-auwkwo.jpg
02/06/2024 21:18:37 [24] Verbose: File: /media/frigate/clips/garage-1707253635.057636-auwkwo.jpg up to date. No sync required
02/06/2024 21:18:37 [24] Verbose: Evaluating file: /media/frigate/clips/garage-1707242254.514187-o02pkv-clean.png
02/06/2024 21:18:37 [24] Verbose: File: /media/frigate/clips/garage-1707242254.514187-o02pkv-clean.png up to date. No sync required
02/06/2024 21:18:37 [24] Verbose: Evaluating file: /media/frigate/clips/backyard-1707157516.958189-ontf63-clean.png
02/06/2024 21:18:38 [28] Verbose: File: /media/frigate/clips/backyard-1707157516.958189-ontf63-clean.png up to date. No sync required
02/06/2024 21:18:38 [28] Verbose: Evaluating file: /media/frigate/clips/garage-1706392090.910456-smaa8b.jpg

When it is done evaluating all files, it evaluates all backups. This process takes 20 minutes. When evaluating is done, then it looks for new files. It uploads the new files without error. The process of evaluating then immediately restarts from the beginning. Stopping file sync for another 20 minutes.

Thanks again.

lavinir commented 7 months ago

I'm not sure I understand your comment about evaluating the Backup files. The File sync runs independently from the backups (unless you somehow have your backups stored in one of the File Sync directories). Could you share the relevant logs showing what you mean by evaluating the backup files, and something that takes 20 minutes?

andyp05 commented 7 months ago

The once daily backup in the morning works fine. No issues there.

The File Sync that runs 24 hours a day since there are no excluded times is the issue.

The File Sync function runs through evaluating files continuously and it takes 20 minutes. A small section of the logs is below. It goes on for hundreds of lines.

02/06/2024 21:18:37 [24] Verbose: Evaluating file: /media/frigate/clips/backyard-1706898910.82805-l8176v.jpg
02/06/2024 21:18:37 [24] Verbose: File: /media/frigate/clips/backyard-1706898910.82805-l8176v.jpg up to date. No sync required
02/06/2024 21:18:37 [24] Verbose: Evaluating file: /media/frigate/clips/garage-1707253635.057636-auwkwo.jpg
02/06/2024 21:18:37 [24] Verbose: File: /media/frigate/clips/garage-1707253635.057636-auwkwo.jpg up to date. No sync required
02/06/2024 21:18:37 [24] Verbose: Evaluating file: /media/frigate/clips/garage-1707242254.514187-o02pkv-clean.png
02/06/2024 21:18:37 [24] Verbose: File: /media/frigate/clips/garage-1707242254.514187-o02pkv-clean.png up to date. No sync required
02/06/2024 21:18:37 [24] Verbose: Evaluating file: /media/frigate/clips/backyard-1707157516.958189-ontf63-clean.png
02/06/2024 21:18:38 [28] Verbose: File: /media/frigate/clips/backyard-1707157516.958189-ontf63-clean.png up to date. No sync required
02/06/2024 21:18:38 [28] Verbose: Evaluating file: /media/frigate/clips/garage-1706392090.910456-smaa8b.jpg

Any new file in the /media folder is not seen or uploaded during this 'Evaluating' phase. The uploads are delayed until the evaluation is completed. Once completed, the new files are uploaded. Then the process repeats.

  1. Evaluate all files to check sync status. Takes 20 minutes.
  2. Upload any new files.
  3. Go to step 1

Hope that makes sense.

andyp05 commented 7 months ago

I caught the logs when the loop starts again. Hope this helps.

02/07/2024 11:14:47 [20] Verbose: File: /media/frigate/recordings/2024-01-27/21/garage/28.19.mp4 up to date. No sync required
02/07/2024 11:14:47 [20] Verbose: Evaluating file: /media/frigate/recordings/2024-01-27/21/garage/21.09.mp4
02/07/2024 11:14:47 [20] Verbose: File: /media/frigate/recordings/2024-01-27/21/garage/21.09.mp4 up to date. No sync required
02/07/2024 11:14:47 [20] Verbose: Evaluating file: /media/frigate/recordings/2024-01-27/21/garage/00.09.mp4
02/07/2024 11:14:47 [20] Verbose: File: /media/frigate/recordings/2024-01-27/21/garage/00.09.mp4 up to date. No sync required
02/07/2024 11:14:47 [20] Verbose: Evaluating file: /media/frigate/recordings/2024-01-27/21/doorbell/09.09.mp4
02/07/2024 11:14:48 [20] Verbose: File: /media/frigate/recordings/2024-01-27/21/doorbell/09.09.mp4 up to date. No sync required
02/07/2024 11:14:48 [20] Verbose: Evaluating file: /media/frigate/recordings/2024-01-27/21/doorbell/09.19.mp4
02/07/2024 11:14:48 [20] Verbose: File: /media/frigate/recordings/2024-01-27/21/doorbell/09.19.mp4 up to date. No sync required
02/07/2024 11:14:48 [20] Verbose: Evaluating file: /media/frigate/recordings/2024-01-27/21/doorbell/01.20.mp4
02/07/2024 11:14:48 [11] Verbose: File: /media/frigate/recordings/2024-01-27/21/doorbell/01.20.mp4 up to date. No sync required
02/07/2024 11:14:48 [11] Verbose: Evaluating file: /media/frigate/recordings/2024-01-27/21/doorbell/00.31.mp4
02/07/2024 11:14:48 [20] Verbose: Checking backups
02/07/2024 11:14:48 [20] Verbose: Refreshing existing backups...
02/07/2024 11:14:48 [20] Verbose: Backup Interval Completed.
02/07/2024 11:14:49 [11] Verbose: File: /media/frigate/recordings/2024-01-27/21/doorbell/00.31.mp4 up to date. No sync required
02/07/2024 11:14:49 [11] Verbose: Evaluating file: /media/frigate/recordings/2024-01-27/21/doorbell/00.41.mp4
02/07/2024 11:14:49 [11] Verbose: Last local backup Date: 02/07/2024 10:29:49
02/07/2024 11:14:49 [11] Verbose: Last online backup Date: 02/07/2024 10:29:49
02/07/2024 11:14:49 [11] Verbose: Online backups synced. No upload required
02/07/2024 11:14:49 [11] Verbose: Evaluating Online Generational Backups
02/07/2024 11:14:49 [11] Verbose: Found 10 OneDrive backups that can be removed (Generational Rules)
02/07/2024 11:14:49 [11] Verbose: Potential OneDrive backups for removal: a3e805af (02/06/2024 18:49:48),cf33f8b4 (02/07/2024 10:29:49),017a7fe7 (01/31/2024 09:54:10),5fdb8131 (01/27/2024 14:09:56),2ea60b5c (02/01/2024 09:56:26),d7492cb2 (02/02/2024 10:00:00),b28d2106 (02/03/2024 10:01:04),ed96650c (02/04/2024 10:02:30),d7e478a2 (02/05/2024 10:03:16),54cf258f (02/06/2024 10:04:14)
02/07/2024 11:14:49 [11] Verbose: Evaluating Local Generational Backups
02/07/2024 11:14:49 [11] Verbose: Found 5 local backups that can be removed (Generational Rules)
02/07/2024 11:14:49 [11] Verbose: Potential local backups for removal: a3e805af (02/06/2024 18:49:48),cf33f8b4 (02/07/2024 10:29:49),d7e478a2 (02/05/2024 10:03:16),b28d2106 (02/03/2024 10:01:04),54cf258f (02/06/2024 10:04:14)
02/07/2024 11:14:50 [11] Verbose: File: /media/frigate/recordings/2024-01-27/21/doorbell/00.41.mp4 up to date. No sync required
02/07/2024 11:14:50 [11] Verbose: Evaluating file: /media/frigate/recordings/2024-01-27/21/doorbell/01.30.mp4
02/07/2024 11:14:50 [11] Verbose: File: /media/frigate/recordings/2024-01-27/21/doorbell/01.30.mp4 up to date. No sync required
02/07/2024 11:14:50 [11] Verbose: Evaluating file: /media/frigate/recordings/2024-01-27/21/doorbell/01.40.mp4
02/07/2024 11:14:51 [11] Verbose: File: /media/frigate/recordings/2024-01-27/21/doorbell/01.40.mp4 up to date. No sync required
02/07/2024 11:14:51 [11] Verbose: Evaluating file: /media/frigate/recordings/2024-01-27/21/doorbell/08.39.mp4
02/07/2024 11:14:51 [11] Verbose: File: /media/frigate/recordings/2024-01-27/21/doorbell/08.39.mp4 up to date. No sync required
02/07/2024 11:14:51 [11] Verbose: Evaluating file: /media/frigate/recordings/2024-01-27/21/doorbell/01.01.mp4
02/07/2024 11:14:52 [11] Verbose: File: /media/frigate/recordings/2024-01-27/21/doorbell/01.01.mp4 up to date. No sync required
02/07/2024 11:14:52 [11] Verbose: Evaluating file: /media/frigate/recordings/2024-01-27/21/doorbell/21.19.mp4
lavinir commented 7 months ago

So I want to make sure I understand - From the logs I don't see anything misbehaving. Other that the sheer amount of time it takes to complete a File Sync cycle (I admit I did not consider / test how long this takes with thousands of files). While a cycle is in progress it will not pick up any new files. So your issue comes down to how long it effectively picks up changes and syncs them because how long it takes to go over all the files each time yes ?

andyp05 commented 7 months ago

I think that is the issue. The process to go over all the files takes over 20 minutes. Is there a way to look at modified dates?

Thanks

lavinir commented 7 months ago

Yes but the only issue there is it assumes the files aren't manipulated on the OneDrive side. So for example if a file was changed / deleted in OneDrive and I'm skipping it locally because it wasn't modified since the last sync it will not be re-uploaded.

I would need to think about the best way to approach this. I can also parallelize this check to improve performance.

andyp05 commented 7 months ago

Could you check the last modified dates on both sides. I can't see a use case where a file would be modified (and the date stayed the same) on the Onedrive side. Save a lot of processing in not calculating hashes. I wish I could help propose code, but I haven't programmed in C# for 15 years.

lavinir commented 7 months ago

No.. modified date of the file when uploaded to OneDrive will be the timestamp it was uploaded not the original modified date of the file in Home Assistant.. I'm also not sure most of the latency is coming from the hash calculation, likely more so the IO for querying the file on OneDrive.

andyp05 commented 7 months ago

Maybe I am completely wrong, but I can't picture a case where someone would modify a backup file on onedrive. Even if they did, would they want it overridden during the next sync.

Could there be a setting to not check the hash and just check for the file's existence on onedrive? For security, I want the file to leave my house as quick as possible.

Thanks for your time and the program.

andyp05 commented 7 months ago

Just a thought. If a new setting CheckHash is available, could a minor change to SyncFile() be made from:

            DriveItem? remoteFile = await _graphHelper.GetItemInAppFolderAsync(remotePath);                             
            bool requiresUpload = 
                remoteFile == null 
                || remoteFile.Size != fileInfo.Length 
                || remoteFile.File.Hashes.Sha256Hash.Equals(fileHash, StringComparison.OrdinalIgnoreCase) == false;

to:

            DriveItem? remoteFile = await _graphHelper.GetItemInAppFolderAsync(remotePath);                             
            bool requiresUpload = 
                remoteFile == null 
                || remoteFile.Size != fileInfo.Length 
                || (checkHash && remoteFile.File.Hashes.Sha256Hash.Equals(fileHash, StringComparison.OrdinalIgnoreCase) == false);

The reason this is a big issue for me is that I am using Frigate and it is generating many mp4 files from multiple cameras all day and it runs into GBs per day.

Thanks for all your help.

lavinir commented 7 months ago

Hi,

Appreciate the input. Few thoughts:

  1. Regarding your comment on someone manually changing the 'backup' file - I agree this is not likely but we are talking about the File Sync not the backup files. I have no idea what those files could be or what they are used for so I can't assume no one is modifying them or deleting them in OneDrive.
  2. The 'checkhash' suggestion - Do you mean to expose a configuration that allows you to bypass hash checks ? I don't really like that option as it seems a weird setting to expose, it's a low level implementation detail which can cause sync issues.
  3. I do understand your use case though and the performance limitations. I will add a task to investigate this to make sure the hashes are indeed the bottleneck (still not convinced that they are) and then see what the best way to improve the performance should be.
github-actions[bot] commented 6 months ago

This issue is stale because it has been open for 30 days with no activity.