openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs
Other

Add Path To Shorten ZFS Diff Output #13200

Open JavaScriptDude opened 2 years ago

JavaScriptDude commented 2 years ago

I have found uses for zfs diff in my regular development workflows and lean on it heavily. However, I do run into situations where it returns many changes I don't care about for my use cases. This feature request would allow the user to pass a path parameter to zfs diff.

If a path parameter is provided, the directory and all of its children should be included in the output. If a file is given, only that file should be included in the output. If the provided path does not exist in either the left or the right side of the diff, a clear error should be returned that identifies the condition, e.g.: Path provided not found in left or right sides of diff
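As a rough illustration of the requested semantics (done as a userland output filter, which is exactly what the request hopes the diff engine would avoid), here is a minimal Python sketch. It assumes `zfs diff -H` output with no `-F` or `-t` columns, i.e. `<change>\t<path>`, or `R\t<old>\t<new>` for renames, and it does not decode escaped characters in paths. The names `filter_diff`, `check_path` and `PathNotFoundError` are hypothetical. `left_root` and `right_root` stand for wherever the two sides are readable, for example a snapshot under `<mountpoint>/.zfs/snapshot/<name>` and the live mountpoint.

```python
import os
from typing import Iterable, Iterator


def _under(path: str, prefix: str) -> bool:
    """True if `path` equals `prefix` (file case) or lies inside it (directory case)."""
    prefix = prefix.rstrip("/") or "/"
    if prefix == "/":
        return path.startswith("/")
    # Compare on a "/" boundary so /dev/py does not match /dev/pyx.
    return path == prefix or path.startswith(prefix + "/")


def filter_diff(lines: Iterable[str], prefix: str) -> Iterator[str]:
    """Yield `zfs diff -H` lines whose path (or either rename path) is under `prefix`.

    Assumes the plain -H layout: "<change>\\t<path>" or "R\\t<old>\\t<new>".
    Escaped characters in paths are NOT decoded by this sketch.
    """
    for line in lines:
        line = line.rstrip("\n")
        paths = line.split("\t")[1:]  # field 0 is the change type (-, +, M, R)
        if any(_under(p, prefix) for p in paths):
            yield line


class PathNotFoundError(Exception):
    """Hypothetical error for a path missing from both sides of the diff."""


def check_path(left_root: str, right_root: str, rel: str) -> None:
    """Raise PathNotFoundError if `rel` exists on neither side of the diff."""
    rel = rel.lstrip("/")
    if not (os.path.lexists(os.path.join(left_root, rel))
            or os.path.lexists(os.path.join(right_root, rel))):
        raise PathNotFoundError(
            f"Path provided not found in left or right sides of diff: /{rel}")
```

Renames are kept if either the old or the new name falls under the prefix, so a file moved into or out of the filtered directory still shows up.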

This feature would give ZFS some more VCS (Version Control System) style features that will open up more possible use cases for this excellent system.

Ideally, the path filtering would be built into the diff engine itself to reduce processing time, rather than just being a filter applied to the output.

rincebrain commented 2 years ago

This would be problematic until the caveat is resolved where zfs diff sometimes can't figure out the filename of an object in isolation. (For example, this can arise when you have multiple hardlinks to a file: there's only one name field, and if that copy of the file gets deleted, the object itself no longer knows its name; only the forward references from each hardlink do.)
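To make the hardlink caveat concrete, here is a toy Python model. It is not ZFS code: it only assumes, as the comment describes, that each object keeps a single back-reference (a parent directory) while directories hold the forward references. Resolving a name means searching that one parent for an entry pointing back at the object. Once the link in that parent is removed, the lookup fails even though another directory still links to the object. `ToyFS`, `Node` and `path_of` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """Toy object: ONE parent back-reference, no matter how many links exist."""
    obj_id: int
    parent: Optional[int] = None
    entries: Dict[str, int] = field(default_factory=dict)  # used by directories


class ToyFS:
    def __init__(self) -> None:
        self.nodes: Dict[int, Node] = {1: Node(1)}  # object 1 is the root directory
        self.next_id = 2

    def create(self, dir_id: int, name: str) -> int:
        obj = Node(self.next_id, parent=dir_id)
        self.nodes[obj.obj_id] = obj
        self.nodes[dir_id].entries[name] = obj.obj_id
        self.next_id += 1
        return obj.obj_id

    def link(self, obj_id: int, dir_id: int, name: str) -> None:
        # A hard link only adds a forward reference; the back-reference is unchanged.
        self.nodes[dir_id].entries[name] = obj_id

    def unlink(self, dir_id: int, name: str) -> None:
        del self.nodes[dir_id].entries[name]

    def path_of(self, obj_id: int) -> Optional[str]:
        """Resolve a path from the object side by walking single parent pointers."""
        parts: List[str] = []
        cur = obj_id
        while cur != 1:
            parent = self.nodes[cur].parent
            # Search the recorded parent for an entry that points back at `cur`.
            name = next((n for n, o in self.nodes[parent].entries.items() if o == cur), None)
            if name is None:
                return None  # the back-reference no longer yields a name
            parts.append(name)
            cur = parent
        return "/" + "/".join(reversed(parts))


fs = ToyFS()
a = fs.create(1, "a")
b = fs.create(1, "b")
f = fs.create(a, "file")
fs.link(f, b, "hardlink")
print(fs.path_of(f))  # /a/file
fs.unlink(a, "file")
print(fs.path_of(f))  # None, although /b/hardlink still reaches the object
```

A path filter that needs each changed object's name inherits this weakness, which is why the comment ties the feature request to this caveat.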

JavaScriptDude commented 2 years ago

@rincebrain - Is there an open ticket for that issue?

rincebrain commented 2 years ago

Maybe #6335? There was a talk at the dev summit by a company that fixed most of these in their internal implementation, but they haven't open sourced it (yet?).

JavaScriptDude commented 2 years ago

Thanks @rincebrain

I have been using ZFS as a secondary version control system for my development and have written a multi-purpose Python tool for this called zfsvc, based on my zfslib library. It's internal only, but I will release it eventually. It gives me much more granular VC histories, which I use daily for timekeeping and auditing.

More targeted filtering of zfs diff output would greatly improve such tools and workflows.

FYI, here is an example of zfsvc:

zfsvc diff --discrete -D 8H -p /dpool/vcmain/dev/py

Output:

Dataset: dpool/vcmain (/dpool/vcmain)
     From: 2022-03-17 13:04:48 (as_22-03-17_17:04:47_hr)
       To: 2022-03-17 14:15:03 (as_22-03-17_18:15:03_freq)
-------------------------------------------------------------------------------------------------------------------------------------------------------
      date       |                snapshot     | ? |           file           |                 rpath                       | l_add | l_rem
2022-03-17 13:15 | as_22-03-17_17:15:02_freq   | M | pymssql_test.py          | /dev/py/mycorp/mycorp_sql_stuff             |     4 |     2
2022-03-17 13:30 | as_22-03-17_17:30:28_freq   | M | launch.json              | /dev/py/mycorp/mycorp_sql_stuff/.vscode     |     4 |     2
2022-03-17 13:30 | as_22-03-17_17:30:28_freq   | M | pymssql_test.py          | /dev/py/mycorp/mycorp_sql_stuff             |    17 |     7
2022-03-17 13:45 | as_22-03-17_17:45:12_freq   | M | launch.json              | /dev/py/mycorp/mycorp_sql_stuff/.vscode     |     8 |     4
2022-03-17 13:45 | as_22-03-17_17:45:12_freq   | M | pymssql_test.py          | /dev/py/mycorp/mycorp_sql_stuff             |     8 |     3
2022-03-17 14:00 | as_22-03-17_18:00:03_hr     | M | launch.json              | /dev/py/mycorp/mycorp_sql_stuff/.vscode     |     0 |     0
2022-03-17 14:00 | as_22-03-17_18:00:03_hr     | + | launch.json              | /dev/py/db/pymssql_tester/.vscode           |     - |     -
2022-03-17 14:00 | as_22-03-17_18:00:03_hr     | + | pymssql_tester.py        | /dev/py/db/pymssql_tester                   |     - |     -
2022-03-17 14:15 | as_22-03-17_18:15:03_freq   | M | pymssql_tester.py        | /dev/py/db/pymssql_tester                   |    39 |     8
2022-03-17 14:15 | as_22-03-17_18:15:03_freq   | + | qcorelite.cpython-37.pyc | /dev/py/db/pymssql_tester/__pycache__       |     - |     -
-------------------------------------------------------------------------------------------------------------------------------------------------------
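
zfsvc is not public, so the following is only a guess at how a per-snapshot ("discrete") report like the one above could be assembled. It pairs consecutive snapshots, runs `zfs diff -H` on each pair, and counts added and removed lines for a modified file using `difflib.SequenceMatcher` opcodes. It assumes the snapshot list is ordered oldest first and that both versions of a file can be read, e.g. through `<mountpoint>/.zfs/snapshot/<name>/...`. The function names are hypothetical.

```python
import difflib
import subprocess
from typing import List, Sequence, Tuple


def consecutive_pairs(snaps: Sequence[str]) -> List[Tuple[str, str]]:
    """Pair each snapshot with the next one (snaps assumed oldest first)."""
    return list(zip(snaps, snaps[1:]))


def zfs_diff(dataset: str, older: str, newer: str) -> List[str]:
    """Run `zfs diff -H` between two snapshots of `dataset` (needs zfs privileges)."""
    result = subprocess.run(
        ["zfs", "diff", "-H", f"{dataset}@{older}", f"{dataset}@{newer}"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.splitlines()


def line_counts(old_text: str, new_text: str) -> Tuple[int, int]:
    """Return (added, removed) line counts between two versions of a file."""
    old, new = old_text.splitlines(), new_text.splitlines()
    added = removed = 0
    for tag, i1, i2, j1, j2 in difflib.SequenceMatcher(None, old, new).get_opcodes():
        if tag in ("replace", "delete"):
            removed += i2 - i1
        if tag in ("replace", "insert"):
            added += j2 - j1
    return added, removed
```

For example, `line_counts("a\nb\nc\n", "a\nB\nc\nd\n")` returns `(2, 1)`: one line changed and one appended. Newly created files have no older version, which matches the `-` shown in the l_add and l_rem columns above.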
rincebrain commented 2 years ago

You might find the "punt it to userland and make userland calculate the diff" output from #12837 useful, since you care specifically about which objects changed.

JavaScriptDude commented 2 years ago

Sounds interesting. Have you seen a practical example of how this kind of userland diff would be done? I've never had to use zfs send or recv so far, and #12837 is not very explicit about the technique.

Thanks again for the info!

rincebrain commented 2 years ago

Beyond that PR itself, no, I don't.

almereyda commented 2 years ago

@JavaScriptDude Can you comment on the progress of your organisation's internal discussions about open sourcing zfsvc?

JavaScriptDude commented 2 years ago

I use it daily, and it's my personal codebase with no org to worry about. I will look into putting it on GitHub in the future. The code depends on my internal 'core' Python library, which I will need to strip down into a lighter one, as it's pretty huge.

almereyda commented 2 years ago

Thank you for the insights, and looking forward to the release.