Open Fantu opened 2 years ago
Thank you for creating the issue. So it's important to someone.
Yes, I haven't tried to offer blksnap upstream yet. The problem is that I can't find anyone who could do a code review, and I don't have enough experience with upstream. In addition, the module is not a regular driver for some device: I suggest adding a filter to the block layer, and maintainers may not like that. It would be great if I could attract other developers of backup tools to work on the module. But I think that while the module is not upstream, no one will want to invest their time in this project. Therefore, I doubt the success of this venture.
But I intend to try to do it soon. It remains to double-check something. I hope that someone will give good advice. I will add a link to the patch discussion in upstream to this issue.
Any feedback is welcome.
Thanks for the reply. I have made small contributions to some open source projects but never to the Linux kernel; if I remember correctly, I only helped test one Xen component that was being prepared for upstream many years ago. To find out whether your work is acceptable and to get reviews from expert kernel developers (who can give you their opinions and advice on what to improve), posting the patches is really important. From a quick look I found this document that explains how to do it: https://www.kernel.org/doc/html/latest/process/submitting-patches.html (you have probably already seen it), and from a quick look at your latest patch series ( https://github.com/veeam/blksnap/commit/85bad8f010d08fb0d08fc2c33e5ca3630dfe2a82 ) it seems you have already done good work on the major requirements. I found this project through a search I did after coming across a blk-snap patch series you posted 2 years ago, so I think posting patches is also a way to let other interested developers and users know about it (as well as to receive review from upstream developers).
seems you already did a good work on major requirements.
Thanks. Indeed, it is the good quality of the code that is an important component of success.
a way let know it to any other interested developers and users
Of course, that's why I'll be offering a patch.
I agree that your code represents a potentially important improvement for backup of linux systems, especially as new file systems proliferate. Blksnap is a distinct improvement over earlier such attempts like dattobd and elastio-snap. I have been lightly testing blksnap on a computer I routinely use where there are multiple file systems being used. Yours is a worthwhile project that will hopefully get merged into the mainstream kernel at some point.
Thank you @andymend . Thank you all for your support. I just sent a patch. If it passes the robot check, we will see it in the list soon.
From what I see here, 2 kernel developers reviewed and replied, and there are also 2 replies from an automatic test bot (reporting some warnings). For now, no one has objected to or criticized the implementation method; they have only recommended improvements for a v2.
From a quick search I did not find other proposals for implementing "temporary" snapshots for better backup integrity that support many filesystems.
Unfortunately I have not yet had time to try blksnap, but in my experience this functionality integrated into the kernel would be very useful. As I have already seen, several backup solutions have made their own modules for similar things, but no one has proposed an implementation in the kernel.
I have used several backup solutions over the years, most with file-level backup using rsync. Even in such cases I think it would be useful, because either the services are stopped during the whole backup or you risk an inconsistent backup, and the larger the data to back up and the longer it takes, the greater the risk that integrity is compromised. For example, if you have to back up a server with hundreds of GB or over 1 TB (in most cases over the network, which increases the time), it takes hours. During that time many files in various interconnected parts change, and rsync backs up some parts before those changes, others during and others after, which would be problematic for integrity on restore. With a snapshot and an rsync backup of it, you would have a backup with minimal integrity risk (reduced to the fraction of time needed to create the snapshot, as far as I know), and on restore you would know that you have a fairly healthy state as of the snapshot time rather than a high risk of integrity problems spread over the time it took to make the backup. Without such a snapshot, the risks can only be mitigated by stopping the services (as mentioned) or by doing the backup outside working hours.
Another thing I have tried at work in recent years is using btrfs with its integrated snapshots for backups (I also tried zfs years ago), but I use it only in some cases because, for example, on virtualization systems COW has a significant impact on performance, and there are also some space issues with keeping snapshots on the same production filesystem. Blksnap could be a solution for making a backup with minimum downtime (stopping services only to take the snapshot) and maximum integrity probability, or at least for taking the snapshot with services active, like a COW filesystem such as btrfs or zfs can do, but without having to use such filesystems.
I hope these examples based on my experience give an idea of how useful a system like blksnap is. When there is a calm period where I am not too busy I will start testing it. In the meantime, thanks again for your work, SergeiShtepa.
sorry for my bad english
Thanks @Fantu.
I expected a very calm reaction to the patch. I plan to:
So, the next patch will be better.
Alas, the performance of the blksnap module cannot be expected to be better than DM snapshots (LVM2). When handling a write request, it is necessary to read a chunk of data and then write this chunk to a new location. There is no magic here. When writing, the load on the drive increases threefold.
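The threefold figure can be illustrated with a toy model of a copy-on-write snapshot. This is only a sketch, not the blksnap implementation: the chunk size, the class name `CowSnapshot` and the I/O counters are assumptions made for the example.

```python
"""Toy model of a copy-on-write (COW) block snapshot.

Everything here (CHUNK_SIZE, CowSnapshot, the counters) is illustrative
and hypothetical; it only mirrors the read-then-copy-then-write sequence
described in the comment above.
"""

CHUNK_SIZE = 4  # bytes per chunk, kept tiny so the example stays readable


class CowSnapshot:
    def __init__(self, device: bytearray):
        self.device = device  # the live ("original") block device
        self.diff = {}        # chunk index -> data preserved for the snapshot
        self.reads = 0        # chunk-sized reads issued to storage
        self.writes = 0       # chunk-sized writes issued to storage

    def write_chunk(self, index: int, data: bytes) -> None:
        """Overwrite one whole chunk of the live device."""
        assert len(data) == CHUNK_SIZE
        start = index * CHUNK_SIZE
        if index not in self.diff:
            # First write to this chunk since the snapshot was taken:
            old = bytes(self.device[start:start + CHUNK_SIZE])
            self.reads += 1    # 1) read the old chunk
            self.diff[index] = old
            self.writes += 1   # 2) write it to the difference storage
        self.device[start:start + CHUNK_SIZE] = data
        self.writes += 1       # 3) the write the caller actually asked for

    def read_snapshot_chunk(self, index: int) -> bytes:
        """Return the chunk as it was when the snapshot was taken."""
        if index in self.diff:
            return self.diff[index]
        start = index * CHUNK_SIZE
        return bytes(self.device[start:start + CHUNK_SIZE])


dev = bytearray(b"AAAABBBBCCCC")
snap = CowSnapshot(dev)

snap.write_chunk(1, b"XXXX")
first_write_ios = snap.reads + snap.writes  # one read plus two writes

snap.write_chunk(1, b"YYYY")
total_ios = snap.reads + snap.writes        # the second write costs one I/O

print(first_write_ios, total_ios)
```

In this model only the first write to a chunk pays the extra read and copy; later writes to the same chunk go straight to the device, while the snapshot view still returns the preserved data.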
Theoretically, snapshots on Btrfs should be better. But for BTRFS, we cannot read the entire block device. We have to synchronize the source and target file systems. The Veeam Agent for Linux implements btrfs backup support by its native means. The performance was not the best.
The presence of a change tracker and the ease of allocating space for the difference storage are the main advantages of blksnap, compared to "classic" snapshots.
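As a rough idea of what a change tracker offers a backup tool, here is a minimal sketch. It assumes a simple per-chunk dirty flag; the class `ChangeTracker` and its methods are hypothetical and do not describe how blksnap stores or exposes its tracking data.

```python
class ChangeTracker:
    """Hypothetical per-chunk change tracker (not the blksnap data structure)."""

    def __init__(self, device_size: int, chunk_size: int):
        self.chunk_size = chunk_size
        chunk_count = -(-device_size // chunk_size)  # ceiling division
        self.dirty = [False] * chunk_count

    def mark_write(self, offset: int, length: int) -> None:
        """Record that bytes [offset, offset + length) were written."""
        if length <= 0:
            return
        first = offset // self.chunk_size
        last = (offset + length - 1) // self.chunk_size
        for index in range(first, last + 1):
            self.dirty[index] = True

    def changed_chunks(self) -> list:
        """Chunks an incremental backup would need to copy."""
        return [i for i, flag in enumerate(self.dirty) if flag]

    def reset(self) -> None:
        """Start a new tracking period, e.g. after a completed backup."""
        self.dirty = [False] * len(self.dirty)


tracker = ChangeTracker(device_size=100, chunk_size=10)
tracker.mark_write(offset=5, length=10)   # touches chunks 0 and 1
tracker.mark_write(offset=95, length=5)   # touches chunk 9
changed = tracker.changed_chunks()
print(changed)
```

With such a table, an incremental backup only has to read the chunks listed as changed instead of the whole block device.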
Thanks for the reply and for your work. About btrfs: I know it is better where there is no "limit" or performance issue (for example, I use it on some backup storage and file servers), but on "low cost" virtualization servers, for example, there was a significant performance issue. Disabling COW only partially solves it, because using a snapshot requires COW and overrides the option that disabled it. So still using ext4 and taking a temporary snapshot only at backup time, avoiding downtime or keeping it short, with a high integrity probability and lower performance only while the backup with the snapshot is running, still seems to me a good idea. However, I can only be really helpful once I start using it.
Hello everyone. I am very glad that the first patch was noticed and I received quite useful feedback. A lot will have to be rewritten. And that's great.
Hi, is there any news about the v2 patch series for upstream? I suppose it would not be reviewed in time for kernel 6.0 anyway :( Edit: the merge window for 6.0 is already closed, I hope for 6.1.
Hi. That's what I really want to do. But I had to switch to more urgent tasks. I hope to be able to reduce my task list soon.
Hi!
This looks like a workable version of the patch for Linux kernel 6.1-rc1. It contains corrections in accordance with the comments that the maintainers made on the first version of the patch. I plan to test this for two or three days. I invite everyone to join the testing.
Feedback is welcome.
This patch has been sent.
Good. I have not checked it in detail, but I noticed a small mistake in the version of the posted patch series: v1 instead of v2. If you have already sent it there is nothing else to do (or they would receive it twice). I saw there is at least a list of changes from v1 in the cover letter, so it should be "ok" anyway.
Edit: found the link to the posted series, I put it here in case it is useful: https://lore.kernel.org/lkml/20221102155101.4550-1-sergei.shtepa@veeam.com/
Link to patchwork.
@SergeiShtepa with a search I found this: https://lwn.net/Articles/914852/ and it seems it mentions a lack of documentation as an obstacle to acceptance. In addition to the in-code documentation, which probably needs improvement, I suppose they mean adding a "generic" document under Documentation/, presumably creating Documentation/block/blksnap.rst and adding it to the list in Documentation/block/index.rst. I didn't see any reply about the lack of documentation in the replies to the posted patch series; did you receive other replies directly rather than on the mailing list?
Thanks. I don't know anything about it. I received one email about an extra space and 3 emails from the robot. I'll try to add a couple of comments to the article.
@SergeiShtepa I saw your replies on LWN. Unfortunately, even though, as you explained, you want to avoid wasting additional time on reviewing documentation before the code is accepted upstream, I suppose that without more documentation some people won't start reviewing it. So far I haven't seen new replies on the mailing list for the posted patch :( I did a quick check with the kernel-doc script of the kernel-doc comments currently present and spotted some warnings. One I suppose is solved by https://github.com/Fantu/linux-blksnap/commit/4f7c2365bcd5acc27a7bd546dc20b927fbf1b576 ; the others spotted:
drivers/block/blksnap/tracker.h:47: warning: Function parameter or member 'flt' not described in 'tracker'
drivers/block/blksnap/tracker.h:47: warning: Function parameter or member 'submit_lock' not described in 'tracker'
drivers/block/blksnap/diff_storage.c:21: warning: Function parameter or member 'link' not described in 'storage_bdev'
drivers/block/blksnap/diff_storage.c:21: warning: Function parameter or member 'dev_id' not described in 'storage_bdev'
drivers/block/blksnap/diff_storage.c:21: warning: Function parameter or member 'bdev' not described in 'storage_bdev'
drivers/block/blksnap/diff_storage.c:33: warning: Function parameter or member 'link' not described in 'storage_block'
drivers/block/blksnap/diff_storage.c:33: warning: Function parameter or member 'bdev' not described in 'storage_block'
drivers/block/blksnap/diff_storage.c:33: warning: Function parameter or member 'sector' not described in 'storage_block'
drivers/block/blksnap/diff_storage.c:33: warning: Function parameter or member 'count' not described in 'storage_block'
drivers/block/blksnap/diff_storage.c:33: warning: Function parameter or member 'used' not described in 'storage_block'
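To see at a glance which structures still need member descriptions, warnings in the format above can be grouped by file and structure. This is a small helper sketch, not part of the kernel tooling; the function name `group_undocumented` is made up, and the sample log is just the first three warning lines from the list above.

```python
import re
from collections import defaultdict

# Matches the "not described" warnings exactly as they appear in the log above.
WARNING_RE = re.compile(
    r"^(?P<path>[^:]+):(?P<line>\d+): warning: "
    r"Function parameter or member '(?P<member>[^']+)' "
    r"not described in '(?P<owner>[^']+)'$"
)


def group_undocumented(log: str) -> dict:
    """Map (file, struct or function name) -> list of undocumented members."""
    groups = defaultdict(list)
    for raw in log.splitlines():
        match = WARNING_RE.match(raw.strip())
        if match:
            groups[(match["path"], match["owner"])].append(match["member"])
    return dict(groups)


LOG = """\
drivers/block/blksnap/tracker.h:47: warning: Function parameter or member 'flt' not described in 'tracker'
drivers/block/blksnap/tracker.h:47: warning: Function parameter or member 'submit_lock' not described in 'tracker'
drivers/block/blksnap/diff_storage.c:21: warning: Function parameter or member 'link' not described in 'storage_bdev'
"""

summary = group_undocumented(LOG)
for (path, owner), members in summary.items():
    print(f"{path}: {owner}: {', '.join(members)}")
```

Lines that do not match the pattern (other warning types, build noise) are simply ignored.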
To read the entire article https://lwn.net/Articles/914031/ and comments before December 1, you need to be a registered user. After December 1st, access will be unlimited. I tried to answer all the questions. I plan to work on the documentation and try to write an article for LWN. Maybe it will help.
spotted some warning:
Thanks, I'll have to check it out too for next patch.
Perhaps the next v2 patch with documentation and accompanied by an interesting article will attract more attention.
@SergeiShtepa thanks for the replies and your work. In practice, the last series posted would be v2 and the next one v3, even though it was labelled v1. I worry that labelling it v1 may have led some people to assume it was a resend without changes; for example, it seems strange to me that Christoph Hellwig, who participated a lot in the first one, did not give any answer to the second (it would probably be good to add him in Cc on the next). Looking at other sites too, the article that wrote about "the lack of doc." seems to have left its mark :( https://www.reddit.com/r/linux/comments/yvd14a/blockdevice_snapshots_with_blksnap_lwnnet_good/ Probably a v3 with more documentation, even just with fixes for the bot's warnings and Randy's report, will push sites to update their coverage and attract more people to review.
@SergeiShtepa the patch series is missing an entry in the MAINTAINERS file.
@SergeiShtepa the patch series is missing an entry in the MAINTAINERS file.
Maybe... I'll add this in the next patch.
@SergeiShtepa thanks for creating the documentation to include in the patch for upstream. My English isn't great but I read it. blkfilter seems to be ok. In blksnap there is a non-English sentence in "Snapshot overflow resistance", and the part about the defects of the alternatives and the merits of blksnap leaves me doubtful: I'm afraid that someone who reviews or wants to look at or try blksnap will get lost in counterproductive discussions.
About btrfs, "Obviously cannot be applied to other file systems.": I suppose you mean that the main advantages of transferring to another local or remote destination are lost when going from or to a different filesystem. That is true, but written like this it is wrong; in practice it is possible to use it even if most of the advantages are lost. You can use it for "temporary" snapshots during which you make a backup (for example with rsync) to another destination; and with a source without btrfs, you can make backups with another tool (for example rsync) and use btrfs snapshots on the destination to manage the "history", with some advantages (I've been using it this way on some systems for years).
The rest of the disadvantages I saw in practice during long testing some years ago, when I was thinking about using btrfs instead of ext4 on several virtualization systems to manage better and faster backups. The performance impact was huge (partly due to the use of HDDs), and even if only the last corresponding snapshot was kept for send/receive with the destination, there were disadvantages: reduced with a single destination and a daily backup, but considerable with multiple destinations (2 in my case) in "rotation" and having to keep the snapshots for a week or even more. I've seen what happens when a btrfs filesystem fills up because of snapshots and the damage it causes, especially with virtual machines (which do not have direct access to the filesystem and "amplify" the problem). It is also not "automatically managed/avoided", and the workaround of scripting a check for when it is about to fill up and deleting the snapshot(s) makes most of the "pros" disappear when that happens, which is why we didn't use it in production for that use case.
However, without listing all the possibilities, I think it is wrong to write things like "the blksnap module is the best way to create snapshots for backup tools"; it is in a good part of cases but not all. I'm not good at writing and I don't quite know how to describe such parts properly, objectively and correctly, but I think the documentation should be written so that people simply understand the need for this method, its importance for a good share of cases, and the need to get it into good shape and integrated upstream.
sorry if I didn't explain myself well and for my bad english
@Fantu - Thanks for the feedback!
I will try to improve those sections that you have noticed. After the weekend, I will review the documentation again. Maybe my colleagues from the technical documentation department will help me.
My English is also not as good as I would like. So I'm trying to do what I'm capable of. :) Creating documentation is hard work, and now it looks really important.
I can suggest you try to make the document better. Why not try? Even if I don't accept your merge request, it will allow me to take a fresh look at some sections.
@SergeiShtepa I saw the changes, it is better now. You missed fixing the non-English sentence in "Snapshot overflow resistance".
Prepared a patch blksnap_lk6.1-rc8. There are problems with kernel-docs. I'll fix it. Everyone can try to build a patch, launch it, criticize it, suggest changes. I will test the patch on different architectures and in different test labs.
@SergeiShtepa thanks for your work. Earlier I posted a comment linking the doc files on the article: https://lwn.net/Articles/917121/ About the documentation of the next patch series: I did a quick test (make htmldocs) and the link is missing; I quickly tried adding it and generating again: https://github.com/Fantu/linux-blksnap/commit/0ee9d76019867bdde70e513e10e57398f75991f9
In the MAINTAINERS file there is a small typo (reported in the commit). I suppose it would also be good to have a git repo (like linux-blksnap, added under https://github.com/veeam) to make it easier to keep the patch series for users and developers to use, test, rebase, etc., and to report specific issues and propose PRs, which you (as blksnap maintainer) can review, add to and improve before posting to the kernel mailing list.
Thanks.
make htmldocs
- I'll check it and fix it.
will be good have also a git repo (like linux-blksnap
I have a local version. Maybe it makes sense to synchronize it with the server.
make htmldocs
- I checked it out. it looks good.
The module in the patch does not build. I will check how it works in my lab and update the patch.
As far as I know, a patch submitted for upstream should first be tested by building and using it both as a module and built-in. In addition, for better quality it would be good to test on different architectures and different operating systems (for different build software versions, etc.). I don't know if it is possible to run entire kernel builds and automatic boot and usage tests in the basic, free CI tiers of GitHub, CircleCI and similar, as they are very expensive; it will probably have to be configured on dedicated systems. I have never tried it, but KernelCI could perhaps be worth trying, especially to set up a complete CI once it is accepted upstream.
Another thing I think would be good is to prepare better module build tests (for kernels >= 5.10 and < the upstream version that will include blksnap) for commits and PRs. The current one I did is very minimal and tests only 1 kernel version. I'll probably retry with GitHub workflows and containers, but if someone already has experience with them I suppose they can do better and faster work (I have almost no experience with it, and it took me a long time just to read up on the docs a bit and prepare the current test).
Linux kernel branch blksnap_lk6.1-rc8.
FWIK the path submitted for upstream should before be tested with build and use trying with both module and builtin, in addition for a better quality it would be better to test also on different architectures and different operating systems (for different build software versions etc)
I'm just doing testing, both with tests and as part of the VAL product. Right now I'm compiling a kernel for ppc64el. I also involve QA in this task; they have their own labs and autotests. But I prefer Debian/Ubuntu for kernel testing.
I don't know if it is possible to run entire kernel builds and automatic boot and use tests in basic and free CI testing of github, circleci and similar as they are very expensive, it will probably have to be configured on dedicated systems, I have never tried it but perhaps the kernelci could be worth trying, especially to set up a complete CI when it will be accepted upstream
I have no experience in this field.
another thing I think good to do will be prepare a better module build tests (for kernel >= 5.10 and <upstream version that will include blksnap)
it is possible to use the SUSE service to check compilability.
Branch blksnap_lk6.1-rc8_v2 with patch for linux kernel. Prepared email with patch. Regression testing is scheduled at night. If everything goes well, then tomorrow it will be possible to send a patch.
Good. I did a quick check of the latest patch, and with "make htmldocs" these 2 warnings remain:
./include/uapi/linux/blksnap.h:243: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
/home/fabio/repository/linux-blksnap/Documentation/block/blksnap.rst:139: WARNING: duplicate label block/blksnap:change tracker, other instance in /home/fabio/repository/linux-blksnap/Documentation/block/blksnap.rst
I don't know which of the 2 "Change tracker" titles to change, or to what; a minimal change so that they don't share the same label (used for links) is enough, I suppose.
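The duplicate label warning above appears to come from two sections of the same document sharing a title. A quick way to find such titles before running "make htmldocs" could look like the sketch below. It is a simplified check, not the Sphinx algorithm: it only recognises titles underlined with a few common characters, it compares titles in lower case because the label in the warning is shown in lower case, and the sample text with its section names is invented for the example.

```python
import re
from collections import Counter

# A line made of one repeated punctuation character, e.g. "=====" or "-----".
UNDERLINE_RE = re.compile(r"^([=\-~^#*+])\1+$")


def duplicate_titles(rst_text: str) -> list:
    """Return section titles (lowercased) that occur more than once."""
    lines = rst_text.splitlines()
    titles = []
    for title, underline in zip(lines, lines[1:]):
        text = title.strip()
        under = underline.rstrip()
        if (
            text
            and not UNDERLINE_RE.match(text)   # the title is not itself an underline
            and UNDERLINE_RE.match(under)
            and len(under) >= len(text)        # the underline must cover the title
        ):
            titles.append(text.lower())
    counts = Counter(titles)
    return sorted(t for t, n in counts.items() if n > 1)


SAMPLE = """\
Change tracker
==============

Some text.

Copy on write
-------------

Change tracker
--------------
"""

duplicates = duplicate_titles(SAMPLE)
print(duplicates)
```

Renaming either of the reported titles so that the two differ would be enough to make the generated labels distinct.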
Looking again at the blkfilter doc, it would probably be good to make some additions, for example mentioning that it is done as a "single filter" (as requested by a block maintainer) and is used by blksnap (with a link to its doc),
and probably also include the kernel-doc of the functions?
I did a quick try (https://github.com/Fantu/linux-blksnap/commit/798b0f2c42dae8b5f822294245e24e89bcce1e58), but the text needs to be improved and the link has a warning that needs to be fixed:
/home/fabio/repository/linux-blksnap/Documentation/block/blkfilter.rst:26: WARNING: undefined label: ../blksnap.rst (if the link has no caption the label must precede a section header)
Spotted also 2 descriptions to be updated: https://github.com/SergeiShtepa/linux/commit/b4363f5891bf674be4a19ea1099375f9240e7cc1#r92333996 https://github.com/SergeiShtepa/linux/commit/b4363f5891bf674be4a19ea1099375f9240e7cc1#r92334083
Edit: I also noticed the importance of good documentation with kernel-doc (or, more exactly, one of the reasons for it): from the page with the description you can follow links to the kernel-doc comments in the code, and from a single function link you can, in a few clicks, follow the links in cascade to various functions in different files. It would take much longer without this.
Thanks a lot for the feedback.
I realized that the documentation needs improvement. I'm working on it.
@Fantu . I have modified the documentation. There are many changes. commit in my linux. But now the html version of the document looks harmonious. I haven't moved the changes to the master for the module yet.
@SergeiShtepa good, thanks. I tested (make htmldocs) and now there are no warnings related to blksnap and blkfilter. I spotted a typo (see the comment on the commit). I checked the generated blkfilter and blksnap HTML output and it seems good. For me, further expansions could also be made later, but it will be necessary to see the opinion of the experts; I tried asking the article author: https://lwn.net/Articles/917482/ I'm curious to know his opinion, and/or that of others. As for building the kernel and testing it (with the new version of the patch), I have not had time so far; I hope to have time tomorrow.
About the change to Documentation/block/capability.rst: the HTML output generated from your patch seems nearly the same as the current one (https://docs.kernel.org/block/capability.html). I suppose there is something in kernel-doc that I don't know about that already selects the "DOC: genhd capability flags" part in include/linux/blkdev.h. Is there a fix or improvement anyway that I don't see from a quick look? If so, it could be posted directly to the mailing list as a separate patch (separate from the blksnap series, I mean).
Good, one person has already replied about the documentation part, and there are still warnings/errors spotted by the kernel test robot. I think we should do more to make the project reachable and visible, with small and quick things as I wrote here: https://github.com/veeam/blksnap/issues/21#issue-1465200362 I also made some small improvements to the readme: https://github.com/veeam/blksnap/pull/28 But many other improvements and additions are needed to make it easy and fast for both developers and users to start using, testing and contributing.
@SergeiShtepa I see a very old version of your patch posted: https://lkml.org/lkml/2020/10/21/122 and, based on these: https://lkml.org/lkml/2020/10/21/128 https://lkml.org/lkml/2020/10/21/849 , in https://lore.kernel.org/lkml/20221209142331.26395-3-sergei.shtepa@veeam.com/ the EXPORT_SYMBOL() -> EXPORT_SYMBOL_GPL() change is missing.
is missed EXPORT_SYMBOL() -> EXPORT_SYMBOL_GPL()
I think it's not a matter of principle. Any non-GPL module can copy the body of the attachment function and also use the filter.
@SergeiShtepa thanks for the reply. Sorry, I did not know about that. EDIT: I should stop posting when I'm too tired... I do know that unfortunately there are such workarounds, but I think not doing it on the grounds that "they can get around it anyway" is not a good thing.
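For anyone who wants to list the plain EXPORT_SYMBOL() exports that a patch adds, a check along these lines could be used. It is only a sketch: the function name `plain_exports` is made up, and the sample diff with its symbols (example_attach, example_detach) is invented for illustration and is not taken from the blksnap series.

```python
import re

# Matches EXPORT_SYMBOL(name) but not EXPORT_SYMBOL_GPL(name), because the
# opening parenthesis must follow "EXPORT_SYMBOL" directly (spaces allowed).
PLAIN_EXPORT_RE = re.compile(r"\bEXPORT_SYMBOL\s*\(\s*(\w+)\s*\)")


def plain_exports(diff_text: str) -> list:
    """Return symbols exported with plain EXPORT_SYMBOL() on added lines."""
    names = []
    for line in diff_text.splitlines():
        # Only lines added by the patch; skip the "+++ b/file" header.
        if line.startswith("+") and not line.startswith("+++"):
            names.extend(PLAIN_EXPORT_RE.findall(line))
    return names


# Hypothetical diff fragment, for illustration only.
SAMPLE_DIFF = """\
+++ b/block/example.c
+int example_attach(void)
+{
+    return 0;
+}
+EXPORT_SYMBOL(example_attach);
+EXPORT_SYMBOL_GPL(example_detach);
 EXPORT_SYMBOL(untouched_context_line);
"""

found = plain_exports(SAMPLE_DIFF)
print(found)
```

Context lines and GPL-only exports are ignored, so the output lists only the exports a reviewer might ask about.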
I also saw this about patch versioning: https://lkml.org/lkml/2020/10/24/108 The version should always be incremented (except when there are no changes, which calls for "RESEND") and the changelog of all posted versions should be included. Looking at the old patches, I have probably found why, unfortunately, many people no longer review the blksnap patches :(
However, in the last few patches you have made a lot of progress in following the guidelines and in making the fixes and improvements requested by the maintainers. Keep making better patches and I'm sure the reviews will increase and you will finally be able to integrate blksnap upstream :)
@SergeiShtepa I re-read the cover letter of v2 and the changelog part doesn't look very good. To mitigate the "error of v1" and clarify the changes, for the next version I think it could be changed to something like:
The initial "without version" was posted at 13 June 2022. ...
Changes in v1:
- Forgotten "static" declarations have been added. ...
The v1 version was posted at 2 November 2022. ...
Changes in v2:
The v2 version was posted at 9 December 2022. ...
Changes in v3:
I also replaced "suggested" with "posted" but if you think suggested is better keep it
or show them backwards as many do to have the latest changes at the top:
Changes in v3:
The v2 version was posted at 9 December 2022. ...
Changes in v2:
The v1 version was posted at 2 November 2022. ...
Changes in v1:
The initial "without version" was posted at 13 June 2022. ...
This also looks better to me.
I saw the replies from Christoph Hellwig; once again he made a very good contribution. I suppose that the next version should do more than just mention him in the cover letter, with something at least in the patches where he made a significant contribution: https://docs.kernel.org/process/submitting-patches.html#when-to-use-acked-by-cc-and-co-developed-by
I saw the replies of Christoph Hellwig, he gave another time a very good contribute.
Yep. It makes sense to continue working further. :)
I suppose that for next version should be more that only mention in the cover but something at least in the patches where he gave a significant contribute: https://docs.kernel.org/process/submitting-patches.html#when-to-use-acked-by-cc-and-co-developed-by
Yes. I will take this into account when preparing v3. In any case, I will publish the patch in blksnap/patches/ in advance for verification.
Hi. There were a lot of changes in the master branch. Bug fixes from the VAL-6.0 branch and changes related with preparing v3 patch. So, something might not work :) The module interface has changed, so the old binary code may not work. Interface changes are not finished yet, the most interesting is yet to come.
@SergeiShtepa thanks for your work. In a case like this I suggest keeping incomplete or non-working work in a separate branch (but pushed) and not merging it into master yet; otherwise users and developers who try blksnap for the first time and see the "bad state" can't know it is a temporary work in progress. It can also be useful to open a PR (marked as draft if incomplete) for users and contributors who want to see, review or test a work in progress.
Edit: on https://github.com/veeam/blksnap/commit/9467808a044a22850c61b3b30d899f852bcc4733 I see "version.h available only for standalone version" but no related change. I suppose you want to remove the version from the patch for upstream, or am I wrong? In that case, are you really sure a version is not needed to keep track of changes, especially compared to the out-of-tree module for as long as it exists?
Thanks for the comment. I use a workflow out of habit, where the master branch is an unstable development, and stable versions in separate branches, such as VAL-6.0. Perhaps this does not correspond to the generally accepted workflow. I'll think about it twice next time. On the one hand, it is possible now to make a branch from the commit and rename it to master, and rename the current one to devel. On the other hand, a stable master branch does not meet the goals of the repository. The main goal is to become part of the Linux kernel. Therefore, it makes no sense to keep a stable master branch. So, I'm not sure which choice would be better.
I'm thinking of fixing the first version as 1.0.0 when the module is upstream. It will be incremented only after that. Perhaps by that time a standalone version will no longer be needed, although the repository will of course remain to support the user space code. I think we will need a version at that stage of the project's life.
@SergeiShtepa Hi, thanks for your work on the blksnap kernel module; I think it will be very useful for improving backup on Linux. I'm curious about the status of its possible upstream integration. From https://github.com/veeam/blksnap/commit/85bad8f010d08fb0d08fc2c33e5ca3630dfe2a82 it seems you prepared a patch series for upstream (to be sent by mail), but I did not find it posted when searching on lkml.org and Google. I also saw other series from months ago on other branches, but I did not find them on the upstream mailing lists; I only found older work of yours from 1 year ago or more. Thanks for any reply and sorry for my bad English.