Open knorrie opened 7 years ago
This is awesome, Hans! I've used this to build a tool that lets me gradually even out the use of storage on my array without having to do a full balance.
I can think of a few real-world tools that I might build using this:
The tutorial-style documentation idea sounds great too, especially if we can work through to real-world use cases where BTRFS makes a big difference (which don't seem to be well highlighted at present), like de-dupe or my personal favourite: date-sharded MySQL DBs with a daily defrag process that compresses the previous day's tables, for improved IO performance and a much better compression ratio than MySQL's built-in compression functionality.
Hi, thanks for the feedback. Yes, that's exactly what this library is meant for. Having fun creating custom tools with easy access to all information and functionality of an online filesystem.
The dedupe ioctl was just added after someone provided an initial patch (still in the develop branch, wip) and defrag isn't there yet. The main reason is simply that I'm not using them myself a lot and nobody asked for it before.
Starting to write some documentation is pretty high on my todo list currently, I'm looking forward to it.
Work is in progress! https://github.com/knorrie/python-btrfs/blob/tutorial/tutorial/README.md
First two parts almost completed. Feedback appreciated.
Work is in progress! https://github.com/knorrie/python-btrfs/blob/doc/tutorial/README.md
That probably should point to the branch in question: https://github.com/knorrie/python-btrfs/blob/tutorial/tutorial/README.md
Very nice so far!
Yes! I updated the link.
Thanks for the kind words, that's encouraging. :)
Ok, it seems most sane to choose a very conventional way of documenting Python code, which is using docstrings and Sphinx.
When moving tutorial style documentation in there, I can cross-link all things, from tutorial to reference documentation and back.
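As a rough illustration of what the docstrings-plus-Sphinx approach looks like, here is a minimal sketch. The two functions are hypothetical helpers invented for this example; they are not part of python-btrfs. The `:param:`, `:returns:`, `:raises:` fields and the `:func:` cross-reference role are standard Sphinx markup, and the `:func:` role is the kind of link that lets tutorial and reference pages point at each other.

```python
def chunk_usage_ratio(used_bytes: int, total_bytes: int) -> float:
    """Return the fraction of a chunk that is in use.

    Hypothetical helper, only here to show the docstring layout.

    :param int used_bytes: Number of bytes in use.
    :param int total_bytes: Total size in bytes.
    :returns: Usage as a value between 0 and 1.
    :rtype: float
    :raises ValueError: If ``total_bytes`` is not positive.

    See :func:`describe_usage` for a human-readable variant.
    """
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return used_bytes / total_bytes


def describe_usage(used_bytes: int, total_bytes: int) -> str:
    """Return the usage as a percentage string.

    Hypothetical helper that builds on :func:`chunk_usage_ratio`.

    :param int used_bytes: Number of bytes in use.
    :param int total_bytes: Total size in bytes.
    :returns: Text such as ``"25%"``.
    :rtype: str
    """
    return "{:.0%}".format(chunk_usage_ratio(used_bytes, total_bytes))
```

With the `sphinx.ext.autodoc` extension enabled, a directive such as `.. autofunction::` pulls these docstrings into the generated reference pages.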
Ok, so, a few weeks ago I sent a message to the btrfs list that I was planning to release python-btrfs v10 in about a week. Ahem, afterwards, instead, I rewrote a large part of the new fs_usage module again, and I realized that it would be pretty much unusable for anyone (except running the provided example script) without proper reference documentation.
So, I started writing that. Current work in progress is here: https://python-btrfs.readthedocs.io/en/latest/py-modindex.html
I started with fs_usage, and have now also done most of the ioctl module.
Tell me what you think of it! This is the reference docs, so they just tell you what you can do. The tutorial style documentation is of course still the necessary counterpart that also has to exist.
Keep up the good work! To be honest, the tutorial is more useful, because it also explains how btrfs works.
If you have time, please check my project based on your library... I believe it might be useful as well. I would like your guidance to expand it. https://github.com/dim-geo/btrfs-snapshot-diff
They're both important. Without the tutorial, it's really hard to figure out which of the functions do what. Without the reference docs, it's hard to start doing different things, because the examples in a tutorial will give you ideas of what else you want to do, and you then need a place to look up how.
Thanks for sharing your project! Yes, this is a nice example of something that can be made. One first thing I can see is that you seem to be comparing the tuple (objectid, logical_offset, disk_num_bytes). However, a real data extent can have multiple references, each to any range of 4k blocks inside that extent. So, if file X in snapshot 1 is using 8k - 20k from it, and file X in snapshot 2 did a 4k write that ended up in a new extent and now only uses 12k - 20k from the old one, won't you miss that this data is still shared?
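The partial-sharing case can be sketched in plain Python. This is a simplified model, not code from python-btrfs or btrfs-snapshot-diff: `FileExtentRef`, `naive_key` and `shared_bytes` are hypothetical names, the inode number and disk address are made up, and it assumes the 4k write landed at the start of the file. The field names loosely follow the btrfs file extent item.

```python
from typing import NamedTuple

K = 1024


class FileExtentRef(NamedTuple):
    """Hypothetical, simplified view of one reference to a data extent."""
    objectid: int        # inode number of the file
    logical_offset: int  # position of this reference inside the file
    disk_bytenr: int     # start address of the data extent
    disk_num_bytes: int  # size of the whole data extent
    offset: int          # where this reference starts inside the extent
    num_bytes: int       # how many bytes of the extent it uses


def naive_key(ref: FileExtentRef) -> tuple:
    """The tuple comparison discussed above."""
    return (ref.objectid, ref.logical_offset, ref.disk_num_bytes)


def shared_bytes(a: FileExtentRef, b: FileExtentRef) -> int:
    """Bytes of the same data extent that both references use."""
    if a.disk_bytenr != b.disk_bytenr:
        return 0
    start = max(a.offset, b.offset)
    end = min(a.offset + a.num_bytes, b.offset + b.num_bytes)
    return max(0, end - start)


# Snapshot 1: file X uses 8k - 20k of a 32k extent.
snap1 = FileExtentRef(257, 0, 1 << 30, 32 * K, 8 * K, 12 * K)
# Snapshot 2: the first 4k was rewritten into a new extent,
# so the old extent is now only referenced for 12k - 20k.
snap2 = FileExtentRef(257, 4 * K, 1 << 30, 32 * K, 12 * K, 8 * K)

print(naive_key(snap1) == naive_key(snap2))  # tuples differ
print(shared_bytes(snap1, snap2))            # overlap inside the extent
```

Comparing the byte ranges inside the extent, rather than the whole tuple, is what reveals that part of the old extent is still used by both snapshots.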
And, I can recommend a code style check tool like flake8.
Are you on IRC? #btrfs on freenode is always a nice place to hang out and discuss things.
Ok, almost done with the reference documentation. While writing this, I found so many little things that could be improved and I was tempted into doing all of them. I've gone over 100% of the code that was written since this lib started and I'm happy with what it's ending up like.
https://syrinx.knorrie.org/~knorrie/btrfs/python-btrfs-doc/btrfs.html
This is from the current develop branch. Please read and give feedback.
Just volumes.py and free_space_tree.py are left to do, and then testing everything again, and then it's going to be the v10 release.
Hi,
I'm building the man documentation, but only get a python-btrfs.1 file with the index, not the full documentation as in the html version.
Also, I think it should be in the section 3 of man, as it's documentation for a module.
Thanks!
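For reference, the man page section is set through the `man_pages` option in Sphinx's `conf.py`. The fragment below is a sketch with placeholder values, not the project's actual configuration. Each entry is a tuple of (start document, name, description, authors, section). It only shows where section 3 would be set, and does not by itself explain why only the index ends up in the generated page.

```python
# conf.py fragment: placeholder values, not the project's real settings.
project = "python-btrfs"

man_pages = [
    # (start document, name, description, authors, manual section)
    ("index", "python-btrfs", "python-btrfs documentation", ["<author name>"], 3),
]
```

With this entry, `sphinx-build -b man` would write a `python-btrfs.3` file instead of `python-btrfs.1`.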
To be honest I have no idea. I wrote all the autodoc, and see some html as result that makes sense, but I haven't looked any further yet. It's really a 'minimal viable product' now.
I do think it makes sense to ship the html documentation. In Debian I'll have to create an additional "binary" package, python-btrfs-doc it seems.
So, with v12 finally out, I can clear my mind and start looking at the old WIP pile of tutorial documentation again. Currently it's markdown stuff, but I think it makes sense to move it into the Sphinx docs. In there, only the index section makes (a lot of) sense right now; besides that, there's nothing yet.
It would be nice if there was some documentation included in this project instead of the current terse README text.
So... my question is... (thinking out loud)
For anyone interested in technical things about btrfs, the commit messages for the full history of the project contain a wealth of information already, and the examples cover all implemented functionality.
But, maybe it would be nice to have some tutorial-style documentation pages to show you around the world of btrfs with code examples and explanation about why metadata is organized the way it is. (Like, what is a chunk? What's the difference between a chunk and a block group? Why are all those names so confusing? How can I see what chunks I have? etc...)
See, for example, my Linux networking tutorials for an idea of what the writing style would look like. One of my all-time favourite tutorials is the Linux Advanced Routing & Traffic Control documentation, which helped me learn the basics of networking a long time ago. The page about Exploring your current configuration was my first inspiration for learning how to write fun documentation that shows a user around to discover what's happening.
So, the existence of this issue is an opportunity for anyone out there to comment on these ideas and provide feedback.