Closed: joren485 closed this pull request 1 week ago
Hey @joren485, thanks for this contribution. My 2 cents: adding a method to read the inode contents into memory could be useful for some applications, but calling it within write_inode_content_to_file() removes the ability to produce a sparse file.
Keep in mind that not all inode pages may be cached, so the resulting file on disk might contain many zeroed pages. For instance, if an inode's size is 1TB but only one page is cached, the original method would report an apparent size of 1TB while the actual disk space used is only 4096 bytes. With your changes, both apparent and actual sizes would be 1TB. This matters for large-scale file recovery.
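The sparse-file point can be sketched in a few lines. This is a hypothetical helper, not the actual plugin code, assuming cached pages are modelled as a dict of page index to bytes: by seeking over the uncached pages and truncating to the apparent size, only the cached pages consume disk blocks.

```python
import os
import tempfile

PAGE_SIZE = 4096

def write_sparse(path, inode_size, cached_pages):
    """Write only the cached pages, seeking over the gaps.

    cached_pages maps page index -> page bytes; missing indices become
    holes that the filesystem does not have to back with disk blocks.
    """
    with open(path, "wb") as f:
        for index, data in sorted(cached_pages.items()):
            f.seek(index * PAGE_SIZE)
            f.write(data)
        # Extend to the full apparent size without writing any zero bytes.
        f.truncate(inode_size)

# A 1 GiB "inode" with a single cached page: the apparent size is 1 GiB,
# but the blocks actually allocated on disk stay tiny on a sparse-capable
# filesystem (ext4, xfs, tmpfs, ...).
inode_size = 1 << 30
cached = {0: b"\xaa" * PAGE_SIZE}

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "recovered.bin")
    write_sparse(path, inode_size, cached)
    st = os.stat(path)
    print(st.st_size)          # apparent size: 1073741824
    print(st.st_blocks * 512)  # allocated bytes: far smaller
```

Reading the pages into one in-memory buffer instead forces those holes to be materialised as real zero bytes, which is exactly the behaviour difference being discussed.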
Also, returning the entire buffer/file like that is inefficient for large files. Locally, it can lead to heavy disk I/O due to swapping. For remote applications, it results in unnecessary data being sent over the network.
Lastly, the read_inode_content() implementation contains a bug, as it assumes all pages are present. This can result in the final buffer being a different size than the original inode, leading to inaccurate conclusions.
It would need to be 1.1.0
Done
> removes the ability to produce a sparse file.
I changed read_inode_content to take a typing.IO object (e.g. a file or a BytesIO) as an argument and added a buffer.truncate call. The functionality should now be the same as the original.
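Roughly, that revised approach might look like the following sketch. The names and the dict-of-pages model are illustrative stand-ins for the real inode API, not the PR's actual code:

```python
import io
import typing

PAGE_SIZE = 4096

def read_inode_content(pages: typing.Dict[int, bytes],
                       inode_size: int,
                       buffer: typing.IO[bytes]) -> None:
    """Hypothetical sketch: write cached pages into a caller-supplied
    typing.IO object, then truncate to the inode's apparent size.

    Writing past the end zero-fills the gap for both real files and
    BytesIO; the truncate trims the overshoot from a final partial page.
    Caveat: if the *trailing* pages are uncached, BytesIO.truncate will
    not zero-extend the buffer the way truncating a real file would.
    """
    for index, data in sorted(pages.items()):
        buffer.seek(index * PAGE_SIZE)
        buffer.write(data)
    buffer.truncate(inode_size)

# Two cached pages, but the file's apparent size ends 100 bytes into
# the second page, so the result is trimmed to PAGE_SIZE + 100 bytes.
pages = {0: b"\x01" * PAGE_SIZE, 1: b"\x02" * PAGE_SIZE}
inode_size = PAGE_SIZE + 100
buf = io.BytesIO()
read_inode_content(pages, inode_size, buf)
print(len(buf.getvalue()))  # 4196
```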
> Lastly, the read_inode_content() implementation contains a bug, as it assumes all pages are present. This can result in the final buffer being a different size than the original inode, leading to inaccurate conclusions.
Good catch! As I did not change the functionality of reading inode contents and do not have access to a memory image to reproduce the bug, I think it is best to create a separate issue for this.
EDIT: Sorry I was on the phone and I missed some of the details.
> Good catch! As I did not change the functionality of reading inode contents and do not have access to a memory image to reproduce the bug, I think it is best to create a separate issue for this.
By calling truncate(), you've now fixed that bug in your implementation. The original write_inode_content_to_file() method already handles those cases without issue.
There's still something a bit off here: a method named read_inode_content() that writes to a file doesn't quite feel right. Also, the docstring is inaccurate: "Read the contents of an inode into memory."
@joren485 Is there a particular use case you had in mind for read_inode_content()? It feels a bit like we're stretching to find one, but I'll defer to @ikelos if he thinks it has potential.
I don't have a horse in this race. The write function was put in specifically to write sparsely, so changing/removing that functionality seems bad. Now that the read function is parameterized with an IO object, I believe they should operate with the same results (it would be good if someone has an example to make sure the file is still actually sparse when it's written). It sounds as though @joren485 had another plugin that wanted to make use of it? If so, and if it doesn't remove functionality, I'm ok with it going in, but it would be nice to know there's a specific reason for it rather than just making changes for the sake of it...
It is on the parity list to break this out into a real API so other plugins can get file contents, so I would like to see it merged in some form.
Some common examples:
Many more like this. All of them will need the ability to dynamically gather file contents and parse them.
Last week, I was part of a Volatility training where a read_inode_content-like function would have been useful. So, there is no specific plugin that I would like to add when this is merged, but as Andrew points out, there are many plugins that would benefit from such functionality.
If you feel this change needs a better justification (i.e. that it should only be added once there is an actual plugin that uses it), it is also fine by me to just close this MR.
> Also, the docstring is inaccurate: "Read the contents of an inode into memory."

Fixed.
@ikelos @gcmoreira it seems like the main concern is reading too much into memory? If so, the API could take an optional parameter of maximum size and cut off after that. The files that would be auto-parsed by plugins are pretty small; a 10MB cap would be more than enough of a limit.
I think it's more that it's being read with padding, rather than being read sparsely. I don't know whether BytesIO supports a sparse mode (so that the mostly empty bytes aren't stored in memory), but otherwise constructing a data structure that yields a generator of the offsets and the needed bytes might be better? That would be consumable and effectively stream the data rather than send it in one big chunk. I'm not a fan of arbitrary cut-off limits because they're not very flexible once they're hit, and they create a bunch of boundary conditions that people then run into and complain about.
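The generator idea could look something like this sketch (hypothetical names; cached pages again modelled as a dict of page index to bytes): the generator yields only what is actually cached, and a sparse-writing consumer never materialises the gaps.

```python
from typing import Dict, Iterator, Tuple

PAGE_SIZE = 4096

def iter_inode_content(pages: Dict[int, bytes],
                       inode_size: int) -> Iterator[Tuple[int, bytes]]:
    """Yield (offset, data) pairs for the pages that are actually cached,
    never materialising the zero-filled gaps in between."""
    for index in sorted(pages):
        offset = index * PAGE_SIZE
        # Trim the final page if it extends past the apparent size.
        data = pages[index][: max(0, inode_size - offset)]
        if data:
            yield offset, data

def dump_sparse(path: str, pages: Dict[int, bytes], inode_size: int) -> None:
    """Consumer that streams the pairs to disk, keeping the file sparse."""
    with open(path, "wb") as f:
        for offset, data in iter_inode_content(pages, inode_size):
            f.seek(offset)
            f.write(data)
        f.truncate(inode_size)

# Pages 0 and 2 are cached; page 1 is a hole and is simply never yielded.
pages = {0: b"A" * PAGE_SIZE, 2: b"B" * PAGE_SIZE}
chunks = list(iter_inode_content(pages, 3 * PAGE_SIZE))
print([(off, len(d)) for off, d in chunks])  # [(0, 4096), (8192, 4096)]
```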
The idea is to have a wrapper function so every plugin doesn't have to do the gap reconstruction on its own, as that removes repeated code and provides a consistent interface. With the max size controlled by an optional parameter, plugin writers don't have anything to complain about. They can just bump the size if they want. Does this work?
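A capped variant of such a wrapper might look like the following sketch (illustrative names, a simplified page model, and an assumed 10 MiB default taken from the discussion, not the real Volatility3 API):

```python
import io
from typing import Dict, IO, Optional

PAGE_SIZE = 4096
DEFAULT_MAX_SIZE = 10 * 1024 * 1024  # illustrative 10 MiB cap

def read_inode_content(pages: Dict[int, bytes], inode_size: int,
                       buffer: IO[bytes],
                       max_size: Optional[int] = DEFAULT_MAX_SIZE) -> None:
    """Hypothetical wrapper: do the gap reconstruction once, centrally,
    and stop at max_size so plugins can't accidentally pull a huge
    inode into memory. Pass max_size=None to read everything."""
    limit = inode_size if max_size is None else min(inode_size, max_size)
    for index, data in sorted(pages.items()):
        offset = index * PAGE_SIZE
        if offset >= limit:
            break
        buffer.seek(offset)
        buffer.write(data[: limit - offset])
    # Zero-fill up to the limit so an in-memory buffer matches the
    # capped size even when the trailing pages are uncached.
    end = buffer.seek(0, 2)
    if end < limit:
        buffer.seek(limit - 1)
        buffer.write(b"\x00")
    buffer.truncate(limit)

# A "1 TiB" inode with one cached page, capped at 8 pages: the buffer
# comes back at exactly 32 KiB instead of a terabyte of zeroes.
buf = io.BytesIO()
read_inode_content({0: b"\x01" * PAGE_SIZE}, 1 << 40, buf,
                   max_size=8 * PAGE_SIZE)
print(len(buf.getvalue()))  # 32768
```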
Oh, I see. Well as I mentioned I don't really have a strong opinion on it so if you think it's a reasonable solution and Gus doesn't have an objection then I'm fine adding it. I guess it's up to @gcmoreira and you to sort out the best route forward?
@atcuno for text files it will be OK. We could add that extra parameter if you think it's needed. I see that as more useful for automatic processing of files, maybe?
Ideally, users should call linux.pagecache.Files first to assess the number of inode pages and determine how many are cached. Then, before dumping that inode, they should also see whether the pages are safe to dump using linux.pagecache.InodePages (of course, without --dump) and check the DumpSafe field.
This way, they can evaluate whether the content will add any value rather than indiscriminately dumping a useless stream of bytes. After doing that, the user should already be aware of the file size.
Again, it doesn't hurt, so we can include that as well.
Given the significant discussion, I think it is best to close this MR for now. I think someone with a better understanding of Volatility3 than I should implement this feature (if you decide you want it). Please feel free to use this MR in the future (or to ignore it completely), when you have a better idea of what the linux.pagecache API should look like.
@joren485 I think you were doing fine. Sometimes PRs are where we flesh out how we want something to work, and that can take some time (and some initial disagreement). Public back and forth can be difficult to deal with, but essentially I'd just leave the interested parties to figure out their differences, until a general agreement is made, and then implement any changes that fall out of that.
This looks like a valuable plugin, and we'd definitely like you to be the contributing author of the patches for the work, but now that the branch has been deleted, it's going to be very difficult for others to work on the code. Please consider resubmitting it as a new PR (referencing this one), and then let it sit there whilst the discussion continues and everyone figures out their differences, so we can get it into a shape everyone's happy with and get it merged...
This PR adds a read_inode_content method to read the content of an inode. This makes it possible to use the content of an inode inside other plugins.