Do I need to re-load files into memory via vmtouch if file size changes

huangye177 commented 11 years ago

Hi hoytech,

Thanks a lot for this really cool tool!

I am working on loading database files into memory via vmtouch. I am wondering about the following: after I load some database files into memory via vmtouch (e.g., vmtouch -ld /database_path/database_table_files), such as the MySQL index files (.MYI), these files will grow over time (although they remain the same files). Do I need to evict the files from memory (via vmtouch -ev) and re-load them periodically?

Thanks a lot!

Ye

hoytech commented 11 years ago

Hi Ye,

I'm not too familiar with how MySQL manages its index files, however I can give some rules of thumb for how to investigate this.

If the inode of the file changes (according to ls -i for example), then you will need to kill the daemon process and re-run it so it maps the new file contents and the current pages are locked into memory.

Additionally, if new pages are appended onto the end of a file, killing the daemon process and re-running it will ensure that all the pages in a file are locked.

Both of the above issues are because vmtouch doesn't really work on files exactly. Instead, it works on the pages referenced by a file at a given moment in time. If the file changes to different pages or additional pages are appended, vmtouch will not pick up on these changes.

Note that your operating system is probably fairly smart at paging out pages no longer needed so you may not need to evict (vmtouch -e) the pages. Your operating system will do that eventually.

Hope this helps,

Doug
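
The two checks described above (has the inode changed, has the file grown) can be sketched in shell. This is only an illustration, not part of vmtouch: the helper names file_signature and needs_relock are hypothetical, and the sketch assumes the usual `ls -il` column layout (inode first, size sixth).

```shell
# file_signature FILE: print "<inode> <size in bytes>" using metadata only.
file_signature() {
  # With -i and -l, ls prints the inode in column 1 and the size in column 6.
  # -d keeps ls from listing the contents if FILE is a directory.
  ls -ild -- "$1" | awk '{print $1, $6}'
}

# needs_relock FILE OLD_SIGNATURE: succeed (exit 0) if the inode or size changed.
needs_relock() {
  [ "$(file_signature "$1")" != "$2" ]
}

# Usage (the .MYI path is a made-up example):
#   sig=$(file_signature /database_path/table.MYI)
#   ... later ...
#   needs_relock /database_path/table.MYI "$sig" && echo "re-run vmtouch"
```

A changed signature only says that the set of pages may have changed; it is still the kill-and-re-run step that makes vmtouch lock the new pages.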

huangye177 commented 11 years ago

Hi Doug,

Thanks a lot for your info! I checked the inode of the file and it doesn't change over time; only the number of blocks allocated to this inode changes when new content is added.

I suppose I can then conclude that, once the file is mapped into memory via vmtouch, the existing content (the original blocks) will be served from memory. The newly added content is not in memory but still belongs to the same inode, so a failed lookup in memory will result in a direct read of that content from the file on the hard disk. Am I right?

Cheers,

Ye

hoytech commented 11 years ago

Normally your operating system cache is pretty smart and if the pages have been accessed recently they will be read from the filesystem cache memory.

The vmtouch -l option can "lock" pages into memory so they will always be read from memory even when they haven't been accessed recently. If you are using locking and new pages are added, yes you will need to re-lock the new pages in order to guarantee that they are in memory at all times.

Note that unless you have special requirements, you can usually depend on your operating system cache to figure out which pages should be cached in memory and you don't need vmtouch -d.

huangye177 commented 11 years ago

I checked the performance of a "re-lock" right after new pages were appended, and it was pretty fast! I suppose that is because the "already-in-memory" pages had not yet been paged out by the OS during the short interval, so they did not need to be read again. That is cool!

Thanks a lot for your help!

sandstrom commented 10 years ago

Sorry for hijacking, I found this issue after Googling around for a while.

It may be outside the scope of vmtouch, but it would be neat to somehow watch a directory, and lock both existing and new files into memory. I.e. if a file is added later, it would be locked into memory too.

hoytech commented 10 years ago

Thanks for your feature suggestion. That would be a pretty neat feature and I'll consider it.

In the meantime, you could periodically kill the vmtouch process and re-start it. The new process will re-crawl and lock any new files (as well as releasing the memory backing any deleted files). It's not quite as elegant but should work. The only issue is that the files will be temporarily unlocked from memory which (depending on your real-time constraints) might be unacceptable.

sandstrom commented 10 years ago

One use-case I had in mind is the tmp folder where web-servers temporarily store uploaded files. Marking that folder would be neat, but files often live there for only a few hundred milliseconds.

Also, ideally it would be for both reads and writes — but perhaps a regular ram-disk is better then.

Thanks regardless, vmtouch is great!

maci0 commented 9 years ago

Maybe vmtouch can use inotify and re-lock whenever the file content changes. Not sure about the performance implications here, though.

hoytech commented 9 years ago

@maci0 - yes that's a really good idea. I think it would also need to monitor the directory containing the file in case the file is unlinked and replaced with a new file.

I think the performance should be reasonable but you should be able to opt out (in?) just in case.

This is one of the features I'm considering for a "libvmtouch" although I haven't had a chance to work on that for a while... One of these days. :)

maci0 commented 9 years ago

@hoytech inotify has a delete_self event. You just need to watch for that and then reopen the file in case a new one was created. moved_to and moved_from only work within the same directory, it seems. There are the inotify-tools to test it out.

hoytech commented 9 years ago

I haven't looked into the inotify interface closely before, thanks for the info. I'd guess another thing needed would be to recursively monitor a directory for new/deleted files so we can immediately lock/unlock as needed.

At Spotify they re-spawn the vmtouch process periodically. See slide 32 here:

http://www.slideshare.net/JimmyMrdell/playlists-at-spotify-cassandra-summit-london-2013

But of course your suggestion of an inotify solution would be ideal.
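
As a rough illustration of the inotify idea discussed above, the restart can be driven by events instead of a timer. This is a sketch under assumptions, not a vmtouch feature: start_locker and relock_on_events are hypothetical helper names, inotifywait comes from the inotify-tools package mentioned earlier, and every event triggers a full kill and re-run, so a real version would probably want to batch bursts of events.

```shell
# start_locker TARGET: hypothetical helper that starts the locking process.
start_locker() {
  vmtouch -l "$1" &      # without -d, vmtouch -l keeps running while it holds the locks
  lock_pid=$!
}

# relock_on_events TARGET: restart the locker once per event line read from stdin.
relock_on_events() {
  target=$1
  relock_count=0
  start_locker "$target"
  while read -r _event; do
    # Any reported event (growth, replacement, new file) triggers a full re-crawl.
    kill "$lock_pid" 2>/dev/null || true
    wait "$lock_pid" 2>/dev/null || true
    start_locker "$target"
    relock_count=$((relock_count + 1))
  done
}

# Usage (assumes inotify-tools is installed; the path is a placeholder):
#   inotifywait -m -r -q -e modify -e create -e delete -e moved_to -e delete_self /path/to/files |
#     relock_on_events /path/to/files
```

As with the periodic restart, the files are briefly unlocked between the kill and the moment the new process finishes locking, so this does not remove the real-time caveat mentioned earlier in the thread.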

maci0 commented 9 years ago

@hoytech also i just found the following in the mlock manpage

MCL_FUTURE

Lock all pages which will become mapped into the address space of the process in the future. These could be for instance new pages required by a growing heap and stack as well as new memory mapped files or shared memory regions.

Wouldn't this work?

maci0 commented 9 years ago

ok.. one would still need to monitor file changes and then mmap again it seems

hoytech commented 9 years ago

MCL_FUTURE is a very interesting flag to mlockall() although as you noted it might not help vmtouch that much. You're right we'd still have to monitor directories for new/deleted files and monitor for files growing/shrinking and create/adjust our mmaps as appropriate. However, the mlockall(MCL_FUTURE) would let us avoid calling mlock() after calling mmap().

There are some downsides with mlockall() though. One downside is that it locks pages that you probably aren't interested in locking (such as libc and other libraries). Another is that if you hit your rlimit or system-limit on wired memory, I believe you'll get weird EAGAIN failures from mmap() or brk() or whatever which makes it complicated to report this issue to the user. When mlock() fails it's easy to report this problem with the correct error message.

For a real-time app I think MCL_FUTURE would be valuable but I think it's best if vmtouch uses mlock() as the default.

hoytech commented 8 years ago

Closing this ticket. In summary, the answer is yes. Currently you need to kill and re-run vmtouch if new files are added or a file's size changes.

Here is a quick and dirty solution from Spotify (slide 32):

http://www.slideshare.net/JimmyMrdell/playlists-at-spotify-cassandra-summit-london-2013#slide32

I've added a ticket to consider inotify suggestions made in this thread: #39

Thanks!

ghost commented 6 years ago

Spotify has some odd formatting on their code (well at least to me). Here's a re-format into something hopefully easier to read and modify for home use:

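#!/bin/bash
# Re-run vmtouch every 10 minutes so that new or grown files get locked too.
while true; do
  # vmtouch -l keeps running while it holds the locks, so start it in the background.
  vmtouch -m 10000000000 -l *head* &
  pid=$!
  sleep 10m
  # Stop the old process; the next iteration re-crawls and re-locks.
  kill "$pid"
done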

This way it also doesn't just disappear if that slide show goes poof.

hoytech commented 6 years ago

@Michael-IDA - Thank you!

hoytech / vmtouch

Do I need to re-load files into memory via vmtouch if file size changes #12