Closed gauravdiwan89 closed 3 years ago
I guess you can call that a luxury optimization: if you have machines with tons of memory you will see some benefit. One compromise would be to add only the _cs219.ff{data,index} files to shm and symlink the other files. That would ensure that at least the small context-state prefilter database never leaves memory. You could also use https://github.com/hoytech/vmtouch to achieve something similar.
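A minimal sketch of the compromise described above: copy only the small _cs219 prefilter files into shared memory and symlink the rest, so hhblits sees a complete database under the shm path. The directory names and dummy files here are placeholders so the sketch runs as-is; a real run would set DB_DIR to the downloaded UniRef30_2020_02 directory and SHM_DIR to a folder under /dev/shm.

```shell
#!/bin/sh
# Sketch, assuming the standard hh-suite database layout
# (<name>_cs219.*, <name>_a3m.*, <name>_hhm.* ffindex files).
DB_DIR=$(mktemp -d)      # stand-in for the real database directory
DB_NAME="UniRef30_2020_02"
SHM_DIR=$(mktemp -d)     # stand-in for /dev/shm/$DB_NAME

# Dummy files so this sketch is runnable without the real ~200 GB database.
for suffix in _cs219.ffdata _cs219.ffindex _a3m.ffdata _a3m.ffindex \
              _hhm.ffdata _hhm.ffindex; do
    touch "$DB_DIR/$DB_NAME$suffix"
done

# Copy the small context-state prefilter files into shared memory...
cp "$DB_DIR/${DB_NAME}_cs219.ffdata" \
   "$DB_DIR/${DB_NAME}_cs219.ffindex" "$SHM_DIR/"

# ...and symlink the large remaining files next to them.
for f in "$DB_DIR/$DB_NAME"*; do
    base=$(basename "$f")
    [ -e "$SHM_DIR/$base" ] || ln -s "$f" "$SHM_DIR/$base"
done

ls -l "$SHM_DIR"
# hhblits would then be pointed at the shm copy, e.g.:
#   hhblits -i query.fasta -d "$SHM_DIR/$DB_NAME" -o result.hhr
```

With this layout the prefilter reads hit RAM while the large _a3m/_hhm files are still read from disk through the symlinks, trading some speed for a much smaller shm footprint.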
Thanks a lot for the prompt answer! That is precisely what I was asking: whether I could load only a subset of the files into shm. And thanks for the link to vmtouch; I will try it out in case the first solution isn't optimal.
I am planning to run hhblits for thousands of sequences using the latest Uniclust30 release (UniRef30_2020_02) as my target database. I intend to set up the run on our computing cluster and was going through the suggestions on the wiki about efficiently running hhblits on a computer cluster.
I understand that it is recommended to load the database files into the “/dev/shm” folder of the computing node on which the job is running. Although I have not tried this, I found that only a couple of our nodes have a virtual RAM disk (i.e. the “/dev/shm” folder) large enough to hold all the files of the Uniclust30 database (~200 GB in total). Can you please clarify whether I will only be able to run the jobs on the nodes that have 200 GB of available virtual RAM disk space? Or am I missing something?
Many thanks!