Open ntBre opened 3 weeks ago
After thinking more about this, possibly a better solution would be for us (or actually a user) to subclass PortalClient
and override the various get_{singlepoints,optimizations,torsiondrives}
methods with cached versions. That would avoid having to add the record_cache
field to the standard PortalClient
, which felt weird to me, and revert all of these changes because they would be handled by providing this new CachedPortalClient
. I'll try this approach tomorrow.
Here's an example of the PortalClient
subclass. Unfortunately, I also found out that the no_internet
trick is not infallible. I think the fact that the PortalClient
opens a requests.Session
and holds onto it means that it can access the internet without going back through socket.socket
. However, I monitored internet traffic with Wireshark and made sure that this code does not access the internet on the second to_records
call, which is also verified by setting _req_session
to None
.
import os
import shutil
from qcportal import PortalClient
from qcportal.cache import RecordCache, get_records_with_cache
from qcportal.optimization import OptimizationRecord
from openff.qcsubmit._tests.utils.test_manager import no_internet
from openff.qcsubmit.results import OptimizationResultCollection
from openff.qcsubmit.utils.utils import portal_client_manager
class CachedPortalClient(PortalClient):
def __init__(self, addr, cache_dir, **client_kwargs):
super().__init__(addr, cache_dir=cache_dir, **client_kwargs)
self.record_cache = RecordCache(
os.path.join(self.cache.cache_dir, "cache.sqlite"), read_only=False
)
def get_optimizations(self, record_ids, missing_ok=False, include=None):
return get_records_with_cache(
self,
self.record_cache,
OptimizationRecord,
record_ids,
include=["initial_molecule", "final_molecule"],
)
if os.path.exists("api.qcarchive.molssi.org_443"):
shutil.rmtree("api.qcarchive.molssi.org_443")
opt = OptimizationResultCollection.parse_file("tiny-opt-dataset.json")
client = CachedPortalClient("https://api.qcarchive.molssi.org:443", cache_dir=".")
with portal_client_manager(lambda _: client):
client._req_session = None
with no_internet():
print(len(opt.to_records()))
This code runs on the branch for #284 without any changes to qcsubmit. The implementation of CachedPortalClient
basically needs to know which methods we call on PortalClient
internally, though, so I think it might make sense for us to provide this class even though a user could define it.
I've updated the implementation to use the CachedPortalClient
from the code block above. We discussed making the class private, but I left it public for now because I think it might be useful for anyone wanting to use additional PortalClient
kwargs with the portal_client_manager
. I'm happy to make it private if you prefer, though. That's a pretty advanced use case and maybe not one we want to support in the public API.
I have not added any tests specifically for the caching yet, but I think it's a great first sign that the tests pass after replacing the default PortalClient
with this cached version. I also tried intentionally introducing errors in each of the new methods to make sure they were actually being called, so they are all covered by existing tests at least.
One issue I've run into with testing this is that each test uses the same default cache_dir
. I tried adding a pytest fixture to delete the cache dir before each test ran, but this caused disastrous issues with pytest-xdist running multiple tests in parallel (one would be trying to write to the cache while the next test tried deleting the dir). I think this means that fully testing the caches will be more involved than I hoped. It also means that I currently have to delete the cache dir locally between each test run, or the tests just use the cache. That won't be an issue in CI at least. My best idea for fixing this currently is to modify the tests using PortalClient
s more invasively to wrap the code in portal_client_managers
with temporary cache dirs so that each test can run fully independently and clean itself up afterward.
The tests should now be passing with temporary cache directories everywhere CachedPortalClient
is used. In my local tests I was seeing ignored instances of the error from MolSSI/QCFractal#844, but it sounds like that has been fixed in the next version (0.56) of qcportal.
Now, I think I just need to update these tests to check that the caching actually worked.
The tests have been updated, and I copied over (and modified) documentation from qcportal. It should be ready for review!
Description
This PR uses the get_records_with_cache function discussed in our last QCArchive meeting to cache
to_records
calls automatically. With these changes alone, users cannot actually access this behavior, but when combined with #284, it enables code like this:For this PR, I applied the changes separately from #284 to the main branch, but I have used this code to test the combined changes locally, and it works! I have not run into MolSSI/QCFractal#844 here yet, so hopefully that is less common in real use cases.
Todos
to_records
callsStatus