tapis-project / tapipy

Python lib for interacting with an instance of the Tapis API Framework
BSD 3-Clause "New" or "Revised" License

Rewrite Tapipy spec storage/caching mechanisms for instant import #54

Closed NotChristianGarcia closed 1 year ago

NotChristianGarcia commented 1 year ago

Some applications need the from tapipy.tapis import Tapis step to be very fast. Currently the import appears to be CPU-limited: it takes 3 seconds on a fast machine (mine) and longer in environments with fewer resources (Kubernetes containers, for example).

I'm currently working on three separate solutions. The main problem to fix is import time. Most of that time is eaten by Tapipy's auto-initialization step, which loads cached, pickled Python dictionaries of the service specifications and then runs openapi_core's create_spec on them, which is famously slow. For example, on the Workflows spec (perhaps the most logically complex spec), import can take up to 2.2 seconds. Nathan Freeman has taken time to try to parallelize this step; however, the current implementation shows no speed increase. A possible conclusion is that create_spec is CPU-bound or is hitting a lock somewhere (which doesn't make much sense).
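
To see where the time goes, it can help to time the build step for each service on its own. The following is a minimal sketch, not Tapipy code: time_spec_builds and fake_build are hypothetical names, and fake_build is only a stand-in for the real create_spec call, which would be passed in its place.

```python
import time
from typing import Any, Callable, Dict


def time_spec_builds(
    spec_dicts: Dict[str, dict],
    build: Callable[[dict], Any],
) -> Dict[str, float]:
    """Return the seconds spent in build() for each service's spec dict."""
    timings = {}
    for service, spec_dict in spec_dicts.items():
        start = time.perf_counter()
        build(spec_dict)
        timings[service] = time.perf_counter() - start
    return timings


def fake_build(spec_dict: dict) -> dict:
    # Hypothetical stand-in for the real spec-building call (create_spec).
    return {path: sorted(ops) for path, ops in spec_dict.get("paths", {}).items()}


if __name__ == "__main__":
    # Made-up spec dicts, only to show the shape of the input.
    example_specs = {
        "workflows": {"paths": {"/example": {"get": {}, "post": {}}}},
        "files": {"paths": {}},
    }
    timings = time_spec_builds(example_specs, fake_build)
    # Print the slowest service first.
    for service, seconds in sorted(timings.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{service}: {seconds:.6f}s")
```

For a whole-import view, running python -X importtime -c "from tapipy.tapis import Tapis" prints the cumulative import time per module.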

Solutions!

  1. Instead of running create_spec, we can try to pickle the complete spec object and load that at import. Pickle doesn't support this operation; however, dill, an extension of pickle, does at least seem to dump the spec to a file.
    • I'm having trouble loading the dilled spec, however, possibly due to decorators or references in the JSON that the object attempts to reconcile on re-initialization.
    • This solution is the easiest and I'll continue to work on it, but it seems fairly likely to break.
  2. Instead of dilling the spec, we could also possibly serialize the spec object ourselves and retain only the information that we require.
    • I've started testing and this seems very possible. I'm still working on a proper serializer; for some reason I'm currently cutting off information pertaining to content-type for responses.
    • This would require quite a change to the code as well, but the serialized result can definitely be pickled, so that's great.
  3. The third is more of a band-aid: we could combine the specs/dicts so that they load from a single file rather than one file per service. Initially we had far fewer specs, but it's getting crowded and this process should be streamlined (for the tapipy resource_set). I'm unsure of its overall speed improvement, however.
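
Options 2 and 3 can be sketched together as follows. This is a rough illustration under stated assumptions, not the actual serializer: FakeOperation is a hypothetical stand-in for an entry in the built spec object (its lambda attribute mimics whatever makes the real object unpicklable), the three fields kept are an assumed set, and all service names, operation ids and paths are made up.

```python
import pickle
import tempfile
from pathlib import Path


class FakeOperation:
    """Hypothetical stand-in for one operation in a built spec object."""

    def __init__(self, operation_id, http_method, path_name):
        self.operation_id = operation_id
        self.http_method = http_method
        self.path_name = path_name
        # A local lambda cannot be pickled, mimicking the unpicklable spec.
        self.validator = lambda value: value is not None


def serialize_operations(operations):
    """Keep only plain-data fields (an assumed set of what the client needs)."""
    return [
        {
            "operation_id": op.operation_id,
            "http_method": op.http_method,
            "path_name": op.path_name,
        }
        for op in operations
    ]


def dump_combined(built_specs, cache_file):
    """Write every service's serialized spec into one pickle file."""
    combined = {
        service: serialize_operations(ops) for service, ops in built_specs.items()
    }
    with open(cache_file, "wb") as fh:
        pickle.dump(combined, fh, protocol=pickle.HIGHEST_PROTOCOL)


def load_combined(cache_file):
    """Read all services back with one file open and one unpickle."""
    with open(cache_file, "rb") as fh:
        return pickle.load(fh)


if __name__ == "__main__":
    built = {
        "workflows": [FakeOperation("get_pipeline", "get", "/example/pipelines")],
        "files": [FakeOperation("list_files", "get", "/example/files")],
    }
    cache_file = Path(tempfile.mkdtemp()) / "specs.pickle"
    dump_combined(built, cache_file)
    print(load_combined(cache_file)["workflows"])
```

The trade-off is the one noted in option 2: anything the serializer leaves out (response content-type, for example) is gone at load time, so the set of retained fields has to match what the client code actually reads.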