mahmoud / glom

☄️ Python's nested data operator (and CLI), for all your declarative restructuring needs. Got data? Glom it! ☄️
https://glom.readthedocs.io
Other
1.89k stars 61 forks source link

Perf #198

Closed kurtbrose closed 3 years ago

codecov[bot] commented 4 years ago

Codecov Report

Merging #198 into master will increase coverage by 0.00%. The diff coverage is 100.00%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master     #198   +/-   ##
=======================================
  Coverage   98.93%   98.94%           
=======================================
  Files          26       26           
  Lines        4052     4071   +19     
  Branches      575      579    +4     
=======================================
+ Hits         4009     4028   +19     
  Misses         20       20           
  Partials       23       23           
Impacted Files Coverage Δ
glom/matching.py 100.00% <ø> (ø)
glom/core.py 98.87% <100.00%> (+0.01%) :arrow_up:
glom/test/test_path_and_t.py 99.28% <100.00%> (+0.02%) :arrow_up:
glom/test/test_target_types.py 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update d99ccaa...404abae. Read the comment docs.

kurtbrose commented 4 years ago

so far:

those two changes reduced ~1 sec to ~0.75 sec (~100 to ~75 us per object)

other low hanging fruit:

kurtbrose commented 4 years ago
                                    /code/glom/core.py: % of time =  53.74% out of  11.69s.                                    
       ╷       ╷       ╷    ╷                                                                                                  
  Line │Time % │Time % │Sys │                                                                                                  
       │Python │native │%   │/code/glom/core.py                                                                                
╺━━━━━━┿━━━━━━━┿━━━━━━━┿━━━━┿━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸
   ... │       │       │    │                                                                                                  
   630 │  1%   │       │    │    @classmethod                                                                                  
   ... │       │       │    │                                                                                                  
   638 │  1%   │       │    │        if text not in cls._CACHE:                                                                
   ... │       │       │    │                                                                                                  
  1471 │  1%   │       │    │def _t_eval(target, _t, scope):                                                                   
   ... │       │       │    │                                                                                                  
  1474 │  1%   │       │    │    fetch_till = len(t_path)                                                                      
   ... │       │       │    │                                                                                                  
  1498 │  1%   │       │    │    while i < fetch_till:                                                                         
  1499 │  1%   │       │    │        op, arg = t_path[i], t_path[i + 1]                                                        
  1500 │  1%   │       │    │        if type(arg) in (Spec, TType, Val):                                                       
   ... │       │       │    │                                                                                                  
  1507 │  1%   │       │    │        elif op == '[':                                                                           
   ... │       │       │    │                                                                                                  
  1514 │  2%   │       │    │            get = scope[TargetRegistry].get_handler('get', cur, path=t_path[2:i+2:2])             
   ... │       │       │    │                                                                                                  
  1516 │  1%   │       │    │                cur = get(cur, arg)                                                               
   ... │       │       │    │                                                                                                  
  1531 │  1%   │       │    │        i += 2                                                                                    
   ... │       │       │    │                                                                                                  
  1740 │  1%   │       │    │    for field, subspec in spec.items():                                                           
   ... │       │       │    │                                                                                                  
  1744 │  1%   │       │    │        if type(field) in (Spec, TType):                                                          
   ... │       │       │    │                                                                                                  
  1761 │  1%   │       │    │        scope[Path] = base_path + [i]                                                             
   ... │       │       │    │                                                                                                  
  1773 │  1%   │       │    │    for subspec in spec:                                                                          
   ... │       │       │    │                                                                                                  
  1781 │  1%   │       │    │        if not isinstance(subspec, list):                                                         
  1782 │  2%   │       │    │            scope[Path] += [getattr(subspec, '__name__', subspec)]                                
   ... │       │       │    │                                                                                                  
  1823 │  1%   │       │    │    def get_handler(self, op, obj, path=None, raise_exc=True):                                    
   ... │       │       │    │                                                                                                  
  1835 │  2%   │       │    │                ret = type_map[obj_type]                                                          
   ... │       │       │    │                                                                                                  
  1837 │  1%   │       │    │                type_tree = self._op_type_tree.get(op, {})                                        
  1838 │  1%   │       │    │                closest = self._get_closest_type(obj, type_tree=type_tree)                        
   ... │       │       │    │                                                                                                  
  1842 │  1%   │       │    │                    ret = type_map[closest]                                                       
   ... │       │       │    │                                                                                                  
  1855 │  1%   │       │    │    def _get_closest_type(self, obj, type_tree):                                                  
   ... │       │       │    │                                                                                                  
  1857 │  3%   │       │    │        for cur_type, sub_tree in type_tree.items():                                              
  1858 │  1%   │       │    │            if isinstance(obj, cur_type):                                                         
  1859 │  1%   │       │    │                sub_type = self._get_closest_type(obj, type_tree=sub_tree)                        
   ... │       │       │    │                                                                                                  
  2109 │  1%   │       │    │    nxt_in_chain.maps[0][NO_PYFRAME] = True                                                       
   ... │       │       │    │                                                                                                  
  2120 │  1%   │       │    │    glomit = getattr(obj, 'glomit', None)                                                         
  2121 │  1%   │       │    │    return callable(glomit)  and not isinstance(obj, type)                                        
   ... │       │       │    │                                                                                                  
  2124 │  1%   │       │    │def _glom(target, spec, scope):                                                                   
   ... │       │       │    │                                                                                                  
  2126 │  1%   │       │    │    pmap = parent.maps[0]                                                                         
  2127 │  3%   │       │    │    scope = scope.new_child({                                                                     
   ... │       │       │    │                                                                                                  
  2132 │  1%   │       │    │        MODE: pmap[MODE],                                                                         
   ... │       │       │    │                                                                                                  
  2136 │  1%   │       │    │    try:                                                                                          
  2137 │  1%   │       │    │        if type(spec) is TType:  # must go first, due to callability                              
   ... │       │       │    │                                                                                                  
  2142 │  2%   │       │    │        return scope.maps[0][MODE](target, spec, scope)                                           
   ... │       │       │    │                                                                                                  
  2156 │  1%   │       │    │    if type(spec) is str:  # shortcut to make deep-get use case faster                            
  2157 │  3%   │       │    │        return _t_eval(target, Path.from_text(spec).path_t, scope)                                
  2158 │  1%   │       │    │    if isinstance(spec, dict):                                                                    
   ... │       │       │    │                                                                                                  
       ╵       ╵       ╵    ╵                                                                                                  
                     /usr/local/lib/python3.8/collections/__init__.py: % of time =  40.66% out of  11.69s.                     
       ╷       ╷       ╷    ╷                                                                                                  
  Line │Time % │Time % │Sys │                                                                                                  
       │Python │native │%   │/usr/local/lib/python3.8/collections/__init__.py                                                  
╺━━━━━━┿━━━━━━━┿━━━━━━━┿━━━━┿━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸
   ... │       │       │    │                                                                                                  
   882 │  1%   │       │    │    def __init__(self, *maps):                                                                    
   ... │       │       │    │                                                                                                  
   887 │  2%   │       │    │        self.maps = list(maps) or [{}]          # always at least one map                         
   ... │       │       │    │                                                                                                  
   892 │  2%   │       │    │    def __getitem__(self, key):                                                                   
   893 │  4%   │       │    │        for mapping in self.maps:                                                                 
   894 │  1%   │       │    │            try:                                                                                  
   895 │ 19%   │       │    │                return mapping[key]             # can't use 'key in mapping' with defaultdict     
   896 │  1%   │       │    │            except KeyError:                                                                      
   897 │  4%   │       │    │                pass                                                                              
   ... │       │       │    │                                                                                                  
   933 │  1%   │       │    │    def new_child(self, m=None):                # like Django's Context.push()                    
   ... │       │       │    │                                                                                                  
   937 │  1%   │       │    │        if m is None:                                                                             
   ... │       │       │    │                                                                                                  
   939 │  4%   │       │    │        return self.__class__(m, *self.maps)                                                      
   ... │       │       │    │                                                                                                  
   946 │  1%   │       │    │    def __setitem__(self, key, value):                                                            
   ... │       │       │    │                                                                                                  
       ╵       ╵       ╵    ╵                                                                                                  
                           /usr/local/lib/python3.8/weakref.py: % of time =   2.36% out of  11.69s.                            
       ╷       ╷       ╷    ╷                                                                                                  
  Line │Time % │Time % │Sys │                                                                                                  
       │Python │native │%   │/usr/local/lib/python3.8/weakref.py                                                               
╺━━━━━━┿━━━━━━━┿━━━━━━━┿━━━━┿━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸
   ... │       │       │    │                                                                                                  
   382 │  1%   │       │    │    def __getitem__(self, key):                                                                   
   383 │  2%   │       │    │        return self.data[ref(key)]                                                                
   ... │       │       │    │                                                                                                  
       ╵       ╵       ╵    ╵                                                                                                  
                             /usr/local/lib/python3.8/abc.py: % of time =   1.71% out of  11.69s.                              
       ╷       ╷       ╷    ╷                                                                                                  
  Line │Time % │Time % │Sys │                                                                                                  
       │Python │native │%   │/usr/local/lib/python3.8/abc.py                                                                   
╺━━━━━━┿━━━━━━━┿━━━━━━━┿━━━━┿━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸
   ... │       │       │    │                                                                                                  
    98 │  1%   │       │    │            return _abc_instancecheck(cls, instance)                                              
   ... │       │       │    │                                                                                                  
       ╵       ╵       ╵    ╵                                                                                                  
                                 glom/test/perf_report.py: % of time =   1.41% out of  11.69s.                                  
       ╷       ╷       ╷     ╷                                                                                                  
  Line │Time % │Time % │Sys  │                                                                                                  
       │Python │native │%    │glom/test/perf_report.py                                                                          
╺━━━━━━┿━━━━━━━┿━━━━━━━┿━━━━━┿━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸
   ... │       │       │     │                                                                                                  
    47 │  1%   │       │  0% │        Obj(i, 'name' + str(i), 'external' + str(i), 'now') for i in range(num)]                  
   ... │       │       │     │                                                                                                  
       ╵       ╵       ╵     ╵                                                                                                  
kurtbrose commented 3 years ago

getting into the details -- one of the relatively expensive areas is scope[Path] maintenance -- not only is this low performance, but there are some conceptual problems with the way it is set up

I'll probably create a separate issue / branch for cleaning that up and just remove it in this branch

mahmoud commented 3 years ago

So there are a bunch of things going on here and I think we're good to merge at least a few:

  1. Benchmarks
  2. Order in AUTO
  3. Caches for Path and TargetRegistry

I'll mess around with the caches. I think we need to talk a bit about importance/canonicalness of the benchmarks.

Probably revert the gl_eval tweak, as it breaks Inspect. The Path as proto-breadcrumbs could be enhanced as well, but may need its own PR.