salilab / IHMValidation

Validation software for integrative models deposited to PDB
MIT License

Refactor utility.dict_to_JSlist #39

Closed aozalevsky closed 2 years ago

aozalevsky commented 2 years ago

This is a continuation of #38.

Looks like utility.dict_to_JSlist is heavily used throughout the code and performs a lot of iterations and list comprehensions. https://github.com/salilab/IHMValidation/blob/4df2f34244b03eee635f32e1a744acfb55890531/master/pyext/src/validation/utility.py#L32-L43

Although a single list comprehension is quite efficient, the function overuses them, which causes a sizable delay.

benmwebb commented 2 years ago

In most cases you probably don't need to use it at all. You can substitute the necessary fields directly into the generated HTML rather than converting them to JS and then using JS to do HTML-load-time checks on them. For example, no need to store bond outliers in a JS list - just make a simple HTML table directly from the Python data structure. See #19.
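As a rough illustration of the suggestion above, a table can be built directly from a column-oriented Python dict. This is only a sketch: the name `dict_to_html_table` is hypothetical and not part of IHMValidation, and it assumes every key maps to a list of equal length.

```python
from html import escape


def dict_to_html_table(d: dict) -> str:
    """Hypothetical helper: render a dict of columns as a plain HTML table."""
    if not d:
        return ''
    # dict keys become the header cells
    head = ''.join(f'<th>{escape(str(k))}</th>' for k in d)
    # zip(*d.values()) walks the columns in parallel, yielding one row at a time
    body = ''.join(
        '<tr>' + ''.join(f'<td>{escape(str(el))}</td>' for el in row) + '</tr>'
        for row in zip(*d.values())
    )
    return f'<table><tr>{head}</tr>{body}</table>'


print(dict_to_html_table({'atom': ['CA', 'CB'], 'z-score': [4.2, 5.1]}))
```

Values are passed through `html.escape`, so characters such as `<` in the data cannot break the markup.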

aozalevsky commented 2 years ago

Actually, this function is not directly related to JS. Rather, it "transposes" a dict into a list of rows, with every dict key becoming a column. It also replaces ? with _.

>>> d = {'a': ['1', '2'], 'b': ['?', 1]}
>>> dict_to_JSlist(d)
[['a', 'b'], ['1', '_'], ['2', '1']]
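For comparison, the same transposition can be sketched in pure Python with `zip`. The name `dict_to_rows` is hypothetical and this is not the code proposed in this issue. Note one assumption: `zip` stops at the shortest column, so columns are expected to have equal length.

```python
def dict_to_rows(d: dict) -> list:
    """Hypothetical helper: transpose a dict of columns into a list of rows."""
    if not d:
        return []
    # the first row holds the dict keys (column headers)
    header = list(d.keys())
    # zip(*d.values()) yields one tuple per row; every element is
    # converted to str and the '?' placeholder is replaced with '_'
    rows = [
        ['_' if str(el) == '?' else str(el) for el in row]
        for row in zip(*d.values())
    ]
    return [header] + rows


print(dict_to_rows({'a': ['1', '2'], 'b': ['?', 1]}))
# [['a', 'b'], ['1', '_'], ['2', '1']]
```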

So far I have prepared a drop-in replacement, which typically takes well under a millisecond per call even for large entries (21 calls, 0.007 s in total):

       21    0.007    0.000    0.008    0.000 utility.py:47(dict_to_JSlist)
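The line above appears to follow the column layout of Python's `cProfile`/`pstats` report (ncalls, tottime, percall, cumtime, percall, location). A similar measurement could be collected roughly as follows. This is a sketch only: `transpose` is a hypothetical stand-in for the function under test, and the input data is made up.

```python
import cProfile
import io
import pstats


def transpose(d):
    """Hypothetical stand-in for the function being profiled."""
    return [list(d)] + [list(row) for row in zip(*d.values())]


def profile_calls(func, arg, n_calls=21):
    """Call func(arg) n_calls times under cProfile and return the text report."""
    profiler = cProfile.Profile()
    profiler.enable()
    for _ in range(n_calls):
        func(arg)
    profiler.disable()
    # write the report into a string buffer instead of stdout
    buf = io.StringIO()
    pstats.Stats(profiler, stream=buf).sort_stats('cumulative').print_stats()
    return buf.getvalue()


report = profile_calls(transpose, {'a': list(range(1000)), 'b': list(range(1000))})
print(report)
```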

the actual code:

import numpy as np


def dict_to_JSlist(d: dict) -> list:
    '''
    convert dictionary to list of lists
    '''
    output_list = []

    if d:
        # add headers for table, which are the keys of the dict
        header = list(d.keys())
        N = len(header)
        # one row for the header plus one row per value in the first column
        M = len(d[header[0]]) + 1
        output_list = np.empty((M, N), dtype=object)
        output_list[0, :] = header
        # iterate over dict values - columns
        for j, v in enumerate(d.values()):
            # iterate over values of every key - fill rows
            for i, el in enumerate(v, start=1):
                el_ = str(el)
                # replace the '?' placeholder with '_'
                if el_ == '?':
                    el_ = '_'
                output_list[i, j] = el_

        output_list = output_list.tolist()

    return output_list

If it's OK, I'll create a pull request.

benmwebb commented 2 years ago

How odd! Sure, in that case a simple refactor sounds fine to me.