wolverton-research-group / qmpy

A suite of computational materials science tools.
https://oqmd.org
MIT License

Bug when parsing element_set #136

Open joabro opened 2 years ago

joabro commented 2 years ago

Hi

I have noticed that an error occurs when the element_set contains two species where one symbol is the capital letter of the other, e.g. "Nb,N" or "Os,O":


wget "http://oqmd.org/oqmdapi/formationenergy?fields=name,entry_id&filter=element_set=Os,O"
--2022-01-02 14:15:14--  http://oqmd.org/oqmdapi/formationenergy?fields=name,entry_id&filter=element_set=Os,O
Resolving oqmd.org (oqmd.org)... 165.124.29.200
Connecting to oqmd.org (oqmd.org)|165.124.29.200|:80... connected.
HTTP request sent, awaiting response... 400 Bad Request
2022-01-02 14:15:16 ERROR 400: Bad Request.

The underlying cause is how the element_set_conversion function substitutes the element symbols matched by its regular expressions: the shorter symbol is also replaced inside the longer one. Specifically, it gives the following output:

>>> filter_expr = 'element_set=Os,O'
>>> print(element_set_conversion(filter_expr))
( element=" element="O" s"  AND  element="O" )
>>> filter_expr = 'element_set=Nb,N'
>>> print(element_set_conversion(filter_expr))
( element=" element="N" b"  AND  element="N" )
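
The current qmpy source is not quoted in this report, so the following is only a hypothetical reconstruction of the failure mode, assuming that each matched symbol is substituted with a global str.replace. The name naive_element_set_conversion is made up for illustration. Under that assumption it reproduces the malformed output shown above:

```python
import re


def naive_element_set_conversion(filter_expr):
    """Hypothetical reconstruction of the failing behaviour (not the qmpy source)."""
    filter_expr_out = filter_expr
    for els in re.findall(r"element_set=\S*", filter_expr):
        els_out = els.replace("element_set=", "")
        for el in re.findall(r"[A-Z][a-z]*", els):
            # str.replace rewrites every occurrence of el, including the one
            # inside text that an earlier iteration already produced.
            els_out = els_out.replace(el, ' element="' + el + '" ')
        els_out = els_out.replace(",", " AND ").replace("-", " OR ")
        filter_expr_out = filter_expr_out.replace(els, "(" + els_out + ")")
    return filter_expr_out


print(naive_element_set_conversion("element_set=Os,O"))
print(naive_element_set_conversion("element_set=Nb,N"))
```

After "Os" has been expanded, the later replacement of "O" also hits the "O" inside the already generated element="Os", which is why the symbol ends up nested.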

This issue can be solved by rewriting the element_set_conversion function as follows:

import re


def element_set_conversion(filter_expr):
    """
    Convert element_set filter to multiple element filters
    Input:
        :str filter_expr: raw filter expression w/ element_set parameter
            Valid element_set expression:
                ',': AND operator
                '-': OR operator
                '~': NOT operator
                '(', ')': to change precedence
            Examples:
                element_set=Al-O,H
                element_set=(Mn-Fe),O
    Output:
        :str : converted filter expression
    """
    filter_expr_out = filter_expr
    for els in re.findall(r"element_set=\S*", filter_expr):
        els_in = els
        els_out = ''
        # Walk through the symbols left to right, consuming the input as we
        # go, so that each symbol is replaced exactly once and in place.
        for el in re.findall(r"[A-Z][a-z]*", els_in):
            bels, els_in = els_in.split(el, maxsplit=1)
            els_out += bels + ' element="' + el + '" '
        # Keep whatever follows the last symbol, e.g. a closing parenthesis.
        els_out += els_in
        els_out = els_out.replace("element_set=", "")
        els_out = els_out.replace(",", " AND ")
        els_out = els_out.replace("-", " OR ")

        filter_expr_out = filter_expr_out.replace(els, "(" + els_out + ")")

    return filter_expr_out

With this definition one obtains the expected results:


>>> filter_expr = 'element_set=Os,O'
>>> print(element_set_conversion(filter_expr))
( element="Os"  AND  element="O" )
>>> filter_expr = 'element_set=Nb,N'
>>> print(element_set_conversion(filter_expr))
( element="Nb"  AND  element="N" )
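
As an alternative, the same idea can be sketched with a single re.sub pass, which touches each matched symbol exactly once and so cannot rewrite a symbol inside an earlier replacement. This is only a sketch and not the fix that was merged; the name element_set_conversion_sub is hypothetical, and the '~' operator is left untouched here, just as in the function above:

```python
import re


def element_set_conversion_sub(filter_expr):
    """Sketch: convert element_set filters using one re.sub pass per expression."""

    def convert(match):
        body = match.group(1)
        # Each symbol is matched once, left to right, on the original text,
        # so "O" is never replaced inside an already expanded "Os".
        body = re.sub(
            r"[A-Z][a-z]*",
            lambda m: ' element="' + m.group(0) + '" ',
            body,
        )
        body = body.replace(",", " AND ").replace("-", " OR ")
        return "(" + body + ")"

    return re.sub(r"element_set=(\S*)", convert, filter_expr)


print(element_set_conversion_sub("element_set=Os,O"))
print(element_set_conversion_sub("element_set=O,(Mn-Fe)"))
```

Because the substitution runs over the whole expression, characters after the last symbol, such as a closing parenthesis, are kept as they are.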

tachyontraveler commented 2 years ago

Fixed: https://github.com/wolverton-research-group/qmpy/pull/137/commits/2d2356c3778fd683d9f45d52ef35219dcd2d363d#diff-67ad97c274669a060d71b464588d181e37f56872d2273063e28b663733859d74

Thanks for reporting the bug, @joabro!