sissaschool / elementpath

XPath 1.0/2.0/3.0/3.1 parsers and selectors for ElementTree and lxml
MIT License
72 stars 20 forks

[feature] tostring python function #70

Closed: Constantin1489 closed this issue 8 months ago

Constantin1489 commented 1 year ago

Thank you guys for the great repo.

I'd like to suggest a Python function like this:

```python
import math
from decimal import Decimal

import elementpath


def elementpath_tostring(obj) -> str:
    """
    Convert an item returned by elementpath.select() to its XPath string value.

    Adapted from the original in this repo:
    https://github.com/sissaschool/elementpath/blob/dfcc2fd3d6011b16e02bf30459a7924f547b47d0/elementpath/xpath_tokens.py#L1038
    """
    if obj is None:
        return ''
    elif isinstance(obj, elementpath.XPathNode):
        return obj.string_value
    elif isinstance(obj, bool):
        return 'true' if obj else 'false'
    elif isinstance(obj, Decimal):
        value = format(obj, 'f')
        if '.' in value:
            return value.rstrip('0').rstrip('.')
        return value
    elif isinstance(obj, float):
        if math.isnan(obj):
            return 'NaN'
        elif math.isinf(obj):
            return str(obj).upper()  # 'INF' or '-INF'

        # Strip trailing zeros from the mantissa only, so that an exponent
        # such as the one in 1.5e+20 is not truncated to 1.5E2.
        mantissa, sep, exponent = str(obj).partition('e')
        if '.' in mantissa:
            mantissa = mantissa.rstrip('0').rstrip('.')
        if not sep:
            return mantissa
        return mantissa + 'E' + exponent.replace('+', '')

    return str(obj)
```

Because elementpath.select() returns values of various types, I want to make sure the function I'm suggesting won't break the code below, both in general and with future elementpath updates.

```python
# Context: PR in another repo, https://github.com/dgtlmoon/changedetection.io/pull/1774
import elementpath
from lxml import etree


def select_to_html_block(tree, xpath: str) -> str:  # hypothetical wrapper name
    # tree = etree.HTML(html_content)
    # tree = etree.XML(html_content)
    r = elementpath.select(tree, xpath)
    if not isinstance(r, list):
        r = [r]
    html_block = ''
    for element in r:
        if isinstance(element, str):
            html_block += element
        # https://lxml.de/api/lxml.etree-module.html#tostring
        # https://lxml.de/api/lxml.etree._Element-class.html
        # https://lxml.de/api/lxml.etree._ElementTree-class.html
        elif isinstance(element, (etree._Element, etree._ElementTree)):
            html_block += etree.tostring(element, pretty_print=True).decode('utf-8')
        else:
            # elementpath_tostring() is the function proposed above
            html_block += elementpath_tostring(element)
    return html_block
```

If you think elementpath_tostring(obj) is worth a PR, please let me know. Thank you.

brunato commented 1 year ago

Hi,

Sorry for the late response. I've compared your proposed elementpath_tostring(obj) function with the original one (XPathToken.string_value). The only difference is in the branch for objects that are XPathFunction instances, which your code can skip with a not isinstance(obj, elementpath.XPathToken) check.

So the essential problem, if I understand correctly, is how to access the string_value() function in a simple way. This can be done by creating a token instance from a simple valid XPath expression (e.g. '.'), but I imagine it would be better done with a dedicated package API.

The new API could be a kind of accessor to some or all of the token-related helpers, without touching the token code. I have to think about it, but this could be a useful general-purpose feature for this package.

What do you think of a new feature like this?

Constantin1489 commented 1 year ago

Yes, that is exactly what I was hoping for! It would be best if the new API function were maintained directly by the package author!

At a minimum, what I want is to convert the various result types returned by elementpath.select to strings! That makes it possible to combine results into a single string, easily send the data as a string, and so on.

Without a general string conversion, the code may break every time an unexpected XPath query comes in (this was my disaster scene: https://github.com/dgtlmoon/changedetection.io/pull/1774/commits/f6b763cf27f3ebb689779961d798eb445fde3261).

brunato commented 9 months ago

Hi, the latest minor release adds a new get_function() method to the XPath parser classes, which creates a callable function object, e.g.:

```python
>>> from elementpath import XPath2Parser
>>> parser = XPath2Parser()
>>> fn_string = parser.get_function('fn:string', arity=1)
>>> fn_string(89)
'89'
>>> fn_string(89.9)
'89.9'
>>> fn_string(True)
'true'
```

This feature needs further development: currently the arguments are processed as-is, and it may be desirable to have a dynamic context connected to the function object. In fact, the fn_string function in the example above can get the string value of an element only if the element is wrapped in a node tree:

```python
>>> import xml.etree.ElementTree as ET
>>> root = ET.XML('<root>\n  <child>one</child>\n</root>')
>>> fn_string(root)
"<Element 'root' at 0x7f3c18490ae0>"
>>>
>>> from elementpath import get_node_tree
>>> fn_string(get_node_tree(root))
'\n  one\n'
```

brunato commented 8 months ago

Hi, release v4.4.0 includes an improved version of the parser method get_function(). Furthermore, the XPathFunction class has been extended to automatically convert Element objects to element nodes, with or without a context (though it's better to provide a context anyway).

Also, an XPathFunction object can wrap itself in a plain function using the as_function() method. I think this is enough to consider this issue resolved, but I'll leave that decision to you once you've tried the new features.

Best regards

Constantin1489 commented 8 months ago

Yes, you're right. Since the API is now provided by the package, this issue is solved. (Now the issue is my procrastination... I'm really sorry.)

Thank you! I appreciate your help!