powsybl / pypowsybl

A PowSyBl and Python integration based on GraalVM native image
Mozilla Public License 2.0
53 stars 10 forks source link

overcoming some python strict typing problems #298

Open PatrickBourdon opened 2 years ago

PatrickBourdon commented 2 years ago
from enum import Enum, auto # https://docs.python.org/3/library/enum.html
import gzip # https://docs.python.org/3/library/gzip.html
from os.path import abspath # https://docs.python.org/3/library/os.path.html
from pathlib import Path # https://docs.python.org/3/library/pathlib.html
from typing import Union, Any # https://docs.python.org/3/library/typing.html

from pandas import DataFrame # https://pandas.pydata.org/docs/reference/frame.html

import _pypowsybl # (non recommended hack) https://www.powsybl.org/pages/documentation/grid/model/
import pypowsybl.network as pn # https://pypowsybl.readthedocs.io/en/stable/reference/network.html

class ElementType(Enum):
    BATTERY = auto()
    BUS = auto()
    BUSBAR_SECTION = auto()
    CURRENT_LIMITS = auto()
    DANGLING_LINE = auto()
    GENERATOR = auto()
    HVDC_LINE = auto()
    LCC_CONVERTER_STATION = auto()
    LINE = auto()
    # ... and all others ...

class Network(pn.Network):

    def __init__(self, path: Path, parameters: Union[None, dict[str, Any]]=None) -> None:
        """ 
        Loads a pypowsybl network from a file:
        - path (str): the path of the file to load
        - parameters (dict, optional): a map of parameters

        NOTE 1: The format of the network description is discovered from the file extension (e.g. .xiidm)
        NOTE 2: If the file is gzip-compressed (e.g. .xiidm.gz), the network is loaded without an intermediate decompressed file.
        """
        print(f'Loading network from path({path}) ...')
        if path.suffix.lower() == '.gz':
            with gzip.open(abspath(path), 'rt', encoding='utf_8_sig') as text_io:
                super().__init__(_pypowsybl.load_network_from_string(path.stem, text_io.read(), parameters or {})) # type: ignore
                # '.gz' extension has to be removed, otherwise the network format isn't recognized 
                # and the following exception is raised: 
                #   _pypowsybl.PyPowsyblError: Unsupported file format or invalid file.
        else:
            super().__init__(_pypowsybl.load_network(abspath(path), parameters or {})) # type: ignore
        print(str(self))
        return

    def get_elements(self, element_type: ElementType) -> DataFrame:
        """
        Overloads and calls the same method of the superclass pn.Network.
        The difference is the use of the locally defined enumerated 'ElementType'.
        """
        pn_element_type = next((x for x in pn.ElementType.__members__.values() if x.name == element_type.name)) # type: ignore
        return super().get_elements(pn_element_type) # type: ignore

    def elements_count(self) -> DataFrame:
        """ 
        Returns a dataframe with the elements count of each ElementType.
        """
        df = DataFrame({'count': [len(self.get_elements(x)) for x in ElementType.__members__.values()] },
                      index=list(ElementType.__members__))
        df.index.name = 'ElementType'
        return df

def main() -> None:
    network: Network = Network(Path('20220101T0000Z_20220101T0000Z_pf.xiidm.gz'))
    print(network.elements_count())
    lines: DataFrame = network.get_elements(ElementType.LINE)
    print(lines)
    return

@main()

1/ Why ElementType(Enum)? pn.ElementType or pn.ElementType.BUS are both accessible respectively as <class '_pypowsybl.ElementType'> and <ElementType.BUS: 0>. However the definition of the latter is not accessible to the type checker and an error is reported when using any element type such as: df: DataFrame = network.get_elements(pn.ElementType.BUS)

2/ Why pn.Network subclass? To overload get_elements using the ElementType(Enum) without typing errors, whilst still using any other pypowsybl.network methods. The new get_elements wraps (once for all) the typing errors, hidden by # type: ignore. elements_count is a basic example using get_elements: it reports the size of the loaded network.

Note: get_elements is NOT described in the documentation but it is a public method. update_elements also has a pn.ElementType parameter and it is part of the documentation.

3/ About __init__: The method of the created subclass adds a feature: direct read of a .xiidm.gz. The method wraps and hides the typing errors from load_network_from_string and load_network.

The typing errors come from their signatures: parameters: dict = {}. dict alone misses some information. It should probably be something like dict[str, Any].

Note that Using a dict as a default parameter is not recommended: but this is beyond typing. https://docs.quantifiedcode.com/python-anti-patterns/correctness/mutable_default_value_as_argument.html pylint notifies this as an error in strict mode. The recommended way is probably parameters: Union[None, dict[str, Any]]=None (python 3.9) parameters: None | dict[str, Any] = None (python 3.10)

4/ The file network.py includes the class pn.Network and also reads:

from typing import List as _List # (1)
from typing import Set as _Set # (2)
from pandas import DataFrame as _DataFrame # (3)

Use oftypint.List (1) et typint.Set (3) are deprecated since 3.9: they are now just list and set that nobody wants to make private (I am unsure if the new way to use them is available from python 3.7). https://docs.python.org/3/library/typing.html#typing.List https://docs.python.org/3/library/typing.html#typing.Set

From the typing point-of-view, the types of parameters and result of any public method MUST be public. Otherwise the caller cannot type his variables, and a type checker in strict mode complains. A signature of a public method like def my_method(self) -> _DataFrame: is strange by essence, for it says that the result is a private type. However the good news is that the type checker understands:

from pandas import DataFrame
df: DataFrame = network.my_method()

since imported DataFrame == np._DataFrame. Therefore (3) only prevents from writing from np import DataFrame, which would be anyway harmless.

Why do you make private systematically every imported object, such as : import datetime as _datetime?

sylvlecl commented 2 years ago

Thanks for the detailed analysis!

Just a quick answer to this part:

Why do you make private systematically every imported object, such as : import datetime as _datetime?

The purpose is to have clean namespaces: for example if we import datetime in pypowsybl.network, it will appear as being part of our own API, whereas it is not (you will never want to do from pypowsybl.network import datetime, and you don't want your IDE to propose it with autocompletion, or it will be messy to use).

For now, we try to keep our namespaces tidy by importing other APIs with "private" scope. Maybe a more powerful solution, which seems broadly used, would be to move our implementations into inner modules, and build our public namespaces by importing in them only those pieces that we want to expose.

For example we could:

This should enable us to keep namespaces tidy, while not making imports private.

sylvlecl commented 2 years ago

The work in #306 and #307 should solve some of the typing problems you reported:

Those improvements will be included in next release.

PatrickBourdon commented 2 years ago

👍

sylvlecl commented 2 years ago

Hi @PatrickBourdon

Did you have any opportunity to check if typing problems have indeed been solved in the last releases of pypowsybl, or if some remain ?

sylvlecl commented 2 years ago

Fixed additional missing types for __members__, following your mail feedback: #396