nhcb / entsog-py

Python client for the ENTSOG API
MIT License

Data no longer in DataFrame format #12

Closed lucasdlf closed 1 year ago

lucasdlf commented 1 year ago

Hi, since I updated to entsog-py 1.0.2, the downloaded data can no longer be parsed by pandas and stays a long string. This was not the case with previous versions. Lucas

nhcb commented 1 year ago

Hi Lucas, Could you share exactly what endpoints you are using to get your data? Also share some example code so I might have a look. On my end it's working correctly, but I don't use all endpoints/options.

lucasdlf commented 1 year ago

Thank you nhcb for the quick answer! A simplified version of my code is below. You can see I already incorporated some of the changes to query_operational_data(), but overall it looks much the same as the version I had before. The variable 'data' used to be a DataFrame that I could manipulate as needed, but now it ends up as a long str. I have tried some parsing methods but nothing seems to help. Thanks, Lucas

`import pandas as pd
from entsog import EntsogPandasClient

start = pd.Timestamp('20220901', tz='Europe/Brussels')
end = start + pd.DateOffset(days=2)
data = pd.DataFrame()

client = EntsogPandasClient()
data, url = client.query_operational_data(start=start, end=end, indicators=['physical_flow']) `

nhcb commented 1 year ago

I believe the issue is that you have

data, url = client.query_operational_data(start=start,
                                          end=end,
                                          indicators=['physical_flow'])

instead of

data = client.query_operational_data(start=start,
                                     end=end,
                                     indicators=['physical_flow'])

lucasdlf commented 1 year ago

Hi again, I've compared the latest version of entsog.py (1.0.2) with the one I was using privately (0.9.0) and realized that query_operational_data() is no longer defined on EntsogPandasClient. This means the program is actually running the version of query_operational_data() written for EntsogRawClient, which returns the data as raw text in JSON format or something similar. This is why pandas could not parse it.
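For anyone following along, the fall-through described above can be reproduced with a toy example. This is only a sketch: the class names, the `records` key and the payload are made up for illustration and are not the actual entsog-py classes or the ENTSOG response format. A plain list of dicts stands in for a DataFrame.

```python
import json
from typing import Dict, List


class RawClient:
    """Hypothetical stand-in for a raw client: returns the response body as text."""

    def query_operational_data(self) -> str:
        # Made-up JSON payload standing in for an API response.
        return json.dumps({"records": [{"value": 1}, {"value": 2}]})


class BrokenParsedClient(RawClient):
    """Defines no override, so calls fall through to RawClient and yield a str."""


class FixedParsedClient(RawClient):
    """Overrides the method and parses the raw text into records."""

    def query_operational_data(self) -> List[Dict[str, int]]:
        raw = super().query_operational_data()
        return json.loads(raw)["records"]


broken = BrokenParsedClient().query_operational_data()
fixed = FixedParsedClient().query_operational_data()

print(type(broken).__name__)  # str
print(type(fixed).__name__)   # list
```

The subclass without the override still "works" in the sense that the call succeeds, which is why the symptom is a long string rather than an AttributeError.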

I just finished a hot fix by adding the older version of query_operational_data() to EntsogPandasClient:

    def query_operational_data(self,
                               start: pd.Timestamp,
                               end: pd.Timestamp,
                               country_code: Union[Area, str],
                               period_type: str = 'day',
                               indicators: Union[List[Indicator], List[str]] = ['physical_flow'],
                               verbose: bool = True) -> pd.DataFrame:

        area = lookup_area(country_code)
        operators = list(area.value)

        frames = []
        for operator in operators:
            try:
                frame = self._query_operational_data(
                    start=start,
                    end=end,
                    operator=operator,
                    period_type=period_type,
                    indicators=indicators,
                    verbose=verbose)
                frames.append(frame)
            except Exception as e:
                print(f"Failure on operator {operator}: {e}")

        result = pd.concat(frames)

        return result

and modifying the EntsogRawClient version of the same function to this, so it could lookup operator by countries:

    def query_operational_data(self,
                               start: pd.Timestamp,
                               end: pd.Timestamp,
                               period_type: str = 'day',
                               indicators: Optional[Union[List[Indicator], List[str]]] = None,
                               point_directions: Optional[List[str]] = None,
                               operator: Optional[str] = None,
                               offset: Optional[int] = None,
                               ) -> Tuple[str, str]:

        params = {
            'from': self._datetime_to_str(start),
            'to': self._datetime_to_str(end),
            'periodType': period_type
        }

        if operator is not None:
            params['operatorKey'] = operator

        if offset is not None:
            params['offset'] = offset
            params['limit'] = OFFSET

        if indicators is not None:
            decoded_indicators = []
            for indicator in indicators:
                decoded_indicators.append(lookup_indicator(indicator).code)

            params['indicator'] = ','.join(decoded_indicators)

        if point_directions is not None:
            params['pointDirection'] = ','.join(point_directions)

        if operator is not None:
            params['operator'] = ','.join(operator)

        response = self._base_request(endpoint='/operationaldatas', params=params)

        return response.text, response.url

Hopefully my changes are clear and can help others.
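One detail in the snippet above may be worth double-checking: `operator` is annotated as a single string, and `','.join(...)` applied to a string joins its individual characters. A small sketch of the pitfall and one possible guard follows; `join_param` is a hypothetical helper for illustration, not part of entsog-py.

```python
from typing import Iterable, Optional, Union


def join_param(value: Optional[Union[str, Iterable[str]]]) -> Optional[str]:
    """Return a comma-separated string, treating a bare str as a single item."""
    if value is None:
        return None
    if isinstance(value, str):
        # A str is itself iterable, so joining it would split it into characters.
        return value
    return ",".join(value)


print(",".join("abc"))              # a,b,c
print(join_param("abc"))            # abc
print(join_param(["abc", "def"]))   # abc,def
```

Whether the API expects a single key or a comma-separated list is not something this sketch settles; it only shows how to avoid the character-by-character join.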

EPRINC-MP commented 1 year ago

Greetings, I have been running entsog-py 1.0.1 successfully for the last several months. These last few days, it began to exhibit problems.

Running a test in Python, the following lines:

from entsog import EntsogPandasClient
import pandas as pd
client = EntsogPandasClient()
fromDate = pd.Timestamp('2020-07-01', tz='Europe/Brussels')
endDate = pd.Timestamp('2020-07-31', tz='Europe/Brussels')
opDataDF = client.query_operational_data_all(start=fromDate, end=endDate, indicators=['renomination', 'physical_flow'])

produce this error:

/usr/lib/python3.11/site-packages/urllib3/connectionpool.py:1045: InsecureRequestWarning: Unverified HTTPS request is being made to host 'transparency.entsog.eu'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/1.26.x/advanced-usage.html#ssl-warnings
  warnings.warn(
/usr/lib/python3.11/site-packages/urllib3/connectionpool.py:1045: InsecureRequestWarning: Unverified HTTPS request is being made to host 'transparency.entsog.eu'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/1.26.x/advanced-usage.html#ssl-warnings
  warnings.warn(
Traceback (most recent call last):
  File "", line 1, in
  File "/usr/local/lib/python3.11/site-packages/entsog/decorators.py", line 144, in day_wrapper
    frame = func(*args, start=_start, end=_end, **kwargs)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Any suggestions?

nhcb commented 1 year ago

Hi eprinc-mp,

As I suggested in some other issues, you'll need to update the package to version 1.0.2. It seems ENTSOG made an internal change that makes it impossible to send a verified HTTPS request to the server from Python. That's why I added a temporary fix, which is:

requests.get(url, verify=False)

From your output I can tell you have successfully updated to 1.0.2, as it's producing warnings rather than errors. These warnings tell you that an unverified HTTPS request is being made, which is exactly the temporary fix in 1.0.2 (verify=False).

In short, you don't have any errors, only warnings that you can ignore until this is fixed in the next version.

EPRINC-MP commented 1 year ago

Thank you for your reply and effort.

But in the end, no data is being returned with Version 1.0.2; is that correct?

Max

nhcb commented 1 year ago

That depends. You're requesting a month of data, which could take about 30 seconds to a minute, I believe. Since you are also requesting two indicators, it might take a bit longer. I have no issues on my end running your script. It just outputs the warning every time it makes a request. I might hide it in a future version, but you can also hide it manually using:

import urllib3
urllib3.disable_warnings()
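
If silencing every urllib3 warning for the whole process feels too broad, a narrower option is to filter by message text with the standard `warnings` module, and only around the calls that trigger it. This is a sketch: the helper name `suppress_unverified_https_warnings` is made up here, and the message prefix is taken from the warning text quoted earlier in this thread.

```python
import warnings
from contextlib import contextmanager


@contextmanager
def suppress_unverified_https_warnings():
    """Temporarily ignore warnings whose message starts with 'Unverified HTTPS request'."""
    with warnings.catch_warnings():
        # The message argument is a regex matched against the start of the warning text.
        warnings.filterwarnings("ignore", message="Unverified HTTPS request")
        yield


# Usage (assumes `client`, `start` and `end` are defined as in the earlier posts):
# with suppress_unverified_https_warnings():
#     data = client.query_operational_data_all(start=start, end=end,
#                                              indicators=['physical_flow'])
```

Outside the `with` block the previous warning filters are restored, so other unverified requests in the same program would still be reported.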