wind-python / windpowerlib

The windpowerlib is a library to model the output of wind turbines and farms.
https://oemof.org/
MIT License

Add Data from wind-turbine-models.com #120

Open maurerle opened 2 years ago

maurerle commented 2 years ago

Hello everybody,

I wrote a small script to parse the data from wind-turbine-models.com. Since the turbine data already contains records from wind-turbine-models.com (https://github.com/wind-python/windpowerlib/blob/master/windpowerlib/oedb/turbine_data.csv), I hope that the legal issues discussed in https://github.com/OpenEnergyPlatform/data-preprocessing/issues/28#issuecomment-808275889 are settled and the data can be used.

It would be very good if the power curves could additionally be integrated into the OEP database (a rough sketch of such an upload follows after the script below).

The code below is available under the MIT License and free for anyone to use:

from bs4 import BeautifulSoup # parse html
import requests
import json5 # parse js-dict to python
import json
import pandas as pd
from tqdm import tqdm # fancy for loop

# create list of turbines with available powercurves
page = requests.get('https://www.wind-turbine-models.com/powercurves')
soup = BeautifulSoup(page.text, 'html.parser')
# the <select class="chosen-select"> element lists all turbines with a power curve
name_list = soup.find(class_='chosen-select')

wind_turbines_with_curve = []
for option in name_list.find_all('option'):
    wind_turbines_with_curve.append(option.get('value'))

def downloadTurbineCurve(turbine_id, start=0, stop=25):
    """Fetch the power curve of one turbine for wind speeds from start to stop m/s."""
    url = 'https://www.wind-turbine-models.com/powercurves'
    headers = {'Content-Type': 'application/x-www-form-urlencoded'}
    data = {'_action': 'compare', 'turbines[]': turbine_id, 'windrange[]': [start, stop]}

    resp = requests.post(url, headers=headers, data=data)
    strings = resp.json()['result']
    # the response embeds a JS chart config; cut out the 'data: {...}' part
    begin = strings.find('data:')
    end = strings.find('"}]', begin)
    relevant_js = '{' + strings[begin:end + 3] + '}}'
    curve_as_dict = json5.loads(relevant_js)
    x = curve_as_dict['data']['labels']
    y = curve_as_dict['data']['datasets'][0]['data']
    label = curve_as_dict['data']['datasets'][0]['label']
    url = curve_as_dict['data']['datasets'][0]['url']
    df = pd.DataFrame(y, index=x, columns=[label])
    df.index.name = 'wind_speed'
    return df

curves = []
for turbine_id in tqdm(wind_turbines_with_curve):
    curve = downloadTurbineCurve(turbine_id)
    curves.append(curve)
c = pd.concat(curves, axis=1)
d = c[c.any(axis=1)]  # drop wind speeds where every turbine produces zero power

with open('down.csv', 'w') as f:
    d.to_csv(f)
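
To make the result directly usable in windpowerlib, here is a minimal sketch that turns one scraped column into the power_curve DataFrame the WindTurbine class expects ('wind_speed' and 'value' columns). The kW-to-W conversion, the hub height, and taking the curve maximum as nominal power are assumptions, since none of these are part of the scraped curve:

import pandas as pd
from windpowerlib import WindTurbine

curves = pd.read_csv('down.csv', index_col='wind_speed')

label = curves.columns[0]  # pick any scraped turbine
curve = curves[label].dropna()
power_curve = pd.DataFrame({
    'wind_speed': curve.index.astype(float),  # in m/s
    'value': curve.values * 1000,             # assumed kW on the site; windpowerlib expects W
})

turbine = WindTurbine(
    hub_height=100,                            # placeholder, not part of the scraped data
    nominal_power=power_curve['value'].max(),  # simplification: rated power ~ curve maximum
    power_curve=power_curve,
)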
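
And regarding the OEP integration: a hypothetical sketch of how the curves could be pushed row by row via the OEP REST API. The schema and table names and the row layout are made up for illustration, and the endpoint, token handling, and table definition should be checked against the current OEP API documentation before use:

import pandas as pd
import requests

OEP_URL = 'https://openenergy-platform.org'
schema, table = 'model_draft', 'wind_turbine_power_curves'  # hypothetical table
token = 'YOUR-OEP-TOKEN'  # personal token from the OEP profile page

curves = pd.read_csv('down.csv', index_col='wind_speed')

for turbine_name, series in curves.items():
    # one row per turbine, with the curve stored as parallel lists
    row = {
        'turbine_name': turbine_name,
        'wind_speeds': series.index.tolist(),
        'power_values': series.tolist(),
    }
    resp = requests.post(
        f'{OEP_URL}/api/v0/schema/{schema}/tables/{table}/rows/new',
        json={'query': row},
        headers={'Authorization': f'Token {token}'},
    )
    resp.raise_for_status()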
Ludee commented 2 years ago

Dear @maurerle, thanks for pushing this collaborative database. From my point of view, scraping and collecting the website's data is a grey area. It can be considered legal if only a part of it is being collected.

As a rule, web scraping for empirical research is legally permissible. The terms of use that are frequently employed do not change that. Technical barriers are a different matter: they must not be circumvented. Whoever wants to be on the safe side can ask the maker of the database for permission and obtain it, ideally in text form (for example by e-mail). In cases of doubt, the legal departments of the research institutions can advise.

Grenzen des "Web Scrapings" ("Limits of Web Scraping") - https://www.forschung-und-lehre.de/recht/grenzen-des-web-scrapings-2421/

But publication under an open license is definitely not possible. I contacted the website owners some time ago, but there was no interest in a collaboration.

This is why I started gathering the original sources to build a new open database under an appropriate open data license!