osm-fr / osmose-backend

Part of osmose that runs the analysis, and send the results to the frontend.
GNU General Public License v3.0
87 stars 109 forks source link

Add analyser for Italian charging locations #2244

Open opk12 opened 3 weeks ago

opk12 commented 3 weeks ago

The Italian community discussed this dataset of charging locations, which is updated regularly and derived from the government data. The Italian community assumes CC-BY-4-equivalent licensing, to be safe.

Osmose can

  1. Show where the CSV has a charging location and OSM does not.
  2. Show where OSM has a charging location and the CSV does not.
  3. Import the foreign key ref:EU:EVSE. Import source, source:date.
  4. Something else?

Thank you @aborruso for improving and maintaining the dataset on OnData. Thank you @cascafico for help with understanding the dataset and for independent verification of its quality.

Additional information The CSV is missing many (surveyed) charging locations. This should improve over time, as more companies submit their locations to the gov.

Point 2 helps against some uncooperative mass-importers of non-existent charging locations, that were discussed here.

As the dataset lacks an explicit license, Italian law assigns open by default, which Italian law constrains to at most attribution, at least gratis use (even gratis commercial use). More details at Wikipedia.

You can find the original gov dataset's URL here in the transformation script. I think the gov URL's shape looks more prone to breakage than the GitHub URL. That script is explained here.

Keywords - rete_ricarica_veicoli_elettrici_cleaned, rete_ricarica_veicoli_elettrici, rete ricarica veicoli elettrici, colonnine di ricarica elettrica per automobili (auto) elettriche.

opk12 commented 3 weeks ago

I am not an expert at mapping charging locations. I used visidata, especially the Frequency table menu item.

POI status

Where stato is in OUTOFORDER, INOPERATIVE, BLOCKED, RESERVED, I propose that Osmose either maps to disused:... or is silent.

CSV-to-OSM mapping cardinality

I think that id_evse is the primary key (has no duplicates) and maps to ref:EU:EVSE.

The column name numero_connettori means number of connectors.

Group types

Records can have the same id_evse from the beginning until the last asterisk, and the number after the last asterisk is different. I guess that that number identifies connectors on the same object. 22 groups are larger than 4 records, which seems wrong.

Records can have the exact same coordinates x, y. I guess this means connectors on the same object. 192 groups are larger than 4, which seems wrong. For example, 36 records have x, y == '8.02017100000006', '43.925331' and 194 records have x, y == '14.2848320000001', '40.858953'.

Grouping by ref or by coordinates returns different groups. I'm not sure which one is better.

Groups example

For example, I'm copy-pasting 3 rows. They have refs IT*BEC*EH000026*1; IT*BEC*EH000026*2; IT*BEC*EH000026*3, exact same x, y, but different last update date (data_ultimo_aggiornamento) and different power (potenza_erogabile).

x,y,objectid,id_location,nome_location,indirizzo,regione,citta,codice_postale,provincia,orario_d_apertura,id_univoco_evse,id_evse,stato,numero_connettori,id_connettore,standard_del_connettore,formato_del_connettore,tipologia_di_alimentazione,voltaggio_massimo,amperaggio_massimo,potenza_erogabile,stato_di_pubblicazione,longitudine_evse,latitudine_evse,capabilities,data_ultimo_aggiornamento,data_di_caricamento,fasce_di_potenza,comune,regione_cleaned,comune_cleaned,codice_comune_formato_alfanumerico
2.5954570000001,44.0550230000001,17523,0106aa9b-f26b-431c-9a6f-5eb2d415b10c,L.mare G. di Vittorio Ang. Città delle Donne,L.mare G. di Vittorio Ang. Città delle Donne,Emilia-Romagna,Rimini,47921,Rimini,Aperto 24/7,0024324f-feee-4f02-aafb-76a9c55779da,IT*BEC*EH000026*2,AVAILABLE,1,ac563b36-7740-41b7-84bc-f0fa622535a3,IEC_62196_T2_COMBO,CABLE,DC,550,200,110000,PUBLISHED,12.59546,44.05502,"REMOTE_START_STOP_CAPABLE, RFID_READER",1709834285000,1711383890000,Ultraveloce o Ultra-fast,Rimini,Emilia-Romagna,Rimini,099014
12.5954570000001,44.0550230000001,17521,0106aa9b-f26b-431c-9a6f-5eb2d415b10c,L.mare G. di Vittorio Ang. Città delle Donne,L.mare G. di Vittorio Ang. Città delle Donne,Emilia-Romagna,Rimini,47921,Rimini,Aperto 24/7,aa76dcb4-bf1f-457f-9843-f0b4f13c4e43,IT*BEC*EH000026*3,AVAILABLE,1,c8ba808c-6895-4a77-9874-7d18ece11e19,IEC_62196_T2,CABLE,AC_3_PHASE,400,32,22144,PUBLISHED,12.59546,44.05502,"REMOTE_START_STOP_CAPABLE, RFID_READER",1709824107000,1711383890000,Veloce o Fast,Rimini,Emilia-Romagna,Rimini,099014
12.5954570000001,44.0550230000001,17524,0106aa9b-f26b-431c-9a6f-5eb2d415b10c,L.mare G. di Vittorio Ang. Città delle Donne,L.mare G. di Vittorio Ang. Città delle Donne,Emilia-Romagna,Rimini,47921,Rimini,Aperto 24/7,bc5e4aa2-65b2-47be-97c4-67170732f5fb,IT*BEC*EH000026*1,AVAILABLE,1,951f88f0-c8d3-4d9f-b468-0ce1a3c05c1b,IEC_62196_T2_COMBO,CABLE,DC,550,200,110000,PUBLISHED,12.59546,44.05502,"REMOTE_START_STOP_CAPABLE, RFID_READER",1709831904000,1711383890000,Ultraveloce o Ultra-fast,Rimini,Emilia-Romagna,Rimini,099014

Groups code

This example finds the groups.

import pandas as pd

def until_last_asterisk(self: str) -> str:
    return self[:self.rindex('*')]

csv = pd.read_csv('rete_ricarica_veicoli_elettrici_cleaned.csv', dtype=str)

# Group by the prefix of id_evse and by the couple of coordinates.
csv['base_of_id_evse'] = csv.id_evse.map(until_last_asterisk)
csv['xy'] = csv[['x', 'y']].apply(tuple, axis=1)

# Groups with more than 100 entries, more than 4 entries, ...
[group for group in map(tuple, csv.groupby('xy').groups.values()) if len(group) > 100]
[group for group in map(tuple, csv.groupby('base_of_id_evse').groups.values()) if len(group) > 4]

x1, y1 = '14.2848320000001', '40.858953'
csv.loc[(csv.x == x1) & (csv.y == y1)]  # 194 rows

Foreign key

The ref:EU:EVSE syntax is like ref:EU:EVSE=DE*EBW*E4032*1; DE*EBW*E4032*2; DE*EBW*E4032*3, from the first example here. Note that the semicolon is followed by a space.

import itertools

def until_last_asterisk(self: str) -> str:
    return self[:self.rindex('*')]

test_group = ['XX*F2X*ABCDITO1234*1', 'IT*F2X*EF2XITO1051*1', 'IT*F2X*EF2XITO1051*2']
['; '.join(group) for _, group in itertools.groupby(test_group, until_last_asterisk)]

Coordinates

The coordinates columns are x, y and latitudine_evse, longitudine_evse.

Feature tag

I propose amenity=charging_station rather than man_made=charge_point, because Tag:amenity=charging_station and man_made=charge_point say that

Disclaimer: I'm not an expert at mapping charging locations.

source

source = https://www.piattaformaunicanazionale.it/

The column name data_ultimo_aggiornamento means date of the last update, data_di_caricamento means date of publishing.

Maybe something like

# Dividing by 1000, because the format seems to be milliseconds from the Unix epoch.
print(datetime.date.fromtimestamp(int("1709834285000") / 1000).strftime("%Y-%m-%d"))
# Output:
#   2024-03-08
frodrigo commented 3 weeks ago

We have already one setup for charging location in France. Where the situation is very similar.

@nlehuby @PanierAvide @flacombe or other are you interested to work in this ?

opk12 commented 3 weeks ago

For clarity: I am not asking to integrate 15-20 possible OSM tags from the CSV fields, to lower the barrier for contributors and to follow the license.

The big work required to do that can be done at a future time, if / when there is an explicit OSM-compatible license or a waiver. I can share my findings on the other fields, if needed. For now, just showing markers at the correct locations would be a big help for surveying the features on the ground, and for handling potential vandalism.

frodrigo commented 3 weeks ago

when there is an explicit OSM-compatible license

I not understand the license is not compatible with OSM at first read. If so, I suggest to do nothing, even not use the locations. I think pointing missing data in OSM may be arguable. But I prefer stay on a clean way and not arguable line of conduct.