Closed oayana closed 2 years ago
HI @oayana, thanks for opening this issue :) What script did you use? Did it lead to an error?
We splitted at some point one line that had voltage information in this format {110,220) to two lines 110 and 220. So probably we missed splitting the cable as well. I think we need an alternative solution compared to the workaround. Do you want to create a PR and think about a solution?
First of all, you're welcome :)
While I was performing the analysis for Turkey, I got such an error on line 368 of the osm_data_cleaning script.
Error code:
Traceback (most recent call last):
File "C:\Users\muh1\Desktop\pypsa\Pypsa_data\pypsa-africa.snakemake\scripts\tmpb0bbyowa.osm_data_cleaning.py", line 611, in
I tried a workaround as a solution. But if you want to find a better solution, of course I would like to help.
Hi @oayana, Thanks for the error message. So it seems in line 371 is the problem.... Your workaround is currently fixing the bug but introduces a small error. You are setting two lines with i.e. 'cable' = 2, but one line need to have 'cable'=3. Let me explain.
A cable can be only an integer (1,2,3,...). The line with the trouble has 2 lines in parallel running. One with 2 cables one with 3 cables. This indeed lead to the error that (2;3) cannot be converted to an integer.
As you can see below, we splitted voltages because there was also a semicolon.
Split cells function is used in the voltage case to break the lines apart. But let's check out the split_cells
function in more detail
https://github.com/pypsa-meets-africa/pypsa-africa/blob/509d2ba691c622c2245df63631c7f09c91089355/scripts/osm_data_cleaning.py#L238-L244
What the docstring explains is that we separate the voltage in this function and create an identical line with all previous data. So while we have now two lines with separated voltage without semicolon, we still have the problem that the cable information is not fixed - it has now in each line smt like Line1['cable'] -> (2;3), Line2 ['cable'] -> (2:3)
Solution:
Goal (simplified):
That's how data was stored:
Line['voltage'] = (100;200)
Line['cable'] = (1;2)
That's how we need it:
Line1['voltage'] = (100)
Line1['cable'] = (1)
Line2['voltage'] = (200)
Line2['cable'] = (2)
Do you want to be on the contributor list with such a fix? :)
I understood the problem and would like to offer you a solution. I will handle it as soon as possible and get back to you :)
- semicolon
Hi Max, I had a chance to look at the df_all_lines dataframe now and as you mentioned some cables and voltages values are semicolon. However, both cables and voltages are not semicolon in the same row(only one of them is semicolon).
You have already produced the solution for this. It is solved with the split_cells function, as you threw into the voltages values. All you have to do is call the split function before parsing the cables values into int:
df_all_lines = split_cells(df_all_lines, lst_col="cables")
Then, df_all_lines["cables"] = df_all_lines["cables"].astype("int")
Hi @oayana, I would suggest following these steps 1-7 to create a pull request that can be accepted in PyPSA-Africa (it also helps to check your code and to stream it in): https://www.tomasbeuzen.com/post/git-fork-branch-pull/
I have only one doubt which we need to check to be ready for an accepted PR.
Let's assume one line has a semicolon separated values for voltage and cable at the same time:
Line['voltage'] = (100;200)
Line['cable'] = (1;2)
Then, I believe applying split cells
two times could lead to the following:
Line1['voltage'] = (100)
Line1['cable'] = (1)
Line2['voltage'] = (100)
Line2['cable'] = (2)
Line3['voltage'] = (200)
Line3['cable'] = (1)
Line4['voltage'] = (200)
Line4['cable'] = (2)
Instead of the desired solution which I mentioned above (https://github.com/pypsa-meets-africa/pypsa-africa/issues/153#issuecomment-974478190). I would suggest writing a small jupyterscript to test if a one line with the semicolon in both cases can be solved by the split cell function correctly.
Almost there :)
Hi @oayana, I would suggest following these steps 1-7 to create a pull request that can be accepted in PyPSA-Africa (it also helps to check your code and to stream it in): https://www.tomasbeuzen.com/post/git-fork-branch-pull/
I have only one doubt which we need to check to be ready for an accepted PR.
Let's assume one line has a semicolon separated values for voltage and cable at the same time:
Line['voltage'] = (100;200) Line['cable'] = (1;2)
Then, I believe applying
split cells
two times could lead to the following:Line1['voltage'] = (100) Line1['cable'] = (1) Line2['voltage'] = (100) Line2['cable'] = (2) Line3['voltage'] = (200) Line3['cable'] = (1) Line4['voltage'] = (200) Line4['cable'] = (2)
Instead of the desired solution which I mentioned above (#153 (comment)). I would suggest writing a small jupyterscript to test if a one line with the semicolon in both cases can be solved by the split cell function correctly.
Almost there :)
Hi Max, Maybe this code can provide the solution you want (be careful it only works for cables and voltage).
import pandas as pd
dic = {'line_id': {0: 557828023, 1: 176542356, 2: 305004661, 3: 233127728}, 'voltage': {0: '154000;380000', 1: '380000', 2: '154000;380000', 3: '154000'}, 'cables': {0: '3;2', 1: '0', 2: '3;6', 3: '0'}}
data = pd.DataFrame(dic) #example data
def split_cells_multiple(df,list_col=['cables','voltage']): # split function for cables and voltage
for i in range(df.shape[0]):
sub = df.loc[i,list_col] # for each cables and voltage
if sub.notnull().all() == True: # check not both empty
if [ ";" in s for s in sub].count(True) == len(list_col): # check both contain ";"
d = [ s.split(';') for s in sub] #split them
r = df.loc[i,:].copy()
df.loc[i,list_col[0]] = d[0][0] # first split [0]
df.loc[i,list_col[1]] = d[1][0]
r[list_col[0]] = d[0][1] # second split [1]
r[list_col[1]] = d[1][1]
df = df.append(r)
return df # return new frame
data = split_cells_multiple(data)
Some cables values (in df_all_lines["cables"]) are not suitable for conversion to int values (eg :3;6).
Workaround: df_all_lines["cables"] = df_all_lines["cables"].astype(str).str.replace(";", ".").astype(float).astype(int) You can use.