iiasa / ixmp

The ix modeling platform for integrated and cross-cutting scenario analysis
https://docs.messageix.org/ixmp
Apache License 2.0
37 stars 110 forks source link

Scenario.read_excel() fails on sheets containing 'phantom' cells #432

Open OFR-IIASA opened 2 years ago

OFR-IIASA commented 2 years ago

The error occurs when executing read_excel().

When reading an excel file, some rows of data can be added which actually do not contain any data. These extra rows will contain "NaN"s, therefore resulting in two errors.

The first error which occurs, if the argument add_unit=True is passed:

TypeError: No matching overloads found for at.ac.iiasa.ixmp.Platform.addUnitToDB(float,str), options are: public int at.ac.iiasa.ixmp.Platform.addUnitToDB(java.lang.String,java.lang.String) throws at.ac.iiasa.ixmp.exceptions.IxException

The error message results from the fact that a unit "nan" is trying to be added.

The second error occurs when reading data and determining the column type. Should "phantom" data be included in the dataframe, then columns which should be integers are read as floats. The following error is given:

RuntimeError: unhandled Java exception: The key '2001.0' must not contain a '.'!

The current workaround requires cleaning of the xlsx which is being read-in. In the various sheets, columns can be filtered for "Blanks" in excel. The resulting selection will show the respective rows, indicated by having blue row numbers. Deleting these and saving the xlsx file should resolve the issue. In order to see which parameter sheet contains the phantom data, add print(name) in line 274 of the file ixmp/backend/io.py. The parameter sheet currently being processed will be printed on screen. The last parameter printed before the error message will guide the user to the sheet containing the phantom-data.

danielhuppmann commented 2 years ago

pandas has problems with: