Missing data is represented as approx 1.0

jdechalendar / gridemissions

Tools for power sector emissions tracking

MIT License


Is your concern computational efficiency or accuracy of the results?

If accuracy: The value of 1.0 was chosen as a "very small" value relative to the numerical data that were being used. As you correctly pointed out, these values should stay small through the cleaning (provided the data initially supplied to the algorithm are reasonable), so you can still easily identify them as missing after the cleaning job. By throwing away data that are below, say, 2.0 after the physics-based cleaning, most of these data points will disappear, and you will be able to remove many of these columns because they will have no data.
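As a rough illustration of that post-processing step, here is a minimal pandas sketch. It is not code from the repository: the function name `drop_placeholder_values`, the column names, and the sample numbers are all made up for the example, and 2.0 is just the threshold suggested above.

```python
import pandas as pd


def drop_placeholder_values(df, threshold=2.0):
    """Mark values below `threshold` as missing, then drop empty columns.

    Hypothetical helper, not part of gridemissions. It compares values
    literally ("below 2.0"); columns holding signed quantities may need
    a comparison on the absolute value instead.
    """
    # DataFrame.mask replaces entries where the condition is True with NaN.
    masked = df.mask(df < threshold)
    # Drop columns that contain no data at all after masking.
    return masked.dropna(axis=1, how="all")


# Made-up data standing in for the output of the cleaning job.
cleaned = pd.DataFrame(
    {
        "region1_wind": [1.0, 1.02, 0.98],      # placeholder-only column
        "region1_solar": [1.01, 250.0, 310.5],  # one placeholder value
        "region1_demand": [5000.0, 5100.0, 4950.0],
    }
)

result = drop_placeholder_values(cleaned)
print(list(result.columns))
```

In this made-up example the wind column is dropped because it only held placeholder values, and the single placeholder in the solar column becomes NaN.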

If computational efficiency: You are correct that having fewer columns would be more efficient, but that was not the main priority when this code was initially written. Expanding the data structure so that all source/region combinations exist made it easier to write the optimization program. It should not be very difficult, though, to modify the optimization program code so that it checks whether a combination exists before including it.
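The existence check could look something like the sketch below. It is only an illustration of the idea, not the actual optimization code: the names `existing_combinations` and `available`, and the region and source labels, are hypothetical.

```python
from itertools import product


def existing_combinations(regions, sources, available):
    """Return only the (region, source) pairs that actually have data.

    Hypothetical helper: `available` is assumed to be a set of
    (region, source) tuples built from the input data.
    """
    return [
        (region, source)
        for region, source in product(regions, sources)
        if (region, source) in available
    ]


# Made-up labels for the example.
regions = ["R1", "R2"]
sources = ["WND", "SUN", "NG"]
available = {("R1", "WND"), ("R1", "NG"), ("R2", "SUN"), ("R2", "NG")}

pairs = existing_combinations(regions, sources, available)

# One variable index per existing pair, rather than one for every
# region/source combination in the fully expanded structure.
var_index = {pair: i for i, pair in enumerate(pairs)}
print(len(var_index), "of", len(regions) * len(sources), "combinations kept")
```

The trade-off is the one described above: the program gets smaller, but every place that builds variables or constraints has to go through the filtered list instead of assuming the full grid exists.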

Missing data is represented as approx 1.0 #8