switch-model / switch

A Modern Platform for Planning High-Renewable Power Systems
http://switch-model.org/

Increase input files extensions #100

Open pesap opened 6 years ago

pesap commented 6 years ago

Hey @josiahjohnston

I was wondering if we could add more extensions for the input files such as .csv, .tsv, etc. I think this will give more flexibility for some users of switch. I can do the pull request for this. It is an easy feature implementation.

bmaluenda commented 6 years ago

Hi

I imagine that by "adding more extensions" you mean not only to accept different extensions in the filename, but also to correctly parse these other data file formats, such as comma-separated values. If that is the case and you are willing to implement it, I say go ahead :) (I would try to review it)

It would be a nice addition, especially considering that most people are used to working with .csv and not with tab-separated values.

pesap commented 6 years ago

Yes. You are right. I meant to parse different file formats. I will do the pull request :)

-- Pedro Andrés Sánchez Pérez

josiahjohnston commented 6 years ago

Thanks Benjamin for being a better communicator than me! :)

The use of .tab files was one of the more established paths with the pyomo codebase, and is a hold-over from their initial desire to match the AMPL file formats, although the formats are increasingly diverging. The pyomo DataPortal interface seemed most flexible for our purposes, but it didn't have everything we wanted and we've already written a wrapper around it to provide more features. It can be slow at parsing and assembling data, so it's not ideal, but it's also code we don't have to maintain. If you can write support for csv input files, that seems dandy; there's a chance DataPortal already supports it in an undocumented way, but I haven't looked into that yet.

As far as .tsv files go, I think they have the same conventions as a .tab file, but a different extension. If I'm correct, then you won't have to write any new code to support tsv; just pass a different file name to DataPortal.load()

If you have to write new code to implement support in general, I'd suggest using pandas for reading files from disk, then stuffing the data into the DataPortal dictionary. Pandas works efficiently with a wide variety of file formats, is fairly well known, and is maintained and expanded by a broad community. We'll need a little bit of glue code to link pandas to DataPortal, but that shouldn't be too lengthy or difficult to maintain. If you look under the hood, a DataPortal object has a massive nested dictionary that stores everything it has read in, and I've read some Pyomo documentation saying you can add to that dictionary directly as long as you follow their conventions. I think some of the code for parsing partial load heat rates already manipulates that dictionary directly.
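A rough sketch of that glue step, for illustration only. It uses the standard-library csv module so it runs without extra dependencies (pandas.read_csv would fill the same role), and it stops at a plain `{column: {index_key: value}}` dictionary rather than touching DataPortal's internal dictionary, whose exact layout is not described in this thread. The function name `table_to_params` and the sample columns are hypothetical.

```python
import csv
import io


def table_to_params(handle, index_cols, delimiter=","):
    """Read a delimited table into {column: {index_key: value}}.

    Hypothetical helper: index_cols names the columns that form the key;
    every other column is treated as a numeric parameter.
    """
    reader = csv.DictReader(handle, delimiter=delimiter)
    params = {}
    for row in reader:
        key = tuple(row[c] for c in index_cols)
        if len(key) == 1:
            key = key[0]  # use a scalar key for one-dimensional indexes
        for col, value in row.items():
            if col in index_cols:
                continue
            params.setdefault(col, {})[key] = float(value)
    return params


# Illustrative data only; keys stay as strings because csv does not infer types.
sample = io.StringIO(
    "GENERATION_PROJECT,build_year,gen_overnight_cost\n"
    "wind_1,2020,1500\n"
    "wind_1,2030,1300\n"
)
params = table_to_params(sample, ["GENERATION_PROJECT", "build_year"])
```

The resulting dictionary would still need to be merged into DataPortal following Pyomo's own conventions, which this sketch does not attempt.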

Best of luck.

https://software.sandia.gov/downloads/pub/pyomo/PyomoOnlineDocs.html#_data_input

mfripp commented 5 years ago

I think we could pretty easily support any file format that Pyomo allows. I'm not sure if .tsv is on that list. But I would actually be more in favor of just standardizing on .csv for both input and output. A few reasons for this:

As an amendment to my first point: I'm working on code to allow users to specify aliases for any input file from the command line (or in scenarios.txt or options.txt), e.g., --alias gen_build_costs.tab=gen_build_costs_low.tab. This would be useful for running different scenarios, e.g., swap between high and low equipment or fuel prices. I haven't added it to the main codebase yet because it doesn't really extend our functionality, just reduces storage requirements (you can already run these other scenarios by creating input directories for each interesting permutation). But this could also be a way to support use of other file formats even in the standard modules, e.g., --alias gen_build_costs.tab=gen_build_costs_low.xlsx. So we could do both -- use .csv files by default, but allow users to specify other formats via an alias, and allow users to load data from any pyomo-supported file format in their own custom modules. We'd also need some command-line argument to specify the output format to use.
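To make the alias idea concrete, here is a minimal sketch of how `NAME=REPLACEMENT` pairs could be collected and applied. It is not the code from the pre-release branch; the helper names `parse_aliases` and `resolve` are hypothetical, and the flag spelling simply follows the example above.

```python
import argparse


def parse_aliases(pairs):
    """Turn ['a.tab=b.tab', ...] into {'a.tab': 'b.tab', ...}."""
    aliases = {}
    for pair in pairs:
        expected, sep, actual = pair.partition("=")
        if not sep or not expected or not actual:
            raise ValueError(f"expected NAME=REPLACEMENT, got {pair!r}")
        aliases[expected] = actual
    return aliases


def resolve(filename, aliases):
    """Return the aliased file name, or the original if no alias is set."""
    return aliases.get(filename, filename)


parser = argparse.ArgumentParser()
# action="append" lets the flag be repeated, once per aliased file.
parser.add_argument("--alias", action="append", default=[],
                    metavar="NAME=REPLACEMENT")
args = parser.parse_args(
    ["--alias", "gen_build_costs.tab=gen_build_costs_low.tab"]
)
aliases = parse_aliases(args.alias)
```

With this in place, a loader would call `resolve("gen_build_costs.tab", aliases)` before opening the file, so modules keep asking for the standard name.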

josiahjohnston commented 5 years ago

@mfripp I wrote a quick patch to support csv files (in addition to tab) (02aa13d509a08c5d937869e3bec9d1b53d9e4a3d), but didn't change the rest of the code. Not sure if that commit would be better off in the 2.0.1 branch or master. That idea could be extended to support xlsx; it just needs to customize the header parsing code. Supporting other file formats (or direct DB connections) would require more thought for how to allow optional columns.
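For context, the optional-column problem comes down to reading the header first and only requesting the columns that are actually present. The sketch below illustrates that idea with the standard-library csv module; it is not the code from the commit above, and `select_columns` and the column names are hypothetical.

```python
import csv
import io


def select_columns(header, required, optional):
    """Return required columns plus whichever optional ones the file has."""
    missing = [c for c in required if c not in header]
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    return list(required) + [c for c in optional if c in header]


# Illustrative file: one optional column present, one absent.
sample = io.StringIO(
    "GENERATION_PROJECT,gen_max_age,gen_is_variable\n"
    "wind_1,25,1\n"
)
header = next(csv.reader(sample))  # first row holds the column names
columns = select_columns(
    header,
    required=["GENERATION_PROJECT", "gen_max_age"],
    optional=["gen_is_variable", "gen_is_baseload"],
)
```

A format without a readable header row (for example a direct database connection) would need some other way to discover which optional columns exist, which is the part that requires more thought.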

Good points; a few comments:

If we had reason to stick with tab-separated-value, our best bet might be to write a new pyomo data plugin called tsv_table.py that was almost identical to csv, but with a different separator. Then submit a pull request.
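The "same parser, different separator" idea can be sketched outside Pyomo as follows. This is not a Pyomo data plugin and does not use Pyomo's plugin API; it only shows, with the standard-library csv module and a hypothetical `read_table` helper, that the separator can be the sole difference between the two readers.

```python
import csv
import io
import os

# Assumed mapping from file extension to field separator.
DELIMITERS = {".csv": ",", ".tsv": "\t"}


def read_table(filename, handle):
    """Parse an open text handle using the separator implied by filename."""
    ext = os.path.splitext(filename)[1].lower()
    if ext not in DELIMITERS:
        raise ValueError(f"unsupported extension: {ext!r}")
    return list(csv.reader(handle, delimiter=DELIMITERS[ext]))


rows_csv = read_table("loads.csv", io.StringIO("zone,load\nA,10\n"))
rows_tsv = read_table("loads.tsv", io.StringIO("zone\tload\nA\t10\n"))
```

Both calls yield the same rows, so everything downstream of the parser can stay format-agnostic.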

FTR, Pyomo's DataPortal now supports way more data formats than when we first wrote data loading code:

Also worth noting is documentation & official support for skipping DataPortal and directly using Python dictionaries.

josiahjohnston commented 5 years ago

Release 2.0.5 transitions all input & output files to .csv. Well, all outputs except the trivial total_cost.txt that stores a single number and the results.pickle file that stores the solution in binary format. This release should go out in the next 24-48 hours.

There's another option --input-aliases that will allow you to specify alternative names for expected input files. Matthias has that on a pre-release branch and plans to merge it in. That should allow you to use '.tab' input files instead of .csv, or .tsv if a future version of Pyomo supports them.

Any developer who wishes to use other input file formats for their modules may write new modules that use any allowed DataPortal format via standard calls to DataPortal.load on the switch_model DataPortal instance, rather than calling load_aug (our thin wrapper around load). If anyone has a use case for disabling our input method and using their own methods instead, please contact us for tips on how to go about doing that.

@pesap Does this address your issue? Please reply in the next month or two, or we may close this issue as part of housekeeping.

Cheers, -Josiah