hybridlca / pyspa

A python package for conducting structural path analysis on square technological matrices of process or input-output data, using environmental, social and/or financial satellites
GNU General Public License v3.0

pyspa is an object-oriented Python package that enables you to conduct a parametric structural path analysis on square A matrices (process or input-output) for any number of environmental, social or economic satellites/flows and for any number of stages upstream in your supply chain (as long as you have enough RAM). The package produces a SupplyChain object which includes Pathway and Node objects (among others). Results can be exported to the csv format with a single line of code.

The concept behind pyspa was driven by the lack of open source code to conduct structural path analysis in a robust and object-oriented manner.

Getting Started

Prerequisites

You will need python to run this package as well as the following python packages:

  1. numpy
  2. pandas
  3. scipy - To support the use of sparse A matrices

Installing

Install the package from PyPI using pip:

pip install pyspa

Testing pyspa

Locate the template files in the installed directory, or download them directly from the GitHub repository. The template files include:

  1. A_matrix_template.csv
  2. Infosheet_template.csv
  3. Thresholds_template.csv
  4. Thresholds_template_perc.csv

Once you have located these files, you need to run a single function that will read the data, conduct the structural path analysis and return a SupplyChain object, as per the following code.

import pyspa

sc = pyspa.get_spa(target_ID=70, max_stage=10, a_matrix='A_matrix_template.csv', infosheet='Infosheet_template.csv', thresholds='Thresholds_template.csv')

This will return your SupplyChain object which has numerous methods. Read the documentation for more information.

Note that from pyspa 2.0 onwards, this will also, by default, break down the remainder of the supply chain (not covered by your spa) into remainder pathways. You can also provide the thresholds as percentages of the total value, which is more convenient. To do so, simply use the call:

sc = pyspa.get_spa(target_ID=70, max_stage=10, a_matrix='A_matrix_template.csv', infosheet='Infosheet_template.csv', thresholds='Thresholds_template_perc.csv', thresholds_as_percentages=True)

To export the structural path analysis to a csv file, use the built-in method.

sc.export_to_csv('spa_results.csv')

To save your SupplyChain object and avoid having to recalculate everything (this uses pickle):

sc.save('supply_chain.sc')

To load a previously saved SupplyChain object:

loaded_sc = pyspa.load_instance_from_file('supply_chain.sc', pyspa.SupplyChain)

We have developed the required Python methods on each object so that you can compare them. Thus,

sc == loaded_sc

or

sc.pathways_list[-1] == loaded_sc.pathways_list[-1]

or

sc.root_node == loaded_sc.root_node

will return True.
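The comparisons above rely on Python's equality protocol: two distinct objects compare equal when their contents match. A minimal sketch of the idea follows. `DemoNode` is a hypothetical stand-in, not pyspa's own Node class, and `copy.deepcopy` stands in for the save/load round trip so that the example needs no file.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class DemoNode:
    """Hypothetical stand-in for a supply-chain node; not pyspa's Node class."""
    index: int
    stage: int
    children: list = field(default_factory=list)


# Build a tiny tree: a root node with one upstream child.
root = DemoNode(index=70, stage=0, children=[DemoNode(index=12, stage=1)])

# A deep copy plays the role of an object restored from a saved file.
restored = copy.deepcopy(root)

same_object = restored is root    # False: the copy is a separate object
same_content = restored == root   # True: the dataclass __eq__ compares fields
```

Because equality is content-based, a reloaded object can be checked against the original without comparing attributes by hand.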

The detailed documentation is available here

Input files

Description

The package requires three csv files to be able to conduct a structural path analysis:

  1. A square technological matrix, aka an A matrix
  2. An infosheet listing all sectors or processes, along with the direct and total intensities/multipliers/requirements for any number of environmental/economic/social satellites, and their metadata
  3. The cut-off thresholds used to trim the supply chain branches for each satellite. These can be provided either as absolute values or as percentages of the total intensity of the target sector/process.
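To illustrate how a percentage threshold relates to an absolute one, here is a minimal sketch. The function name `absolute_threshold` is hypothetical and the numbers are made up. It assumes percentages are expressed on a 0-100 scale; check the percentage template file for the scale pyspa actually expects.

```python
def absolute_threshold(total_intensity: float, percentage: float) -> float:
    """Convert a percentage threshold into an absolute cut-off value.

    Assumption: `percentage` is expressed on a 0-100 scale.
    """
    return total_intensity * percentage / 100.0


# Made-up numbers: a total intensity of 250.0 units and a 0.5 % threshold.
cutoff = absolute_threshold(total_intensity=250.0, percentage=0.5)
print(cutoff)  # 1.25
```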

These csv files must be formatted in a certain way for the code to work. The formatting requirements are described below.

Formatting

Square technological matrix (A matrix)

The A matrix should be provided in a single csv file, regardless of its size (we have tried the code on a 15k×15k matrix so far, and it works fine). It must be formatted as follows:

Preview of A matrix csv layout ↓

| 1 | ... | n |
| <A matrix: input from 1 into 1> | <A matrix: input from 1 into ...> | <A matrix: input from 1 into n> |
| <A matrix: input from ... into 1> | <A matrix: input from ... into ...> | <A matrix: input from ... into n> |
| <A matrix: input from n into 1> | <A matrix: input from n into ...> | <A matrix: input from n into n> |
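As an illustration of this layout, the sketch below writes a small matrix as csv text and checks that it is square before use. The coefficients are made up, and the layout (a header row numbered 1 to n followed by n rows of n values) is an assumption based on the preview above; compare it with A_matrix_template.csv before relying on it.

```python
import csv
import io

# Illustrative 3x3 coefficients (made up): rows are supplying sectors,
# columns are receiving sectors, following the preview above.
a_matrix = [
    [0.00, 0.20, 0.10],
    [0.30, 0.00, 0.05],
    [0.10, 0.40, 0.00],
]

# Write a header row (1..n) followed by one row per supplying sector.
buffer = io.StringIO()
writer = csv.writer(buffer)
n = len(a_matrix)
writer.writerow(range(1, n + 1))
writer.writerows(a_matrix)

# Read the text back and confirm the matrix is square.
buffer.seek(0)
rows = list(csv.reader(buffer))
header, body = rows[0], rows[1:]
is_square = len(body) == len(header) and all(len(r) == len(header) for r in body)
```

Swapping `io.StringIO()` for a file opened with `open(path, 'w', newline='')` would write the same text to disk.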

Infosheet

The infosheet must contain mandatory columns and at least one environmental/social/economic satellite/flow. It must be formatted as follows (all headers are case sensitive):

You can add as many satellites as you need to the infosheet. The code will detect them automatically, as long as their headers are formatted as above. You can also add any other metadata column for your sectors/processes, and then access them through manual coding using the predefined method on your Node objects: get_node_attribute. See the detailed documentation for more details.

Thresholds

The thresholds csv is by far the simplest csv file to provide. It contains only two columns and must be formatted as below:

CSV output file

The csv output file contains some metadata on the structural path analysis itself and then lists, for each satellite/flow, the pathways extracted, by order of significance in terms of the direct intensity/multiplier/requirement of the last node in that pathway. The columns for these listings are:

The direct intensity/multiplier/requirement of the selected sector/process is referred to as DIRECT (Stage 0). Stage 1 refers to the first stage upstream in the supply chain, Stage 2 to the following stage, and so on up to Stage m, the maximum stage selected at the start. Based on our experience, we recommend using around 10 stages upstream for process data and 8 stages upstream for input-output data, although suitable values may differ for your data. Remainder pathways are appended at the end of the spa and broken down across the supply chain, clearly identifying where the thresholds were used to cut off pathways that did not meet the threshold criteria.
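To make the notion of stages and thresholds concrete, here is a minimal sketch of a pathway enumeration on a made-up 3-sector matrix. It is not pyspa's implementation: the function name `extract_pathways` is hypothetical, the pruning rule is a simplifying assumption (a branch is dropped as soon as its own value falls below the threshold), and remainder pathways are not computed.

```python
def extract_pathways(a, direct, target, max_stage, threshold):
    """Depth-first enumeration of upstream pathways from `target`.

    a[i][j] is the input from sector i into sector j; direct[i] is the direct
    intensity of sector i. Simplified rule (an assumption, not pyspa's): a
    branch is kept and expanded only while its value meets `threshold`.
    """
    pathways = [((target,), direct[target])]  # Stage 0: the DIRECT value
    stack = [((target,), 1.0)]
    while stack:
        path, weight = stack.pop()
        if len(path) - 1 == max_stage:
            continue  # deepest requested stage reached
        for supplier, row in enumerate(a):
            new_weight = weight * row[path[-1]]
            value = new_weight * direct[supplier]
            if value >= threshold and value > 0:
                new_path = path + (supplier,)
                pathways.append((new_path, value))
                stack.append((new_path, new_weight))
    # Most significant pathways first
    return sorted(pathways, key=lambda item: item[1], reverse=True)


# Made-up 3-sector example
a = [
    [0.0, 0.2, 0.1],
    [0.3, 0.0, 0.0],
    [0.1, 0.4, 0.0],
]
direct = [1.0, 2.0, 4.0]

pathways = extract_pathways(a, direct, target=0, max_stage=2, threshold=0.05)
for path, value in pathways:
    print(f"stage {len(path) - 1}: {' <- '.join(map(str, path))} = {value:.3f}")
```

Each pathway value is the product of the A matrix coefficients along the path multiplied by the direct intensity of its last node, which is why the list is sorted on that value.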

Note: The results for each satellite/flow are listed on the same csv sheet, in the order they appear in the infosheet. You will need to scroll down to identify where the results for each new satellite/flow start, which is indicated by a header and an empty row. For those using Windows, you can click on any pathway for any given satellite/flow and press: "Ctrl + Shift + ↓". This will take you to the last pathway for this satellite/flow.
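As an alternative to scrolling, the rows can be grouped programmatically. The sketch below splits csv rows into blocks wherever an empty row occurs. The helper `split_on_blank_rows` is hypothetical and the sample text is made up; it only mimics blocks separated by empty rows and is not the exact layout of a pyspa export, so inspect a real output file before adapting it.

```python
import csv
import io


def split_on_blank_rows(rows):
    """Group csv rows into blocks separated by empty rows."""
    blocks, current = [], []
    for row in rows:
        if any(cell.strip() for cell in row):
            current.append(row)
        elif current:
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks


# Made-up text that only mimics blocks separated by empty rows.
sample = "Satellite A\npath 1,0.6\npath 2,0.4\n\nSatellite B\npath 1,1.2\n"
blocks = split_on_blank_rows(csv.reader(io.StringIO(sample)))
first_rows = [block[0][0] for block in blocks]
```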

Other ways to call pyspa.get_spa()

You can also call pyspa.get_spa using in-memory objects instead of csv files: a numpy array or scipy sparse array for the A matrix, and dictionaries for the infosheet and thresholds. You can also mix and match in-memory objects and csv files for additional flexibility.

Built with:

Authors and contributors

Authors

License

This project is shared under a GNU General Public License v3.0. See the LICENSE file for more information.

Acknowledgments

This project was originally funded by the Australian Research Council Discovery Project DP150100962 at the University of Melbourne, Australia. As such, we are indebted to Australian taxpayers for making this work possible and to the University of Melbourne for providing the facilities and intellectual space to conduct this research. The code for the base method for conducting the structural path analysis is inspired by the code of the late A/Prof Graham Treloar at the University of Melbourne, who pioneered a Visual Basic Script in his PhD thesis to conduct a structural path analysis in 1997.