polca / unfold

UNpacking For scenariO-based Lca Databases
GNU Affero General Public License v3.0

Compatibility issues for ecoinvent 3.9.1 #7

Closed mkvdhulst closed 1 year ago

mkvdhulst commented 1 year ago

Hi @romainsacchi,

Thank you for creating this very useful package for database sharing! As stated in the README, unfold was tested for ecoinvent 3.6, 3.7 and 3.8, but should work with other databases. I gave it a go for ecoinvent 3.9.1 to see if it would work, but there seem to be compatibility issues with the biosphere3 database, since I get the following KeyError:

KeyError Traceback (most recent call last)

Cell In [3], line 2

      1 u = Unfold("C:/Users/hulstmkvd/Desktop/datapackage_2023-04-25.zip")
----> 2 u.unfold()

File ~\Anaconda3\envs\test\lib\site-packages\unfold\unfold.py:1078, in Unfold.unfold(self, scenarios, dependencies, superstructure, name)
   1071 self.generate_factors()
   1073 if not superstructure:
   1074     self.databases_to_export = {
   1075         k: v
   1076         for k, v in zip(
   1077             [s["name"] for s in self.scenarios],
-> 1078             self.generate_single_databases(),
   1079         )
   1080     }
   1081 else:
   1082     print("Writing scenario difference file...")

File ~\Anaconda3\envs\test\lib\site-packages\unfold\unfold.py:814, in Unfold.generate_single_databases(self)
    800 """
    801 Generates single databases for each scenario in self.scenarios.
    802 (...)
    809 - Finally, it uses the 3D numpy array to generate single databases for each scenario by calling the build_single_databases() function.
    810 """
    811 m = self.populate_sparse_matrix()
    813 matrix = sparse.stack(
--> 814     [
    815         sparse.COO(
    816             self.write_scaling_factors_in_matrix(copy.deepcopy(m), s["name"])
    817         )
    818         for _, s in enumerate(self.scenarios)
    819     ],
    820     axis=-1,
    821 )
    823 return self.build_single_databases(
    824     matrix=matrix, databases_to_build=self.scenarios
    825 )

File ~\Anaconda3\envs\test\lib\site-packages\unfold\unfold.py:816, in <listcomp>(.0)
    800 """
    801 Generates single databases for each scenario in self.scenarios.
    802 (...)
    809 - Finally, it uses the 3D numpy array to generate single databases for each scenario by calling the build_single_databases() function.
    810 """
    811 m = self.populate_sparse_matrix()
    813 matrix = sparse.stack(
    814     [
    815         sparse.COO(
--> 816             self.write_scaling_factors_in_matrix(copy.deepcopy(m), s["name"])
    817         )
    818         for _, s in enumerate(self.scenarios)
    819     ],
    820     axis=-1,
    821 )
    823 return self.build_single_databases(
    824     matrix=matrix, databases_to_build=self.scenarios
    825 )

File ~\Anaconda3\envs\test\lib\site-packages\unfold\unfold.py:583, in Unfold.write_scaling_factors_in_matrix(self, matrix, scenario_name)
    574 # Look up the index of the supplier activity in the reversed activities index.
    575 supplier_id = (
    576     s_name,
    577     s_prod,
    (...)
    581     s_type,
    582 )
--> 583 supplier_idx = self.reversed_acts_indices[supplier_id]
    585 # Multiply the appropriate element of the matrix by the scaling factor for the given scenario.
    586 # Use the lambda function defined above to avoid multiplying by zero.
    587 matrix[supplier_idx, consumer_idx] = factor[scenario_name] * _(
    588     matrix[supplier_idx, consumer_idx]
    589 )

KeyError: ('Particulates, < 2.5 um', None, ('air', 'urban air close to ground'), None, 'kilogram', 'biosphere')

If I understand correctly from the code, fix_key is used to change the keys of exchanges in the new biosphere3 database to those in the old biosphere3 database using outdated_flows.yaml (e.g. in this case from 'Particulate Matter, < 2.5 um' to 'Particulates, < 2.5 um'). Could it be that somewhere in the code the keys of the exchanges in one of the databases are not transformed by fix_key, resulting in mismatches (e.g. keys for the exchanges in the datapackage are fixed, but keys for the ecoinvent 3.9.1 and biosphere3 databases in the project are not)?
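For illustration, the kind of renaming step being described might look roughly like the minimal sketch below. This is hypothetical: the mapping excerpt, the helper name fix_flow_key, and the key layout are assumptions for illustration, not unfold's actual code.

# Hypothetical stand-in for the outdated_flows.yaml mapping:
# new (ecoinvent 3.9) flow name -> old flow name
OUTDATED_FLOWS = {
    "Particulate Matter, < 2.5 um": "Particulates, < 2.5 um",
}

def fix_flow_key(key):
    """Rename the flow-name element of an exchange key, leaving the rest untouched."""
    name, *rest = key
    return (OUTDATED_FLOWS.get(name, name), *rest)

# If only one side of a lookup is passed through such a renaming step, the other
# side's key is never found, which is consistent with the KeyError shown above.
print(fix_flow_key(("Particulate Matter, < 2.5 um", None,
                    ("air", "urban air close to ground"), None,
                    "kilogram", "biosphere")))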

What I did: I started with trying to install unfold via conda, but "solving environment" kept failing. I also tried Anaconda Navigator, where I was able to locate the package, but the installation remained stuck at "Solving package specifications". When installing unfold using pip, the bw2io package was downgraded to 0.8.7. Thinking bw2io might be the culprit, I checked what would happen if I upgraded back to bw2io 0.8.8, but this resulted in an issue with the CSVImporter used in extract_additional_inventories. To create the datapackage, I ran premise 1.5.0-beta3 for two IMAGE SSP2-RCP26 scenarios applied to ecoinvent 3.9.1 and then folded the databases into a datapackage, following the steps in the example notebook. I then attempted to unfold this datapackage into an existing project, which had been set up with bw2io 0.8.8 so that it contains the right version of the biosphere3 database for use with ecoinvent 3.9(.1). When indicating dependencies, I matched the scenarios of the datapackage to this biosphere3 database and to (an original, unchanged version of) the ecoinvent 3.9.1 cut-off database.
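For context, the unfolding step was invoked essentially as in the sketch below. The project name and the exact form of the dependencies argument are assumptions for illustration; the example notebook shows the canonical call.

import bw2data
from unfold import Unfold

# Hypothetical project that already holds biosphere3 and ecoinvent 3.9.1 cut-off.
bw2data.projects.set_current("ei391_project")

u = Unfold("C:/Users/hulstmkvd/Desktop/datapackage_2023-04-25.zip")
# Map the databases the datapackage depends on to the databases registered
# in the current project (the names below are assumed placeholders).
u.unfold(
    dependencies={
        "biosphere3": "biosphere3",
        "ecoinvent": "ecoinvent 3.9.1 cutoff",
    }
)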

romainsacchi commented 1 year ago

Hi, can you confirm you are using v.1.0.7?


mkvdhulst commented 1 year ago

Yes, I am using unfold 1.0.7

romainsacchi commented 1 year ago

Could you maybe send me your data package? A share link in an email? My address is r_s at me.com.

mkvdhulst commented 1 year ago

It should be on its way. I sent it via WeTransfer.

romainsacchi commented 1 year ago

Thanks. I'll look into it -- may be in a few days though.

romainsacchi commented 1 year ago

Hi, I think updating to v.1.0.8 may fix your issue. However, I still had an issue using your datapackage -- but a different one, that I could fix simply by recreating the datapackage. Let me know how that goes.

mkvdhulst commented 1 year ago

The fix is not yet working for me. Perhaps I'm using unfold the wrong way.

I created a new datapackage (IMAGE SSP2 baseline/RCP26/RCP19 for 2020/2025/2030/2035/2040/2045/2050/2055 for all scenarios, excluding cars, two-wheelers, and buses) with premise 1.5.0-beta3 in project x and then tried to unfold it in project y, using unfold 1.0.8. Both projects contain the biosphere3 database and the original ecoinvent 3.9.1 cut-off database. When adding exchange data to activities, no data seems to be transferred and the process finishes in 0 sec. It then moves on to extracting the additional inventories, but there it can't find any activity for the flows that were added by premise (e.g. 'electricity production, hydro, reservoir, non-alpine region', 'electricity, high voltage', 'CA-QC', 'kilowatt hour' and 'transport, freight, lorry, unspecified, long haul', 'transport, freight, lorry', 'CAN', None, 'ton kilometer', 'technosphere'). It then continues to print all flows that it cannot find, in batches, interspersed with the message:

IOPub data rate exceeded. The notebook server will temporarily stop sending output to the client in order to avoid crashing it. To change this limit, set the config variable --NotebookApp.iopub_data_rate_limit.

Current values: NotebookApp.iopub_data_rate_limit=1000000.0 (bytes/sec) NotebookApp.rate_limit_window=3.0 (secs)
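(The rate-limit warning itself only throttles what Jupyter streams back to the notebook client; the kernel keeps running. If the full list of unmatched flows is needed, the server can be started with a higher limit, for example jupyter notebook --NotebookApp.iopub_data_rate_limit=1e10, using the config variable named in the message; the value here is just an arbitrarily large number.)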

Once it has processed all additional inventories, the following error is printed:

Generating database for scenario image - SSP2-Base - 2020...

KeyError                                  Traceback (most recent call last)
Cell In[4], line 2
      1 u = Unfold("C:/Users/hulstmkvd/Anaconda3/envs/test/export/datapackage/datapackage_2023-05-09.zip")
----> 2 u.unfold()

File ~\Anaconda3\envs\test\lib\site-packages\unfold\unfold.py:1109, in Unfold.unfold(self, scenarios, dependencies, superstructure, name)
   1102 self.generate_factors()
   1104 if not superstructure:
   1105     self.databases_to_export = {
   1106         k: v
   1107         for k, v in zip(
   1108             [s["name"] for s in self.scenarios],
-> 1109             self.generate_single_databases(),
   1110         )
   1111     }
   1112 else:
   1113     print("Writing scenario difference file...")

File ~\Anaconda3\envs\test\lib\site-packages\unfold\unfold.py:829, in Unfold.generate_single_databases(self)
    817 m = self.populate_sparse_matrix()
    819 matrix = sparse.stack(
    820     [
    821         sparse.COO(
    (...)
    826     axis=-1,
    827 )
--> 829 return self.build_single_databases(
    830     matrix=matrix, databases_to_build=self.scenarios
    831 )

File ~\Anaconda3\envs\test\lib\site-packages\unfold\unfold.py:738, in Unfold.build_single_databases(self, matrix, databases_to_build)
    734 act.update(self.dict_meta[self.acts_indices[k]])
    736 # For each consumer index associated with the current producer index, create an exchange dictionary
    737 # and add it to the activity's exchanges list.
--> 738 act["exchanges"].extend(
    739     self.get_exchange(
    740         ind=j, amount=matrix[j, k, ix], scenario_name=i["name"]
    741     )
    742     for j in v
    743 )
    745 new_db.append(act)
    747 # remove datasets that are not in the current scenario

File ~\Anaconda3\envs\test\lib\site-packages\unfold\unfold.py:739, in <genexpr>(.0)
    734 act.update(self.dict_meta[self.acts_indices[k]])
    736 # For each consumer index associated with the current producer index, create an exchange dictionary
    737 # and add it to the activity's exchanges list.
    738 act["exchanges"].extend(
--> 739     self.get_exchange(
    740         ind=j, amount=matrix[j, k, ix], scenario_name=i["name"]
    741     )
    742     for j in v
    743 )
    745 new_db.append(act)
    747 # remove datasets that are not in the current scenario

File ~\Anaconda3\envs\test\lib\site-packages\unfold\unfold.py:472, in Unfold.get_exchange(self, ind, amount, scenario_name)
    461 name, ref, cat, loc, unit, flow_type = self.acts_indices[ind]
    462 _ = lambda x: x if x != 0 else 1.0
    464 return {
    465     "name": name,
    466     "product": ref,
    467     "unit": unit,
    468     "location": loc,
    469     "categories": cat,
    470     "type": flow_type,
    471     "amount": amount if flow_type != "production" else _(amount),
--> 472     "input": self.fix_key((name, ref, loc, cat))
    473     if flow_type == "biosphere"
    474     else (
    475         scenario_name,
    476         self.fetch_exchange_code(name, ref, loc),
    477     ),
    478 }

File ~\Anaconda3\envs\test\lib\site-packages\unfold\unfold.py:443, in Unfold.fix_key(self, key)
    441     return self.dependency_mapping[key]
    442 else:
--> 443     return self.dependency_mapping[self.find_correct_id(key)]

KeyError: None

I've uploaded the relevant files to file.io where they are available for download for the next month.

romainsacchi commented 1 year ago

Hi @mkvdhulst , sorry about that, I'll look into it.

romainsacchi commented 1 year ago

@mkvdhulst, I have bad news and good news. The good news is that I think the issue is fixed. That brings us to the bad news: the issue was due to premise not exporting the new activities (i.e., those not present in the original ecoinvent DB) to the datapackage. It means that you do not need to update unfold, but you unfortunately have to re-generate the datapackage using premise 1.5.0-b6, which should be available in a few minutes. Let me know.
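For reference, once that pre-release is published, regenerating should amount to upgrading premise, for example with pip install "premise==1.5.0b6" (the exact version specifier is an assumption), and then re-running the fold steps from the example notebook against ecoinvent 3.9.1.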