polca / premise

Coupling Integrated Assessment Models output with Life Cycle Assessment.
BSD 3-Clause "New" or "Revised" License

Database (ecoinvent) import problem- InvalidExchange error #87

Closed Shima-Fa closed 1 year ago

Shima-Fa commented 1 year ago

Hi, with the latest version of premise (1.3.2) I tried to import the ecoinvent cutoff database as follows:

import brightway2 as bw  # assumed alias; the original snippet did not show its import

if 'ecoinvent 3.8_cutoff' in bw.databases:
    print("Database has already been imported.")
else:
    # Folder containing the extracted ecoSpold2 dataset files
    fpei38cut = r"S:\pathway\ecoinvent 3.8_cutoff_ecoSpold02\datasets"
    ei38cut = bw.SingleOutputEcospold2Importer(fpei38cut, 'ecoinvent 3.8_cutoff')
    ei38cut.apply_strategies()
    ei38cut.statistics()
    ei38cut.write_database()

However, I get an InvalidExchange error:

Extracting XML data from 19128 datasets
Extracted 19128 datasets in 270.80 seconds
Applying strategy: normalize_units
Applying strategy: update_ecoinvent_locations
Applying strategy: remove_zero_amount_coproducts
Applying strategy: remove_zero_amount_inputs_with_no_activity
Applying strategy: remove_unnamed_parameters
Applying strategy: es2_assign_only_product_with_amount_as_reference_product
Applying strategy: assign_single_product_as_activity
Applying strategy: create_composite_code
Applying strategy: drop_unspecified_subcategories
Applying strategy: fix_ecoinvent_flows_pre35
Applying strategy: drop_temporary_outdated_biosphere_flows
Applying strategy: link_biosphere_by_flow_uuid
Applying strategy: link_internal_technosphere_by_composite_code
Applying strategy: delete_exchanges_missing_activity
1 exchanges couldn't be linked and were deleted. See the logfile for details:
    D:\Users\AppData\Local\pylca\Brightway3\Logs\new-premise.e98a542a870697bc71d177737f8cc143\Ecospold2-import-error.qyaxTs.log
Applying strategy: delete_ghost_exchanges
Applying strategy: remove_uncertainty_from_negative_loss_exchanges
Applying strategy: fix_unreasonably_high_lognormal_uncertainties
Applying strategy: set_lognormal_loc_value
Applying strategy: convert_activity_parameters_to_list
Applying strategy: add_cpc_classification_from_single_reference_product
Applying strategy: delete_none_synonyms
Applied 21 strategies in 19.32 seconds
19128 datasets
621717 exchanges
1770 unlinked exchanges
  Type biosphere: 17 unique unlinked exchanges
Writing activities to SQLite3 database:
-----------------------------------------------------------------------------------------------
InvalidExchange                           Traceback (most recent call last)
Cell In [11], line 14
     12 ei38cut.apply_strategies()
     13 ei38cut.statistics()
---> 14 ei38cut.write_database()

File C:\ProgramData\Anaconda3\envs\sacchi\lib\site-packages\bw2io\importers\base_lci.py:269, in LCIImporter.write_database(self, data, delete_existing, backend, activate_parameters, **kwargs)
    266 self.write_database_parameters(activate_parameters, delete_existing)
    268 existing.update(data)
--> 269 db.write(existing)
    271 if activate_parameters:
    272     self._write_activity_parameters(activity_parameters)

File C:\ProgramData\Anaconda3\envs\sacchi\lib\site-packages\bw2data\project.py:358, in writable_project(wrapped, instance, args, kwargs)
    356 if projects.read_only:
    357     raise ReadOnlyProject(READ_ONLY_PROJECT)
--> 358 return wrapped(*args, **kwargs)

File C:\ProgramData\Anaconda3\envs\sacchi\lib\site-packages\bw2data\backends\peewee\database.py:260, in SQLiteBackend.write(self, data, process)
    258 if data:
    259     try:
--> 260         self._efficient_write_many_data(data)
    261     except:
    262         # Purge all data from database, then reraise
    263         self.delete(warn=False)

File C:\ProgramData\Anaconda3\envs\sacchi\lib\site-packages\bw2data\backends\peewee\database.py:204, in SQLiteBackend._efficient_write_many_data(self, data, indices)
    197     self.pbar = pyprind.ProgBar(
    198         len(data),
    199         title="Writing activities to SQLite3 database:",
    200         monitor=True
    201     )
    203 for index, (key, ds) in enumerate(data.items()):
--> 204     exchanges, activities = self._efficient_write_dataset(
    205         index, key, ds, exchanges, activities
    206     )
    208 if not getattr(config, "is_test", None):
    209     print(self.pbar)

File C:\ProgramData\Anaconda3\envs\sacchi\lib\site-packages\bw2data\backends\peewee\database.py:156, in SQLiteBackend._efficient_write_dataset(self, index, key, ds, exchanges, activities)
    154 for exchange in ds.get('exchanges', []):
    155     if 'input' not in exchange or 'amount' not in exchange:
--> 156         raise InvalidExchange
    157     if 'type' not in exchange:
    158         raise UntypedExchange

InvalidExchange: 

As a result, I am not able to create the new premise datasets, since the base database could not be written. I had no problem importing ecoinvent datasets this way with previous versions of premise.

@romainsacchi @cmutel @tngTUDOR I would be very thankful if you could give me any solution to that.
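The traceback above shows that the write step raises InvalidExchange as soon as an exchange lacks an 'input' or 'amount' key. A minimal sketch of the same check, run before writing, could help locate the offending exchanges. The function name `find_invalid_exchanges` and the sample data are hypothetical, and the sketch assumes datasets are plain dicts with an 'exchanges' list, as the traceback suggests.

```python
def find_invalid_exchanges(datasets):
    """List exchanges that lack the 'input' or 'amount' key.

    Mirrors the check shown in the traceback; `datasets` is a list of
    dataset dicts, each with an optional 'exchanges' list.
    """
    problems = []
    for ds in datasets:
        for exc in ds.get("exchanges", []):
            missing = [key for key in ("input", "amount") if key not in exc]
            if missing:
                problems.append((ds.get("name"), exc.get("name"), missing))
    return problems


# Made-up data for illustration only (not taken from ecoinvent).
sample_data = [
    {
        "name": "example activity",
        "exchanges": [
            {"name": "linked flow", "input": ("biosphere3", "abc"),
             "amount": 1.0, "type": "biosphere"},
            {"name": "unlinked flow", "amount": 0.5, "type": "biosphere"},
        ],
    }
]

for problem in find_invalid_exchanges(sample_data):
    print(problem)
```

With a real importer this would presumably be called as `find_invalid_exchanges(ei38cut.data)` after `apply_strategies()`, assuming the importer keeps its datasets as a list of dicts in `.data`.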

romainsacchi commented 1 year ago

Hi, could you try updating your biosphere? Update to the latest version of bw2io and then run bw2io.create_default_biosphere3(). I think this is the one.

cmutel commented 1 year ago

@Shima-Fa We just updated bw2io to be compatible with ecoinvent release 3.9. However, 3.9 also deleted some biosphere flows, and I guess that these flows are what is missing. I will need to confirm this; in the meantime, could you please do the following:

Shima-Fa commented 1 year ago

@cmutel @romainsacchi

These are the versions of the packages you mentioned that I am using with the latest version of premise:

# packages in environment 
# Name                    Version                Build       Channel

brightway2                2.4.2                    pypi_0     pypi
bw2data                   3.6.5                    pypi_0     pypi
bw2io                     0.8.8                    pypi_0     pypi

I have tried both suggested solutions, but I still get the same error. I should mention that at the moment I only have access to ecoinvent 3.8.

cmutel commented 1 year ago

Yes, bw2io 0.8.8 is only ecoinvent 3.9 compatible. But the command to create a new environment should work? In any case, I will fix this in the next 2 days.
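A small guard can flag the release reported here as failing before an import is attempted. This is only a sketch based on what is reported in this thread (bw2io 0.8.8 fails with ecoinvent 3.8, 0.8.6 works). The function name `is_affected_bw2io` is hypothetical, and it assumes the version is given as a tuple of integers, such as (0, 8, 6).

```python
def is_affected_bw2io(version):
    """Return True for the bw2io release reported as failing in this thread.

    `version` is a tuple of integers, e.g. (0, 8, 6).
    Only 0.8.8 is flagged; nothing is assumed about other releases.
    """
    return tuple(version[:3]) == (0, 8, 8)


for candidate in [(0, 8, 6), (0, 8, 8)]:
    status = "affected" if is_affected_bw2io(candidate) else "not reported as affected"
    print(candidate, status)
```

In a live environment the argument would presumably be `bw2io.__version__`.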

cmutel commented 1 year ago

This isn't a polca problem, so I am closing it here, and tracking this problem at the linked bw2io issue.

Shima-Fa commented 1 year ago

@cmutel No, unfortunately it did not work with the new environment either, but thanks for taking care of it.

tngTUDOR commented 1 year ago

I'm trying to reproduce the issue and I noticed that the number of "datasets" for ecoinvent 3.8 cutoff is different. I see 19565, but @Shima-Fa shows 19128.
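One way to compare dataset counts before running the importer is to count the ecoSpold2 files in the datasets folder. The sketch below assumes the datasets are stored as `.spold` files directly inside that folder; the function name `count_spold_files` is hypothetical.

```python
from pathlib import Path


def count_spold_files(datasets_dir):
    """Count the .spold files directly inside `datasets_dir`."""
    return sum(
        1
        for path in Path(datasets_dir).iterdir()
        if path.is_file() and path.suffix.lower() == ".spold"
    )
```

Calling `count_spold_files(...)` on the folder passed to the importer and comparing the result with the 19565 datasets reported here would show whether the local copy is incomplete.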

tngTUDOR commented 1 year ago

This isn't a polca problem, so I am closing it here, and tracking this problem at the linked bw2io issue.

The issue 🦠 in bw2io is 143

tngTUDOR commented 1 year ago

@Shima-Fa, you can try using an environment with a slightly outdated bw2io to make things work for you while waiting for a fix of #143 in bw2io.

(I used mamba instead of conda, but they should be equivalent)

mamba create -n bw2-ei38compat-ok -c conda-forge -c cmutel "brightway2=2.4.2=py_5" "bw2io<0.8.8"

⬆️ creates an environment with dependencies that allow ecoinvent 3.8 cutoff to be imported correctly. The dependencies I get are:

➜ mamba list | egrep "bw2"      
# packages in environment at /home/pachacuti/mambaforge/envs/bw2-ei38compat-ok:
bw2analyzer               0.10                       py_1    cmutel
bw2calc                   1.8.1                      py_2    cmutel
bw2data                   3.6.5                      py_0    cmutel
bw2io                     0.8.6                      py_1    cmutel
bw2parameters             0.7                pyhd8ed1ab_0    conda-forge

This environment allowed me to import ecoinvent 3.8 cutoff correctly (see below). To use premise afterwards, I tested by installing the latest version from romain's anaconda channel (not the stable version from PyPI):

mamba install -c conda-forge -c cmutel -c romainsacchi premise

which installed version 2022.10.16. I didn't test a PyPI install, to avoid mixing conda and PyPI packages, but this should give you an environment in which you can use the database with recent (although not the latest) versions of brightway2 and premise.

In [1]: from brightway2 import *

In [2]: projects
Out[2]: 
Brightway2 projects manager with 2 objects:
        default
        ei38-cutoff
Use `projects.report()` to get a report on all projects.

In [3]: import bw2io

In [4]: bw2io.__version__
Out[4]: (0, 8, 6)

In [6]: import brightway2

In [7]: brightway2.__version__
Out[7]: (2, 4, 1)

In [8]: projects.current
Out[8]: 'default'

In [9]: projects.set_current('ei38-cutoff-io-0-8-6')

In [10]: bw2setup()
Creating default biosphere

Applying strategy: normalize_units
Applying strategy: drop_unspecified_subcategories
Applying strategy: ensure_categories_are_tuples
Applied 3 strategies in 0.00 seconds
Writing activities to SQLite3 database:
0% [##############################] 100% | ETA: 00:00:00
Total time elapsed: 00:00:00
Title: Writing activities to SQLite3 database:
  Started: 10/20/2022 11:53:55
  Finished: 10/20/2022 11:53:55
  Total time elapsed: 00:00:00
  CPU %: 73.00
  Memory %: 0.48
Created database: biosphere3
Creating default LCIA methods

Applying strategy: normalize_units
Applying strategy: set_biosphere_type
Applying strategy: fix_ecoinvent_38_lcia_implementation
Applying strategy: drop_unspecified_subcategories
Applying strategy: link_iterable_by_fields
Applied 5 strategies in 0.75 seconds
Wrote 975 LCIA methods with 254388 characterization factors
Creating core data migrations

In [11]: len(Database('biosphere3'))
Out[11]: 4427

In [12]: importer = SingleOutputEcospold2Importer('datasets/', 'ei38-cutoff')
Extracting XML data from 19565 datasets
Extracted 19565 datasets in 112.44 seconds

In [13]: importer.apply_strategies()
Applying strategy: normalize_units
Applying strategy: update_ecoinvent_locations
Applying strategy: remove_zero_amount_coproducts
Applying strategy: remove_zero_amount_inputs_with_no_activity
Applying strategy: remove_unnamed_parameters
Applying strategy: es2_assign_only_product_with_amount_as_reference_product
Applying strategy: assign_single_product_as_activity
Applying strategy: create_composite_code
Applying strategy: drop_unspecified_subcategories
Applying strategy: fix_ecoinvent_flows_pre35
Applying strategy: drop_temporary_outdated_biosphere_flows
Applying strategy: link_biosphere_by_flow_uuid
Applying strategy: link_internal_technosphere_by_composite_code
Applying strategy: delete_exchanges_missing_activity
Applying strategy: delete_ghost_exchanges
Applying strategy: remove_uncertainty_from_negative_loss_exchanges
Applying strategy: fix_unreasonably_high_lognormal_uncertainties
Applying strategy: set_lognormal_loc_value
Applying strategy: convert_activity_parameters_to_list
Applying strategy: add_cpc_classification_from_single_reference_product
Applying strategy: delete_none_synonyms
Applied 21 strategies in 3.88 seconds

In [14]: importer.statistics()
19565 datasets
629959 exchanges
0 unlinked exchanges

Out[14]: (19565, 629959, 0)

In [15]: importer.write_database()
Writing activities to SQLite3 database:
0% [##############################] 100% | ETA: 00:00:00
Total time elapsed: 00:00:32
Title: Writing activities to SQLite3 database:
  Started: 10/20/2022 11:57:48
  Finished: 10/20/2022 11:58:20
  Total time elapsed: 00:00:32
  CPU %: 69.60
  Memory %: 7.43
Created database: ei38-cutoff
Out[15]: Brightway2 SQLiteBackend: ei38-cutoff

premise worked:

In [19]: clear_cache()
Cache folder cleared!

In [20]: ndb = NewDatabase(
    ...:             scenarios=[
    ...:                 {"model":"image", "pathway":"SSP2-RCP19", "year":2050},
    ...:                 {"model":"remind", "pathway":"SSP2-PkBudg500", "year":2050},
    ...:             ],
    ...:             source_db="ei38-cutoff", # <-- name of the database in the BW2 project. Must be a string.
    ...:             source_version="3.8", # <-- version of ecoinvent. Can be "3.5", "3.6", "3.7" or "3.8". Must be a string.
    ...:             key='intihuatanastone' # <-- decryption key
    ...:             # to be requested from the library maintainers if you want to use default scenarios included in `premise`
    ...:     )
premise v.(1, 3, 2)
+------------------------------------------------------------------+
| Warning                                                          |
+------------------------------------------------------------------+
| Because some of the scenarios can yield LCI databases            |
| containing net negative emission technologies (NET),             |
| it is advised to account for biogenic CO2 flows when calculating |
| Global Warming potential indicators.                             |
| `premise_gwp` provides characterization factors for such flows.  |
| It also provides factors for hydrogen emissions to air.          |
|                                                                  |
| Within your bw2 project:                                         |
| from premise_gwp import add_premise_gwp                          |
| add_premise_gwp()                                                |
+------------------------------------------------------------------+
+--------------------------------+----------------------------------+
| Utils functions                | Description                      |
+--------------------------------+----------------------------------+
| clear_cache()                  | Clears the cache folder. Useful  |
|                                | when updating `premise`or        |
|                                | encountering issues with         |
|                                | inventories.                     |
+--------------------------------+----------------------------------+
| get_regions_definition(model)  | Retrieves the list of countries  |
|                                | for each region of the model.    |
+--------------------------------+----------------------------------+
| ndb.NewDatabase(...)           | Generates a summary of the most  |
| ndb.generate_scenario_report() | important scenarios' variables.  |
+--------------------------------+----------------------------------+
Keep uncertainty data?
NewDatabase(..., keep_uncertainty_data=True)

Hide these messages?
NewDatabase(..., quiet=True)

//////////////////// EXTRACTING SOURCE DATABASE ////////////////////
Cannot find cached database. Will create one now for next time...

Shima-Fa commented 1 year ago

@tngTUDOR Thank you very much for this solution. I tried it and it worked smoothly.

I am looking forward to the next update of premise being compatible with ecoinvent 3.9 too.