polca / premise

Coupling Integrated Assessment Models output with Life Cycle Assessment.
BSD 3-Clause "New" or "Revised" License

Dimension mismatch between technosphere and biosphere when exporting matrices #178

Closed Michael-ljn closed 1 week ago

Michael-ljn commented 2 months ago

Hi @romainsacchi,

When exporting matrices with the REMIND model SSP2-PkBudg500, there is a dimension mismatch: there are more activities in the A matrix than in the B matrix. I tried to track down the source of the issue but could not trace it back. I just updated to the latest version, 2.1.3.

A: (28475, 28475) B: (4709, 28462)
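One way to localise such a mismatch (a hedged sketch, not a premise helper: it assumes the exported A_matrix_index.csv rows are `name;product;unit;location;index`, as the loading script later in this thread also assumes, and that one coordinate column of B_matrix.csv references A's activity indices) is to list the activity keys indexed in A but never referenced in B:

```python
def activities_missing_from_b(a_index_rows, b_activity_indices):
    """Return activity keys indexed in A but never referenced in B's coordinates."""
    # a_index_rows: (name, reference product, unit, location, index) tuples
    a_by_index = {int(r[4]): tuple(r[:4]) for r in a_index_rows}
    referenced = {int(i) for i in b_activity_indices}
    return sorted(key for idx, key in a_by_index.items() if idx not in referenced)

# tiny synthetic example: activity 1 never appears in B's coordinates
a_rows = [
    ("electricity, high voltage", "electricity", "kWh", "GLO", 0),
    ("transport, freight, lorry", "transport", "tkm", "RER", 1),
]
print(activities_missing_from_b(a_rows, {0}))
```

The 13 activities reported by such a check would point directly at the sector update that produced them.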

romainsacchi commented 2 months ago

Hi @Michael-ljn, can you paste the script you used here? I'll try to re-run it on my end.

Michael-ljn commented 2 months ago

Hi @romainsacchi,

I have been looking into it but don't have time to propose a fix. The issue comes from the update() method, in particular one of these updates:

ndb.update("trucks")
ndb.update("two_wheelers")
ndb.update("cars")
ndb.update("buses")

ndb.write_db_to_matrices("Technosphere2")

The database was defined as follows:

ndb = NewDatabase(
    scenarios=[
        {"model": "REMIND", "pathway": "SSP2-PkBudg500", "year": 2025},
    ],
    source_db="ecoinvent-3.9.1-cutoff",
    source_version="3.9.1",
    key='xxxxxxxxx',
    biosphere_name="ecoinvent-3.9.1-biosphere",
    keep_source_db_uncertainty=True,
    keep_imports_uncertainty=True,
)

romainsacchi commented 2 months ago

When running:

ndb= NewDatabase(
    scenarios = [
    {"model":"REMIND", "pathway":"SSP2-PkBudg500", "year":2025,},
    ],
    source_db="ecoinvent-3.9.1-cutoff",
    source_version="3.9.1",
    key='tUePmX_S5B8ieZkkM7WUU2CnO8SmShwmAeWK9x2rTFo=',
    biosphere_name="ecoinvent-3.9.1-biosphere",
    keep_source_db_uncertainty=True,
    keep_imports_uncertainty=True
)
ndb.update()
ndb.write_db_to_matrices()

(meaning, all sectors), I get correct shapes. Not sure why I don't get exactly the same shape as you though.

[Screenshot 2024-09-09 at 19 53 55]

romainsacchi commented 2 months ago

Can you provide an exact case/script that leads to the error?

Michael-ljn commented 1 month ago

Hi @romainsacchi,

Apologies, I mixed up the scenarios and the dimensions because I was trying to locate where the issue comes from. The dimensions I provided are from one of the SSP-PkBudg500 scenarios; I can't find the exact one. But the issue actually happens in all scenarios and models.

First I start with a fresh environment, installing Brightway (the Mac ARM install process) and all dependencies, and then pip install premise==2.1.3. Running the following:

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

import bw2data as bd
import bw2io as bi
import ecoinvent_interface
import pickle
from premise import *
from datapackage import Package

name = "premise"
ei_name = "ecoinvent-3.9.1-cutoff"
bd.projects.set_current(name)
if "biosphere3" in bd.databases:
    print("biosphere3 has already been imported.")
elif ei_name in bd.databases:
    print(f"{ei_name} has already been imported.")
else:
    bi.import_ecoinvent_release(
        version="3.9.1", system_model="cutoff",
        username="xxxxxxxx", password="xxxx",
    )

clear_cache()
ndb = NewDatabase(
        scenarios = [
            {"model":"REMIND", "pathway":"SSP1-PkBudg500", "year":2025,},
        ],        
        source_db="ecoinvent-3.9.1-cutoff",
        source_version="3.9.1",
key='xxxxxx',
        biosphere_name="ecoinvent-3.9.1-biosphere",
        keep_source_db_uncertainty=True,
        keep_imports_uncertainty=True
    )
ndb.update()
ndb.write_db_to_matrices("test_update_all")

Opening the produced CSV files for A and B clearly shows the mismatch.

[Screenshot 2024-09-10 at 07 21 47]
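The shapes can also be checked without loading the full matrices; a minimal sketch (assuming the matrix CSVs hold coordinate triples of row index, column index, value, as the loading script further down this thread also assumes) that infers each implied shape from the largest indices:

```python
def implied_shape(coords):
    """Infer (n_rows, n_cols) from an iterable of (i, j, value) coordinate triples."""
    rows, cols = zip(*((int(i), int(j)) for i, j, _ in coords))
    return max(rows) + 1, max(cols) + 1

# synthetic triples standing in for A_matrix.csv / B_matrix.csv contents
a_coords = [(0, 0, 1.0), (2, 1, -0.5), (2, 2, 1.0)]
b_coords = [(0, 0, 0.1), (1, 1, 0.2)]
print(implied_shape(a_coords))
print(implied_shape(b_coords))
```

Comparing the activity dimensions of the two results reproduces the mismatch check without building any sparse matrices.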

Now, running the following:

clear_cache()
ndb = NewDatabase(
        scenarios = [
            {"model":"REMIND", "pathway":"SSP1-PkBudg500", "year":2025,},
        ],        
        source_db="ecoinvent-3.9.1-cutoff",
        source_version="3.9.1",
        key='xxxxxxxxx',
        biosphere_name="ecoinvent-3.9.1-biosphere",
        keep_source_db_uncertainty=True,
        keep_imports_uncertainty=True
    )
ndb.update("electricity")
ndb.update("fuels")
ndb.update("heat")
ndb.update("emissions")
ndb.update("external")
ndb.update("biomass")
ndb.update("dac")
ndb.update("cement")
ndb.update("steel")
ndb.write_db_to_matrices("test_update_everything_no_vehicules")

Everything is good.

[Screenshot 2024-09-10 at 07 30 34]

Lastly, when running:

clear_cache()
ndb = NewDatabase(
        scenarios = [
            {"model":"REMIND", "pathway":"SSP1-PkBudg500", "year":2025,},
        ],        
        source_db="ecoinvent-3.9.1-cutoff",
        source_version="3.9.1",
        key='xxxxxxxxx',
        biosphere_name="ecoinvent-3.9.1-biosphere",
        keep_source_db_uncertainty=True,
        keep_imports_uncertainty=True
    )
ndb.update("trucks")
ndb.update("two_wheelers")
ndb.update("cars")
ndb.update("buses")
ndb.write_db_to_matrices("test_update_vehicules")

The mismatch happens again. It's a process of elimination, a bit long, but I guess you might know which one of the updates could be causing it.
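The elimination above can be scripted; a hedged sketch of a generic first-failure search (not part of premise), where in practice the check callback would build a fresh NewDatabase, run ndb.update(sector), export, and compare the two activity dimensions:

```python
def first_failing(items, passes):
    """Return the first item for which the check fails, or None if all pass."""
    for item in items:
        if not passes(item):
            return item
    return None

# with premise, `passes` would rebuild the database, update the one sector,
# export the matrices, and return True if the shapes agree; here a stub:
sectors = ["trucks", "two_wheelers", "cars", "buses"]
print(first_failing(sectors, lambda s: s != "cars"))
```

Running each sector in isolation like this avoids re-doing the elimination by hand each time the environment changes.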

[Screenshot 2024-09-10 at 07 36 33]

romainsacchi commented 1 month ago

I still have correct shapes when running only:

ndb.update("trucks")
ndb.update("two_wheelers")
ndb.update("cars")
ndb.update("buses")

[Screenshot 2024-09-10 at 09 00 56]

romainsacchi commented 1 month ago

Maybe let's look at the script you use to load the matrices. What shape do you get when running this?

from scipy import sparse
#from pypardiso import spsolve <-- use pypardiso if you use an Intel chip, it's much faster!
from scipy.sparse.linalg import spsolve
from pathlib import Path
from csv import reader
import numpy as np

fp = "/Users/romain/Documents/export/remind/SSP2-PkBudg500/2025"

# the directory containing the set of files produced by premise
DIR = Path(fp)

# creates dict of activities <--> indices in A matrix
A_inds = dict()
with open(DIR / "A_matrix_index.csv", 'r') as read_obj:
    csv_reader = reader(read_obj, delimiter=";")
    for row in csv_reader:
        A_inds[(row[0], row[1], row[2], row[3])] = row[4]

A_inds_rev = {int(v):k for k, v in A_inds.items()}

# creates dict of bio flow <--> indices in B matrix
B_inds = dict()
with open(DIR / "B_matrix_index.csv", 'r') as read_obj:
    csv_reader = reader(read_obj, delimiter=";")
    for row in csv_reader:
        B_inds[(row[0], row[1], row[2], row[3])] = row[4]

B_inds_rev = {int(v):k for k, v in B_inds.items()}

# create a sparse A matrix
A_coords = np.genfromtxt(DIR / "A_matrix.csv", delimiter=";", skip_header=1)
I = A_coords[:, 0].astype(int)
J = A_coords[:, 1].astype(int)
A = sparse.csr_matrix((A_coords[:,2], (J, I)))

# create a sparse B matrix
B_coords = np.genfromtxt(DIR / "B_matrix.csv", delimiter=";", skip_header=1)
I = B_coords[:, 0].astype(int)
J = B_coords[:, 1].astype(int)
B = sparse.csr_matrix((B_coords[:,2] * -1, (I, J)), shape=(A.shape[0], len(B_inds)))

print(A.shape)
print(B.shape)
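Once the shapes agree, the matrices are used by solving the technosphere for a demand vector and multiplying the resulting supply into the biosphere; a self-contained sketch on toy matrices (the values are made up, only the mechanics mirror the script above):

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve

# toy technosphere: 3 activities, production on the diagonal,
# inputs as negative off-diagonal entries in the consuming column
A = sparse.csc_matrix(np.array([
    [ 1.0,  0.0, 0.0],
    [-0.5,  1.0, 0.0],
    [ 0.0, -0.2, 1.0],
]))
# toy biosphere: 2 elementary flows x 3 activities
B = sparse.csr_matrix(np.array([
    [0.1, 0.0, 0.3],
    [0.0, 0.2, 0.0],
]))

f = np.zeros(3)
f[0] = 1.0         # demand one unit of activity 0
s = spsolve(A, f)  # supply vector: s is [1, 0.5, 0.1]
g = B @ s          # life-cycle inventory of the two elementary flows
print(s)
print(g)
```

A mismatched activity dimension between A and B makes the final `B @ s` product fail, which is why the shape check above matters before any calculation.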
romainsacchi commented 1 month ago

@Michael-ljn have you tried the above example?

Michael-ljn commented 1 month ago

Hi @romainsacchi,

I tried on my MacBook and I get the same mismatch as in the screenshots above. I also tried on a different Windows laptop and got the right dimensions, so I guess it is related to my MacBook. I haven't had the chance to look further into it, but I will in two weeks.

romainsacchi commented 1 week ago

Re-open if needed.