n3ssuno / ReKS

R package "Regional Knowledge Space" that computes measures of the knowledge space of an economic region
GNU General Public License v3.0

Are the rankings of the two technical complexity indices derived using different packages inconsistent? #2

Closed lishuiwkp closed 11 months ago

lishuiwkp commented 11 months ago

Dear Bottai,

First of all, thank you very much for your work. I ran into a problem while using the function and would like to ask for your advice. I used the ReKS and EconGeo packages to calculate the technical complexity index of regions, following the approach of Hidalgo and Hausmann (2009). However, I found that the rankings produced by the two functions are inconsistent. Also, in your package, does a smaller score mean higher complexity? I don't know whether the problem lies in how I ran the code or in how the code works, so I would like to ask for your help. Thank you for your help. My English is a bit poor, so I may not have expressed myself clearly. Here is the code I used:

```r
library(ReKS)     # provides complexity()
library(EconGeo)  # provides MORc()

# Simulated patent counts for 41 regions and 81 technologies
geo <- paste0("R", 10:50)
tech <- paste0("T", 10:90)
dat <- expand.grid(geo, tech)
colnames(dat) <- c("geo", "tech")
set.seed(1)
dat$nPat <- sample(1:200, nrow(dat), replace = TRUE)

# Region-by-technology occurrence table
octab <- xtabs(nPat ~ geo + tech, dat)

# Complexity index from ReKS
CX <- complexity(octab)

# Complexity index from EconGeo (method of reflections)
KCI <- as.data.frame(MORc(as.matrix(octab)))
```

Warm regards, Wang Keping

n3ssuno commented 11 months ago

Dear Wang, Thank you for writing to me. Let me start by saying that this is a bit of an old project of mine. I'm not currently working on it, even though it works as it's supposed to.

The two packages you mention should indeed give the same results, and in practice they do, more or less. To explain what I mean by "more or less", let me split the explanation into three parts.

First, while EconGeo::MORc() uses the so-called "method-of-reflections version" of the ECI, ReKS::complexity() uses the "eigenvector version". For details on this difference, see the glossary at https://atlas.cid.harvard.edu/explore or any publication by Profs. Hidalgo or Hausmann from the Atlas (2014) onwards. EconGeo also has a function that calculates the ECI using the "eigenvector version". It is called EconGeo::kci() or EconGeo::KCI.r(), depending on the version of EconGeo you are using. So, to make the results comparable, please use this other function. In principle, EconGeo::MORc() should converge to EconGeo::kci(), but using the same method removes one source of complication from this issue.
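To make the distinction concrete, here is a minimal base-R sketch of both versions on a hypothetical 4 x 4 binary matrix. It is not the code of either package: the matrix `M`, the variable names, and the number of iterations are assumptions chosen for illustration, and the RCA step is skipped by starting directly from a 0/1 matrix.

```r
# Hypothetical nested region-by-technology matrix
# (1 = the region is specialised in the technology).
M <- matrix(c(1, 1, 1, 1,
              1, 1, 1, 0,
              1, 1, 0, 0,
              1, 0, 0, 0),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("R", 1:4), paste0("T", 1:4)))

diversity <- rowSums(M)  # k_{c,0}: number of technologies per region
ubiquity  <- colSums(M)  # k_{p,0}: number of regions per technology

# --- Eigenvector version ---------------------------------------------
# Region-to-region matrix: each row of M divided by diversity, times
# each row of t(M) divided by ubiquity.
Mcc <- (M / diversity) %*% (t(M) / ubiquity)
eig <- eigen(Mcc)

# The leading eigenvector is constant (eigenvalue 1), so the index is
# built from the eigenvector of the second-largest eigenvalue.
eci_eigen <- as.numeric(scale(Re(eig$vectors[, 2])))

# An eigenvector's sign is arbitrary: orient it so that more
# diversified regions get higher scores.
if (cor(eci_eigen, diversity) < 0) eci_eigen <- -eci_eigen

# --- Method of reflections -------------------------------------------
kc <- diversity
kp <- ubiquity
for (i in 1:30) {  # an even number of steps (assumed value)
  kc_new <- as.numeric(M %*% kp) / diversity    # average ubiquity-side score
  kp_new <- as.numeric(t(M) %*% kc) / ubiquity  # average diversity-side score
  kc <- kc_new
  kp <- kp_new
}
eci_mor <- as.numeric(scale(kc))

# Compare the two standardised indices
cor(eci_eigen, eci_mor)
```

On this toy matrix the two standardised vectors should be almost perfectly correlated, which is the sense in which the method of reflections converges to the eigenvector version. On real data the number of iterations needed depends on how close the second and third eigenvalues are.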

Second, the parameters differ somewhat between the two packages, so you must be careful here. EconGeo::kci() does not compute the RCA internally unless you ask it to (rca = TRUE), nor does it scale the results (although scaling is the standard procedure in the literature; please refer to the Atlas for further details here too). So you must run something like scale(EconGeo::kci(octab, rca = TRUE)) to get results that are extremely similar to the ones you get with ReKS::complexity(octab).
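For readers unsure what these two steps do, the following base-R sketch computes the RCA (Balassa index) by hand and standardises a vector of scores. The simulated counts, the cutoff of 1, and the `raw` scores are assumptions made for illustration; neither package is guaranteed to implement the steps exactly this way.

```r
# Hypothetical patent counts: 6 regions x 8 technologies
set.seed(1)
counts <- matrix(sample(1:200, 6 * 8, replace = TRUE), nrow = 6,
                 dimnames = list(paste0("R", 1:6), paste0("T", 1:8)))

# RCA: share of a technology in a region's portfolio, divided by the
# share of that technology in the overall total.
share_region <- counts / rowSums(counts)
share_total  <- colSums(counts) / sum(counts)
rca <- sweep(share_region, 2, share_total, "/")

# Binary specialisation matrix, using the usual cutoff RCA >= 1
M <- (rca >= 1) * 1

# Scaling: subtract the mean and divide by the standard deviation.
# `raw` stands in for an unscaled complexity vector (made-up values).
raw <- c(0.12, -0.40, 0.33, 0.05, -0.21, 0.11)
z <- (raw - mean(raw)) / sd(raw)

# Same result as base R's scale(), which also uses the sample sd
all.equal(z, as.numeric(scale(raw)))
```

Because scaling only shifts and stretches the values, it changes the magnitude of the scores but not the ranking of the regions.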

Third, the trickiest part. ReKS::complexity() is designed to reproduce the key results of https://github.com/cid-harvard/py-ecomplexity. If you are familiar with Python, you can check for yourself that this is true (there can be tiny differences due to the C libraries your operating system uses to calculate the eigenvectors). EconGeo::kci(), instead, at some point rounds the values of the matrix to the fourth decimal place before calculating the eigenvalues of the matrix (see line 54 of the source code here: https://github.com/PABalland/EconGeo/blob/master/R/KCI.r). If you are wondering why, I honestly have no idea either. However, it is this rounding that causes the results to differ slightly. You will see that the differences are now very small and that the two results correlate strongly.
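The effect of such rounding can be illustrated with a small base-R sketch. It assumes the rounding is applied to the region-to-region matrix right before the eigendecomposition; the toy matrix `M` and the helper `second_eigenvector()` are hypothetical and are not taken from EconGeo or ReKS.

```r
# Hypothetical nested region-by-technology matrix (0/1)
M <- matrix(c(1, 1, 1, 1,
              1, 1, 1, 0,
              1, 1, 0, 0,
              1, 0, 0, 0),
            nrow = 4, byrow = TRUE)

# Region-to-region matrix used by the eigenvector version
Mcc <- (M / rowSums(M)) %*% (t(M) / colSums(M))

# Illustrative helper: standardised eigenvector of the
# second-largest eigenvalue.
second_eigenvector <- function(A) {
  v <- Re(eigen(A)$vectors[, 2])
  as.numeric(scale(v))
}

eci_full    <- second_eigenvector(Mcc)            # full precision
eci_rounded <- second_eigenvector(round(Mcc, 4))  # rounded to 4 decimals

# Eigenvector signs are arbitrary, so align them before comparing
if (cor(eci_full, eci_rounded) < 0) eci_rounded <- -eci_rounded

max(abs(eci_full - eci_rounded))  # small but non-zero difference
cor(eci_full, eci_rounded)        # very close to 1
```

The rounded and unrounded indices are not identical, but they stay very strongly correlated, which matches the behaviour described above.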

Lastly, let me add a few extra notes. Consider also using this nice R package by Mauricio Vargas: https://pacha.dev/economiccomplexity/

```r
# Reshape the occurrence table into the long format the package expects
df <- as.data.frame(octab)
names(df) <- c("country", "product", "value")

# Balassa index (RCA), then the eigenvector-based complexity measures
bi <- economiccomplexity::balassa_index(df)
eci <- economiccomplexity::complexity_measures(bi, method = "eigenvalues")

# Extract the index for the regions (called "countries" in the package)
out <- as.data.frame(eci$complexity_index_country)
names(out) <- "ECI"
head(out)
```

Regarding this last package, note that the only difference from the results you get with ReKS concerns the product complexity index (PCI), since Mauricio decided to scale the PCI using a method different from the one originally used in the Atlas.

I hope this helps and that my explanations are clear enough. Otherwise, please feel free to write to me again. Best regards, Carlo