pachadotdev / economiccomplexity

A wrapper of different indices and networks commonly used in Economic Complexity
https://pacha.dev/economiccomplexity/
GNU General Public License v3.0
39 stars 12 forks

Data Source #10

Closed JohnCoene closed 4 years ago

JohnCoene commented 4 years ago

This probably stems from my lack of knowledge of econometrics. I was wondering where one might find real-world data that could be used with the package.

I am asking because all the built-in datasets are sparse matrices and, as far as I know, very few real-world data sources return data in that format.

class(world_trade_avg_1998_to_2000)
#> [1] "dgCMatrix"
#> attr(,"package")
#> [1] "Matrix"

If I am mistaken, could you please point me in the direction of such a data source? If I am right and such data is rarely available in that format, I believe the package should include:

  1. A demo of how one might go about using it with real-world data, e.g. tradestatistics, which I believe you also built.
  2. If turning the "usual" real-world dataset into a sparse matrix is not easy, perhaps the package should also include one or more convenience functions to do so.

pachadotdev commented 4 years ago

Thanks! The package internally detects whether the input is a data.frame; if so, it aggregates the data by "country" and "product" and then converts the result to a C x P matrix (C stands for countries, P stands for products). The demo datasets were actually obtained from tradestatistics.io, but using the SITC classification to follow what the original articles and the Atlas did in the past. It's probably better to use the HS classification.
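For readers who want to see what that conversion amounts to, here is a minimal sketch in R. It is not the package's internal code: the `trade` data frame and its values are made up for illustration, and the approach (base `aggregate()` followed by `Matrix::sparseMatrix()`) is only one plausible way to go from long-format data to a country-by-product sparse matrix.

```r
library(Matrix)

# Hypothetical long-format trade data: one row per recorded flow.
# Column names follow the "country" / "product" grouping mentioned above;
# the "value" column name and all numbers are assumptions.
trade <- data.frame(
  country = c("chl", "chl", "per", "per", "per", "chl"),
  product = c("copper", "wine", "copper", "gold", "copper", "wine"),
  value   = c(100, 20, 60, 40, 10, 5),
  stringsAsFactors = FALSE
)

# Sum the values of repeated country-product pairs.
agg <- aggregate(value ~ country + product, data = trade, FUN = sum)

# Fix the row and column order of the resulting matrix.
countries <- sort(unique(agg$country))
products  <- sort(unique(agg$product))

# Build the C x P sparse matrix; pairs that never occur stay as zeros.
m <- sparseMatrix(
  i = match(agg$country, countries),
  j = match(agg$product, products),
  x = agg$value,
  dims = c(length(countries), length(products)),
  dimnames = list(countries, products)
)

print(m)
```

With this toy input, the two "chl"/"wine" rows collapse into a single cell, which is the aggregation step described in the comment above.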

JohnCoene commented 4 years ago

OK, I see: using balassa_index.
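As background, the Balassa index (revealed comparative advantage) compares a product's share in a country's exports with that product's share in world exports. The sketch below computes it by hand in base R on a made-up matrix; it illustrates the textbook formula only and is not the package's `balassa_index` implementation. The threshold of 1 used for the binary version is the conventional choice and is an assumption here.

```r
# Toy country-by-product export matrix (hypothetical values).
x <- matrix(
  c(100,  0, 25,
     70, 40,  0),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("chl", "per"), c("copper", "gold", "wine"))
)

# Share of each product in each country's total exports (row-wise shares).
country_share <- x / rowSums(x)

# Share of each product in total world exports.
world_share <- colSums(x) / sum(x)

# Balassa index: divide each column of country shares by the world share.
rca <- sweep(country_share, 2, world_share, "/")

# Binary version: 1 where the index reaches the assumed threshold of 1.
rca_binary <- (rca >= 1) * 1

print(round(rca, 2))
print(rca_binary)
```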

pachadotdev commented 4 years ago

I switched the demo datasets to data.frame to do exactly that (the functions already converted data.frames to matrices, including the pertinent aggregation). In the end, it made the testing scripts a bit shorter.