ropensci / tradestatistics

R package to access Open Trade Statistics API
https://docs.ropensci.org/tradestatistics
Apache License 2.0
76 stars 14 forks source link

Problem retrieving tariff data #64

Closed ethai98 closed 2 years ago

ethai98 commented 2 years ago

Hello,

I'm having an issue with getting tariff data using the tradestatistics package with the following code

test <- ots_create_tidy_data(years=2004,reporters="usa", partners = "can", commodities = "all", table = "tariffs", max_attempts = 30)

There are three issues in the outputs.

  1. The code ran successfully, but the output includes every trade partner the US has instead of just Canada as specified.
  2. In theory, reporter-partner-tariff should have about the same number product lines, but this is not the case. If I run table(test$partner_iso), it shows that each trade partner to the US have different number of product code observations -- I'm not sure if this is what the WITS API data provides or if it is something about the package.
  3. The output dataset does not have a column(s) on tariff levels. The code I ran above yielded about 118141 obs of 8 variables, but none of the variables details the tariff for the product line per reporter-partner pair.

Thank you in advance for your help on this!

-Eric

pachadotdev commented 2 years ago

hi @ethai98 let me check this and I'll be back in the meanwhile, I can email you a zipped csv with the data, my email is in my profile

ethai98 commented 2 years ago

Hi @pachadotdev ,

Thank you! I will need a lot of tariff data for every possible reporter-partner product. I do not want to impose that on you. Hopefully, with the code bug fixed, I'll be able to automate all the downloads. Thank you!

Best regards, Eric Thai

pachadotdev commented 2 years ago

@ethai98 I have that at hand, please email me, because I don't have the time to fix the API until next week, but I have the dataset right here

pachadotdev commented 2 years ago

@ethai98 I found the problem! it's easy to solve, I'll keep you posted

pachadotdev commented 2 years ago

@ethai98 I solved it, at some point I changed the table in the API and I forgot to reflect the changes in the package you can now do ots_create_tidy_data(years=2004,reporters="usa", partners = "can", commodities = "all", table = "tariffs") and it shall work

here's a preview Screenshot from 2022-08-11 23-12-14

in any case, remember that I have all of this in google drive as a backup, so if you need lots of pairs, email me and I can send you that, or I can give you direct SQL access (the API is meant to be easy, but not necessarily flexible, even when I tried)

you'll need to install the package from this repository to see the changes

ethai98 commented 2 years ago

Hi Pachá,

Thank you for the quick turnaround on this request! It works!

Best regards, Eric

ethai98 commented 2 years ago

Hi Pachá,

Just checking on another potential issue. If I run: usa_all_2008 <- ots_create_tidy_data(years=2008,reporters="usa", partners = "abw", commodities = "all", table = "tariffs")

And check the number of tariff information available on each of the partner countries using: table(usa_all_2008$partner_iso)

It shows an unbalanced tariff observation at the commodity level.

Screen Shot 2022-08-11 at 11 44 29 PM

Usually, there should be about 4000-5000 tariff lines per reporter-partner pair, but many pairs run short of 4000. As mentioned above, countries without preferential tariffs should receive MFN tariffs if they are WTO members. As such, there should be around the same amount of commodity line per partner country with either PRF or MFN tariffs, whichever is lower (due to preferential treatment from trade agreements). Not sure whether it's something with the package or something inherently wrong with the API data that causes the imbalance in tariff lines. Could you check?

Best regards, Eric Thai

pachadotdev commented 2 years ago

hi @ethai98

this is something intentional!

the dataset contains many 0's that I've skipped in the DB

for example

library(tradestatistics)
library(dplyr)

usa_all_2008 <- ots_create_tidy_data(years=2008,reporters="usa", 
                                     partners = "all", commodities = "all", table = "tariffs")

usa_all_2008 %>% 
  group_by(reporter_iso, partner_iso) %>% 
  count()

# A tibble: 214 × 3
# Groups:   reporter_iso, partner_iso [214]
   reporter_iso partner_iso     n
   <chr>        <chr>       <int>
 1 usa          abw            40
 2 usa          afg            91
 3 usa          ago            26
 4 usa          aia            19
 5 usa          alb            86
 6 usa          and            29
 7 usa          ant           169
 8 usa          are           775
 9 usa          arg          1397
10 usa          arm           138
# … with 204 more rows

versus

library(arrow)

usa_all_2008 <- open_dataset("~/github/un_escap/tradestatistics-database-postgresql/tariffs/2008",
                             partitioning = c("reporter_iso", "section_code"))

> usa_all_2008 %>% 
+   filter(reporter_iso == "usa") %>% 
+   collect() %>% 
+   group_by(reporter_iso, partner_iso) %>% 
+   count()

# A tibble: 227 × 3
# Groups:   reporter_iso, partner_iso [227]
   reporter_iso partner_iso     n
   <chr>        <chr>       <int>
 1 usa          abw          5195
 2 usa          afg          5179
 3 usa          ago          5193
 4 usa          aia          5158
 5 usa          alb          5158
 6 usa          and          5131
 7 usa          ant          5195
 8 usa          are          5131
 9 usa          arg          5158
10 usa          arm          5158
# … with 217 more rows

let me know if it helps to add all those 0's, here what I do is to use a left join with the actual trade table, because not all countries export all products

ethai98 commented 2 years ago

Hi Pachá,

Ah I see -- that's an efficient way to retrieve data. I wasn't aware of this beforehand; I think additional information on the documentation would help clarify. I think it comes down to personal preference. For me, a tariff line without tariff conveys important information -- I think it would help if the package could provide an option for that and whether the zero tariff is from "mfn" or "prf" as it already does. It would be efficient to use this data to gather tariff information on all possible pairs and whether the tariff data is based on a preferential trade agreement (prf) or from the WTO MFN status (mfn).

Thanks! Best regards, Eric Thai

pachadotdev commented 2 years ago

Hi Pachá,

Ah I see -- that's an efficient way to retrieve data. I wasn't aware of this beforehand; I think additional information on the documentation would help clarify. I think it comes down to personal preference. For me, a tariff line without tariff conveys important information -- I think it would help if the package could provide an option for that and whether the zero tariff is from "mfn" or "prf" as it already does. It would be efficient to use this data to gather tariff information on all possible pairs and whether the tariff data is based on a preferential trade agreement (prf) or from the WTO MFN status (mfn).

Thanks! Best regards, Eric Thai

thanks a lot! I added the detail to api.tradestatistics.io/tables, and now I'll add it to the package