ropensci / unconf15

rOpenSci's San Francisco hackathon/unconf 2015
http://unconf.ropensci.org

R API for OpenCorporates data #43

Open ledell opened 8 years ago

ledell commented 8 years ago

I started working on an R API for the OpenCorporates API, which is the "world's largest open database of companies." This includes all sorts of interesting metadata about companies, including company officers and network data about which companies own other companies.

I am not sure if this fits the definition of "open science data" and falls under the auspices of the rOpenSci project, but I thought I'd post it here anyway, just in case. The API is well designed and well documented, so it's fairly straightforward to create an R wrapper for it.

sckott commented 8 years ago

@ledell can you open this in https://github.com/ropensci/unconf16/issues

Ninoninoninonino commented 7 years ago

Sounds awesome. Has anything come of it?

ledell commented 7 years ago

@Ninoninoninonino I ended up working on a different project at last year's unconf, and I haven't worked on this in over a year. However, I do have a half-finished R package for it. Are you interested in working on it or using it? I will paste the README below; it documents the development status of the package.

ledell commented 7 years ago

opencorporates R API

The opencorporates R package is an R interface to the OpenCorporates API. Using this package, you can access an open database containing information about more than 92 million companies worldwide.

The package can be used without an API key; however, there are usage limits that restrict the results (for example, only the first page of results, at most 30 records, is returned). To get an API key, register for an account on the OpenCorporates website.
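As a sketch of how an optional key might be attached to each request, the hypothetical helper oc_query_string() below appends the key only when one is set. It is not code from the package: the api_token parameter name and the OPENCORPORATES_API_TOKEN environment variable are assumptions made for illustration.

```r
# Hypothetical helper (not part of the package): build a query string for an
# OpenCorporates request, appending an API token only when one is available.
# Assumptions: the token goes in an `api_token` query parameter and is stored
# in an environment variable named OPENCORPORATES_API_TOKEN.
oc_query_string <- function(params = list(),
                            api_token = Sys.getenv("OPENCORPORATES_API_TOKEN")) {
  # Only attach the token if it is a non-empty string
  if (nzchar(api_token)) {
    params$api_token <- api_token
  }
  if (length(params) == 0) {
    return("")
  }
  # Percent-encode each value so spaces and reserved characters are safe
  values <- vapply(params,
                   function(v) utils::URLencode(as.character(v), reserved = TRUE),
                   character(1))
  paste0("?", paste(names(params), values, sep = "=", collapse = "&"))
}

oc_query_string(list(q = "barclays bank"), api_token = "")  # "?q=barclays%20bank"
```

With no key set, the query string carries only the search parameters; once the environment variable is defined, &api_token=... is appended automatically.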

Method Calls

Here is a list of the endpoints that api.opencorporates.com offers.

Each API call is wrapped by an R function with a self-explanatory name derived from the endpoint path. Here is the complete list of endpoints and their R functions; endpoints without an R function are not implemented yet.

API Method Call                                              R function
GET versions                                                 get_versions()
GET companies/:jurisdiction_code/:company_number             get_companies(jurisdiction_code, company_number)
GET companies/search                                         get_companies_search()
GET companies/:jurisdiction_code/:company_number/filings     get_companies_filings(jurisdiction_code, company_number)
GET companies/:jurisdiction_code/:company_number/network     get_companies_network(jurisdiction_code, company_number)
GET companies/:jurisdiction_code/:company_number/statements  get_companies_statements(jurisdiction_code, company_number)
GET companies/:jurisdiction_code/:company_number/data        get_companies_data(jurisdiction_code, company_number)
GET officers/search                                          (not yet implemented)
GET officers/:id                                             (not yet implemented)
GET corporate_groupings/:name                                (not yet implemented)
GET corporate_groupings/search                               (not yet implemented)
GET filings/:id                                              (not yet implemented)
GET data/:id                                                 (not yet implemented)
GET statements/:id                                           (not yet implemented)
GET placeholder/:id                                          (not yet implemented)
GET placeholders/:id/network                                 (not yet implemented)
GET placeholders/:id/statements                              (not yet implemented)
GET jurisdictions                                            get_jurisdictions()
GET jurisdictions/match                                      (not yet implemented)
GET industry_codes                                           get_industry_codes()
GET industry_codes/:code_scheme_id                           (not yet implemented)
GET industry_codes/:code_scheme_id/:code                     (not yet implemented)
GET account_status                                           (not yet implemented)
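The endpoint templates in the table can be turned into request URLs mechanically by filling in the :parameter segments. The hypothetical oc_endpoint_url() below sketches this; the base URL https://api.opencorporates.com and the v<version> path prefix are assumptions, not taken from the package code.

```r
# Hypothetical helper (not part of the package): fill in an endpoint template
# such as "companies/:jurisdiction_code/:company_number" and build a full URL.
# Assumes a base URL of https://api.opencorporates.com and a "v<version>" prefix.
oc_endpoint_url <- function(template, params = list(), api_version = "0.4",
                            base = "https://api.opencorporates.com") {
  segments <- strsplit(template, "/", fixed = TRUE)[[1]]
  for (i in seq_along(segments)) {
    # Segments starting with ":" are placeholders that must be supplied
    if (startsWith(segments[i], ":")) {
      key <- substring(segments[i], 2)
      if (is.null(params[[key]])) {
        stop("missing value for path parameter: ", key)
      }
      segments[i] <- utils::URLencode(as.character(params[[key]]), reserved = TRUE)
    }
  }
  paste0(base, "/v", api_version, "/", paste(segments, collapse = "/"))
}

oc_endpoint_url("companies/:jurisdiction_code/:company_number/filings",
                list(jurisdiction_code = "gb", company_number = "00102498"))
# "https://api.opencorporates.com/v0.4/companies/gb/00102498/filings"
```

With a helper like this, each get_*() function could be a thin wrapper that supplies its own template, which keeps the R function names and the endpoint paths easy to match up.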

Code Examples

Below are examples of using each of the functions in the R package, along with a snapshot of the results.

get_versions()

Description:

This returns the current version of the API and supported versions. If a specific version has been requested it also returns the requested version.

Example:

res <- get_versions(api_version = "0.4")
print(res)

Results:

$versions
$versions$current_version
[1] "0.4.1"

$versions$supported_versions
$versions$supported_versions[[1]]
[1] "0.2"

$versions$supported_versions[[2]]
[1] "0.3"

$versions$supported_versions[[3]]
[1] "0.3.1"

$versions$supported_versions[[4]]
[1] "0.3.2"

$versions$supported_versions[[5]]
[1] "0.4"

$versions$supported_versions[[6]]
[1] "0.4.1"

$versions$requested_version
[1] "0.4"
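The result is a nested list, so checking whether a version is supported means flattening supported_versions first. In this sketch the list is rebuilt by hand to mirror the output above, and is_supported_version() is a hypothetical helper, not a package function.

```r
# A list mirroring the structure printed above (rebuilt by hand for illustration)
res <- list(versions = list(
  current_version = "0.4.1",
  supported_versions = list("0.2", "0.3", "0.3.1", "0.3.2", "0.4", "0.4.1"),
  requested_version = "0.4"
))

# Hypothetical helper: flatten the supported versions and test membership
is_supported_version <- function(res, version) {
  version %in% unlist(res$versions$supported_versions)
}

is_supported_version(res, "0.4")  # TRUE
is_supported_version(res, "0.1")  # FALSE

# package_version() compares versions numerically, so "0.4.1" sorts above "0.4"
newest <- as.character(max(package_version(unlist(res$versions$supported_versions))))
newest  # "0.4.1"
```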

get_companies_search()

Description:

This returns a collection of companies whose names match the given search term (submitted as the q query parameter).

Example:

res <- get_companies_search(query = "barclays+bank", 
                            api_version = "0.4", 
                            raw = FALSE)
head(res)[,1:5]

Results:

                                              name company_number jurisdiction_code incorporation_date dissolution_date
1                                    BARCLAYS BANK     0870373575                be         2004-12-03             NULL
2                                    BARCLAYS BANK           PR34                mt         2005-04-28             NULL
3 BARCLAYS BANK ( DOMINION COLONIAL AND OVERSSEAS)         ARC35A                ug               NULL             NULL
4       BARCLAYS BANK (HONG KONG NOMINEES) LIMITED        0040910                hk         1974-11-26             NULL
5 BARCLAYS BANK (LONDON AND INTERNATIONAL) LIMITED       00747985                gb         1963-01-24       2008-05-13
6       BARCLAYS BANK (SINGAPORE NOMINEES) PTE LTD     198003638Z                sg               NULL             NULL
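In URL query strings a + is commonly decoded as a space, which is presumably why the example above writes the search term as barclays+bank. If you build the request yourself, percent-encoding the plain term avoids relying on that convention. This minimal sketch assumes the base URL https://api.opencorporates.com with a v0.4 path prefix; it only builds the URL and does not send a request.

```r
# Minimal sketch: build a company-search URL with the term in the `q` parameter.
# The base URL and "v0.4" prefix are assumptions for illustration.
search_term <- "barclays bank"
search_url <- paste0("https://api.opencorporates.com/v0.4/companies/search?q=",
                     utils::URLencode(search_term, reserved = TRUE))
search_url
# "https://api.opencorporates.com/v0.4/companies/search?q=barclays%20bank"
```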

get_companies()

Description:

This returns the core data for the given company. The jurisdiction code identifies the jurisdiction that registered the company. If the jurisdiction is a country, the code is simply the lowercase two-letter ISO code for that country, e.g. Spain = es, United Kingdom = gb. If it is a state or province, the code is a lowercase, underscored version of its ISO 3166-2 code, e.g. Michigan in the US is us_mi.
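The jurisdiction-code convention described above can be sketched as a one-line conversion from an ISO code. to_jurisdiction_code() is a hypothetical helper, and it assumes every jurisdiction follows the convention exactly; check the codes returned by get_jurisdictions() before relying on it.

```r
# Hypothetical helper: convert ISO 3166-1 / 3166-2 codes to OpenCorporates-style
# jurisdiction codes (lowercase, hyphen replaced by underscore). Vectorised.
to_jurisdiction_code <- function(iso_code) {
  tolower(gsub("-", "_", iso_code, fixed = TRUE))
}

to_jurisdiction_code(c("GB", "ES", "US-MI"))  # "gb" "es" "us_mi"
```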

Example:

res <- get_companies(jurisdiction_code = "gb",
                     company_number = "00102498",
                     api_version = "0.4",
                     raw = FALSE)

Results:


> t(res$meta)
                   [,1]                                                    
name               "BP P.L.C."                                             
company_number     "00102498"                                              
jurisdiction_code  "gb"                                                    
incorporate_date   "1909-04-14"                                            
dissolution_date   NA                                                      
company_type       "Public Limited Company"                                
registry_url       "http://data.companieshouse.gov.uk/doc/company/00102498"
branch_status      NA                                                      
inactive           "FALSE"                                                 
current_status     "Active"                                                
created_at         "2010-10-21T18:20:50+00:00"                             
updated_at         "2015-12-19T21:38:27+00:00"                             
retrieved_at       "2015-12-09T12:10:45+00:00"                             
opencorporates_url "https://opencorporates.com/companies/gb/00102498"

> head(res$data)
        id                                title      data_type                              description                       opencorporates_url
1 25248386 International Trademark Registration  WipoTrademark                                          https://opencorporates.com/data/25248386
2 34300047 International Trademark Registration  WipoTrademark                                          https://opencorporates.com/data/34300047
3 34300048                      Company Address CompanyAddress 1 St James's Square, London SW1Y 4PD, GB https://opencorporates.com/data/34300048
4  2204579 International Trademark Registration  WipoTrademark                                           https://opencorporates.com/data/2204579
5  9788777 International Trademark Registration  WipoTrademark                                           https://opencorporates.com/data/9788777
6  1999276 International Trademark Registration  WipoTrademark                                           https://opencorporates.com/data/1999276

> head(res$filings)[,1:2]
         id                                                                           title
1 230543795                                                         Appointment of director
2 230543791                                                         Appointment of director
3 230543792 Notice of sale or transfer of treasury shares by a public limited company (PLC)
4 230543794 Notice of sale or transfer of treasury shares by a public limited company (PLC)
5 230543793                                         Termination of appointment of director 
6 228615959                                                   Return of allotment of shares

> head(res$officers)
         id                      name  position uid start_date   end_date                            opencorporates_url occupation inactive current_status
1 206304786            HANNAH ASHDOWN secretary     2012-02-02            https://opencorporates.com/officers/206304786               FALSE               
2 206304801            JENS BERTELSEN secretary     2012-02-02            https://opencorporates.com/officers/206304801               FALSE               
3 206304814        PAULA JEAN CLAYTON secretary     1999-08-01 2001-07-01 https://opencorporates.com/officers/206304814                TRUE               
4 206304831   RICHARD CHARLES GRAYSON secretary     1992-05-10 1994-10-01 https://opencorporates.com/officers/206304831                TRUE               
5 206304849 JUDITH CHRISTINE HANRATTY secretary     1994-10-01 2003-07-24 https://opencorporates.com/officers/206304849                TRUE               
6 206304870        DAVID JOHN JACKSON secretary     2003-07-24            https://opencorporates.com/officers/206304870               FALSE     
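The officers table mixes current and former officers, so a common next step is to keep only the current ones. This sketch rebuilds a few of the rows above by hand; because the actual type of the inactive column in the package output is not shown, it uses as.logical(), which accepts both logical values and the strings "TRUE" and "FALSE".

```r
# A few rows mirroring res$officers above (rebuilt by hand for illustration)
officers <- data.frame(
  name = c("HANNAH ASHDOWN", "JENS BERTELSEN", "PAULA JEAN CLAYTON"),
  position = c("secretary", "secretary", "secretary"),
  start_date = c("2012-02-02", "2012-02-02", "1999-08-01"),
  end_date = c(NA, NA, "2001-07-01"),
  inactive = c("FALSE", "FALSE", "TRUE"),
  stringsAsFactors = FALSE
)

# as.logical() handles both logical columns and "TRUE"/"FALSE" strings
current <- officers[!as.logical(officers$inactive), c("name", "position", "start_date")]
current$name  # "HANNAH ASHDOWN" "JENS BERTELSEN"
```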

get_companies_filings()

Description:

This returns the statutory filings for the given company.

Example:

res <- get_companies_filings(jurisdiction_code = "gb", 
                             company_number = "00102498", 
                             api_version = "0.4", 
                             raw = FALSE)
head(res)[,1:2]

Results:

         id                                                                           title
1 230543795                                                         Appointment of director
2 230543791                                                         Appointment of director
3 230543792 Notice of sale or transfer of treasury shares by a public limited company (PLC)
4 230543794 Notice of sale or transfer of treasury shares by a public limited company (PLC)
5 230543793                                         Termination of appointment of director 
6 228615959                                                   Return of allotment of shares

get_companies_network()

Description:

(NOT COMPLETE) This returns the immediate 'computed corporate network' for the given company as a set of control relationships (i.e. one company is thought to control or influence another company). This is the same data you can see on a company's network page on the main OpenCorporates site.

Example:

res <- get_companies_network(jurisdiction_code = "gb", 
                             company_number = "02263951", 
                             api_version = "0.4", 
                             raw = FALSE)

Results:

Note that OpenCorporates only has network data for a small proportion of the 50,000,000-plus companies currently in the OpenCorporates database.
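Since the network response is not yet parsed, its final shape is open. One natural representation of control relationships is an edge list with one row per relationship. The data frame and the helpers controlled_by() and controllers_of() below are hypothetical, with placeholder company names, and show how the immediate (one-step) network around a company could be read from such a list.

```r
# Hypothetical edge list: each row says `controller` controls or influences
# `controlled`. Company names are placeholders, not real OpenCorporates data.
edges <- data.frame(
  controller = c("A", "A", "B"),
  controlled = c("B", "C", "D"),
  stringsAsFactors = FALSE
)

# Immediate neighbours of one company in the control network
controlled_by <- function(edges, company) edges$controlled[edges$controller == company]
controllers_of <- function(edges, company) edges$controller[edges$controlled == company]

controlled_by(edges, "A")   # "B" "C"
controllers_of(edges, "D")  # "B"
```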

get_companies_statements()

Description:

This returns the statements associated with the given company. A statement is a purported 'statement of fact' from a source (a public record or a user). For example, a subsidiary statement may have been parsed from a filing at the US Securities and Exchange Commission, or a user may have stated that one company is the parent of another.

Example:

res <- get_companies_statements(jurisdiction_code = "gb",
                                company_number = "00102498",
                                api_version = "0.4",
                                raw = FALSE)

Without an API key, this produces a warning message:

# Warning message:
# In get_company_statements(jurisdiction_code = "gb", company_number = "00102498",  :
#   Without an opencorporates API key, only the companies on the first page of results are returned (30 records max).
# Number of pages remaining: 1337
# Number of results remaining: 40094                              

Note: The warning above indicates that more results were available, but only the first 30 were returned. Returning all the results (by looping through each page of results) is a to-do item.
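The page count in the warning is consistent with 30 results per page: ceiling(40094 / 30) is 1337. The loop from the to-do item could look like the sketch below, where fetch_page is a hypothetical function that requests one page and returns a data frame; it is not part of the package. A real version would also need to respect the API's usage limits, for example by pausing between requests.

```r
# Hypothetical sketch of the "fetch every page" loop.
# `fetch_page` is an assumed function: page number in, data frame of results out.
fetch_all_pages <- function(fetch_page, total_pages) {
  pages <- vector("list", total_pages)
  for (p in seq_len(total_pages)) {
    pages[[p]] <- fetch_page(p)
    # A real implementation should pause here (e.g. Sys.sleep()) to respect usage limits
  }
  # Stack the per-page data frames into one
  do.call(rbind, pages)
}

# Number of pages needed for n results at `per_page` results per page
pages_needed <- function(n_results, per_page = 30) {
  ceiling(n_results / per_page)
}

pages_needed(40094)  # 1337

# Usage with a stand-in fetcher that returns two rows per page
fake_fetch <- function(p) data.frame(id = (p - 1) * 2 + 1:2)
combined <- fetch_all_pages(fake_fetch, 3)
nrow(combined)  # 6
```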

Results:

> head(res)[1:4]
        id               data_type                             opencorporates_url start_date
1 16098371 subsidiary_relationship https://opencorporates.com/statements/16098371 2013-07-06
2 16098372 subsidiary_relationship https://opencorporates.com/statements/16098372 2013-07-06
3 16098375 subsidiary_relationship https://opencorporates.com/statements/16098375 2013-07-06
4 16098377 subsidiary_relationship https://opencorporates.com/statements/16098377 2013-07-06
5 16098381 subsidiary_relationship https://opencorporates.com/statements/16098381 2013-07-06
6 16098414 subsidiary_relationship https://opencorporates.com/statements/16098414 2013-07-06

get_companies_data()

TO DO.

get_officers_search()

TO DO.

get_officers()

TO DO.

get_corporate_groupings()

TO DO.

get_corporate_groupings_search()

TO DO.

get_filings()

TO DO.

get_data()

TO DO.

get_statements()

TO DO.

get_placeholder()

TO DO.

get_placeholders_network()

TO DO.

get_placeholders_statements()

TO DO.

get_jurisdictions()

Description:

This returns the list of all the jurisdictions OpenCorporates knows about (not all of which have companies in the OpenCorporates database), along with the code for each jurisdiction.

Example:

res <- get_jurisdictions(api_version = "0.4")
head(res)

Results:

   code                name              country                        full_name
1    ad             Andorra              Andorra                          Andorra
2 ae_az           Abu Dhabi United Arab Emirates Abu Dhabi (United Arab Emirates)
3 ae_du               Dubai United Arab Emirates     Dubai (United Arab Emirates)
4    af         Afghanistan          Afghanistan                      Afghanistan
5    ag Antigua and Barbuda  Antigua and Barbuda              Antigua and Barbuda
6    ai            Anguilla             Anguilla                         Anguilla
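Once the jurisdictions table is loaded, looking up a code by name is a simple subset. In this sketch the rows above are rebuilt by hand; the subnational flag relies on the convention that state and province codes contain an underscore (as in ae_du), which is an assumption based on the codes shown.

```r
# Rows mirroring head(res) above (rebuilt by hand for illustration)
jurisdictions <- data.frame(
  code = c("ad", "ae_az", "ae_du", "af", "ag", "ai"),
  name = c("Andorra", "Abu Dhabi", "Dubai", "Afghanistan",
           "Antigua and Barbuda", "Anguilla"),
  country = c("Andorra", "United Arab Emirates", "United Arab Emirates",
              "Afghanistan", "Antigua and Barbuda", "Anguilla"),
  stringsAsFactors = FALSE
)

# Look up a jurisdiction code by name
jurisdictions$code[jurisdictions$name == "Dubai"]  # "ae_du"

# Flag sub-national jurisdictions (assumed: their codes contain an underscore)
jurisdictions$subnational <- grepl("_", jurisdictions$code, fixed = TRUE)
jurisdictions$code[jurisdictions$subnational]      # "ae_az" "ae_du"
```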

get_jurisdictions_match()

TO DO.

get_industry_codes()

Description:

From v0.4, OpenCorporates has moved to a new way of representing industry codes (previously it only catered for UK SIC codes), and it can now handle a wide variety of industry codes, including US NAICS codes and EU NACE codes (and their derivatives). Where a company register makes industry codes available, OpenCorporates stores the code together with the code scheme it belongs to. For example, for one Belgian company, the industry code consists of the code scheme (be_nace_2008, which represents the NACE-BEL 2008 code scheme) and the code 66191 (which in NACE-BEL 2008 is the code for 'Agenten en makelaars in bankdiensten', i.e. agents and brokers in banking services). This can be represented as a uid (in this case 'be_nace_2008-66191') to make searching by industry codes consistent and straightforward.
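The uid format described above joins the code scheme and the code with a hyphen. The hypothetical helpers below build and split such uids; splitting at the first hyphen assumes that code scheme ids never contain a hyphen, which holds for the ids shown in the results below (they use underscores).

```r
# Hypothetical helpers for industry-code uids of the form "<code_scheme_id>-<code>"
make_industry_uid <- function(code_scheme_id, code) {
  paste0(code_scheme_id, "-", code)
}

# Split at the first hyphen (assumes scheme ids contain no hyphen)
split_industry_uid <- function(uid) {
  list(code_scheme_id = sub("-.*$", "", uid),
       code = sub("^[^-]*-", "", uid))
}

uid <- make_industry_uid("be_nace_2008", "66191")
uid                        # "be_nace_2008-66191"
parts <- split_industry_uid(uid)
parts$code_scheme_id       # "be_nace_2008"
parts$code                 # "66191"
```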

Example:

res <- get_industry_codes(api_version = "0.4")
head(res)[1:3]

Results:

           id                            name jurisdiction_code
1 uk_sic_1992      UK SIC Classification 1992                gb
2 uk_sic_2003      UK SIC Classification 2003                gb
3 uk_sic_2007      UK SIC Classification 2007                gb
4      isic_4                   UN ISIC Rev 4              <NA>
5   eu_nace_2   European Community NACE Rev 2              <NA>
6  eu_nace_11 European Community NACE Rev 1.1              <NA>

get_industry_codes_scheme_id()

TO DO.

get_industry_codes_scheme_id_code()

TO DO.

get_account_status()

TO DO.

ErickGB commented 4 years ago

Hi Erin, has this project been released? Can I download the library or code? Thanks!

ledell commented 4 years ago

Hi @ErickGB, I never had time to release it, but I have the code somewhere. Send me an email and I will send you what I have if you're still interested (sorry, I just saw this note now...).

jungjein commented 4 years ago

Hi @ledell, would you mind sharing the code with wheemur@gmail.com? Thanks!

emmanuelfreuden commented 4 years ago

Hi @ledell I'd also be interested in the code, if you don't mind sharing it: emmanuel.freudenthal@gmail.com thanks!

mcas-git commented 3 years ago

Hi @ledell, I would also like to get access to the code - do you mind sharing it? michele.castiglioni@eui.eu - Thanks a lot!