Open ledell opened 8 years ago
@ledell can you open this in https://github.com/ropensci/unconf16/issues?
Sounds awesome. Has anything come of it?
@Ninoninoninonino I ended up working on a different project at the unconf last year and I haven't worked on this in over a year. However, I have a half-finished R package for this. Are you interested in working on it or using it? I will paste the README below, which documents the development status of the package.
The opencorporates R package is an R interface to the opencorporates API. Using this package, you can access an open database containing information about more than 92 million companies, worldwide.
The package can be used without an API key; however, usage limits restrict the results. To sign up for an API key, register for an account here.
Here is a list of the multiple endpoints that api.opencorporates.com offers:
The calls are grouped into a series of R functions with self-explanatory names that map to the API methods. Here is a complete list of these functions.
API Method Call | R function |
---|---|
GET versions | get_versions() |
GET companies/:jurisdiction_code/:company_number | get_companies(jurisdiction_code, company_number) |
GET companies/search | get_companies_search() |
GET companies/:jurisdiction_code/:company_number/filings | get_companies_filings(jurisdiction_code, company_number) |
GET companies/:jurisdiction_code/:company_number/network | get_companies_network(jurisdiction_code, company_number) |
GET companies/:jurisdiction_code/:company_number/statements | get_companies_statements(jurisdiction_code, company_number) |
GET companies/:jurisdiction_code/:company_number/data | get_companies_data(jurisdiction_code, company_number) |
GET officers/search | |
GET officers/:id | |
GET corporate_groupings/:name | |
GET corporate_groupings/search | |
GET filings/:id | |
GET data/:id | |
GET statements/:id | |
GET placeholder/:id | |
GET placeholders/:id/network | |
GET placeholders/:id/statements | |
GET jurisdictions | get_jurisdictions() |
GET jurisdictions/match | |
GET industry_codes | get_industry_codes() |
GET industry_codes/:code_scheme_id | |
GET industry_codes/:code_scheme_id/:code | |
GET account_status | |
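The mapping in the table can be illustrated by filling the `:name` placeholders of an endpoint template to get a request URL. This is only a sketch: `build_endpoint()` is a hypothetical helper, not a function from the package, and the base URL and `/v0.4/` version prefix are assumptions about how the API is addressed.

```r
# Hypothetical helper (not part of the package): fill ":name" placeholders
# in an endpoint template taken from the table above.
# The base URL and the "/v<version>/" prefix are assumptions.
build_endpoint <- function(template, params = list(),
                           api_version = "0.4",
                           base_url = "https://api.opencorporates.com") {
  path <- template
  for (name in names(params)) {
    # Percent-encode the value so it is safe inside a URL path segment
    value <- utils::URLencode(as.character(params[[name]]), reserved = TRUE)
    # "\\b" keeps ":code" from matching the start of ":code_scheme_id"
    path <- sub(paste0(":", name, "\\b"), value, path, perl = TRUE)
  }
  # Fail early if a placeholder was left unfilled
  if (grepl(":[a-z_]+", path)) {
    stop("Unfilled placeholder in endpoint: ", path)
  }
  paste0(base_url, "/v", api_version, "/", path)
}

url <- build_endpoint(
  "companies/:jurisdiction_code/:company_number/filings",
  list(jurisdiction_code = "gb", company_number = "00102498")
)
print(url)
# [1] "https://api.opencorporates.com/v0.4/companies/gb/00102498/filings"
```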
Below are examples of using each of the functions in the R package, along with a snapshot of the results.
get_versions()
Description:
This returns the current version of the API and supported versions. If a specific version has been requested it also returns the requested version.
Example:
res <- get_versions(api_version = "0.4")
print(res)
Results:
$versions
$versions$current_version
[1] "0.4.1"
$versions$supported_versions
$versions$supported_versions[[1]]
[1] "0.2"
$versions$supported_versions[[2]]
[1] "0.3"
$versions$supported_versions[[3]]
[1] "0.3.1"
$versions$supported_versions[[4]]
[1] "0.3.2"
$versions$supported_versions[[5]]
[1] "0.4"
$versions$supported_versions[[6]]
[1] "0.4.1"
$versions$requested_version
[1] "0.4"
get_companies_search()
Description:
This returns a collection of companies whose name matches the given search term (submitted as :q in the query parameters).
Example:
res <- get_companies_search(query = "barclays+bank",
                            api_version = "0.4",
                            raw = FALSE)
head(res)[,1:5]
Results:
name company_number jurisdiction_code incorporation_date dissolution_date
1 BARCLAYS BANK 0870373575 be 2004-12-03 NULL
2 BARCLAYS BANK PR34 mt 2005-04-28 NULL
3 BARCLAYS BANK ( DOMINION COLONIAL AND OVERSSEAS) ARC35A ug NULL NULL
4 BARCLAYS BANK (HONG KONG NOMINEES) LIMITED 0040910 hk 1974-11-26 NULL
5 BARCLAYS BANK (LONDON AND INTERNATIONAL) LIMITED 00747985 gb 1963-01-24 2008-05-13
6 BARCLAYS BANK (SINGAPORE NOMINEES) PTE LTD 198003638Z sg NULL NULL
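As a rough illustration of how the search term ends up in the `q` query parameter, the sketch below builds a search URL without sending a request. `search_url()` is a hypothetical helper, not the package's implementation. The base URL and the `api_token` parameter name are assumptions. It takes a plain search term and percent-encodes it, whereas the example above passes an already-encoded `"barclays+bank"`.

```r
# Hypothetical helper (not part of the package): build the URL for
# GET companies/search. The base URL and the "api_token" parameter name
# are assumptions.
search_url <- function(query, api_token = NULL, api_version = "0.4") {
  params <- c(q = query)
  # Only add the key when one is supplied; the API can be used without it
  if (!is.null(api_token)) params <- c(params, api_token = api_token)
  # Percent-encode each value (a space becomes %20)
  encoded <- vapply(params, utils::URLencode, character(1), reserved = TRUE)
  query_string <- paste0(names(encoded), "=", encoded, collapse = "&")
  paste0("https://api.opencorporates.com/v", api_version,
         "/companies/search?", query_string)
}

print(search_url("barclays bank"))
# [1] "https://api.opencorporates.com/v0.4/companies/search?q=barclays%20bank"
```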
get_companies()
Description:
This returns the core data for the given company. The jurisdiction code is the code for the jurisdiction which registered the company. If this is a country it is simply the two-letter ISO code for that country, e.g. Spain = es, United Kingdom = gb. If this is a state or province it is an underscore version of the ISO 3166-2 code for the jurisdiction, e.g. Michigan in the US is us_mi.
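The jurisdiction code convention described above (lowercase, with an underscore in place of the ISO 3166-2 hyphen) can be sketched in a couple of lines. `to_jurisdiction_code()` is a hypothetical helper for illustration and is not part of the package.

```r
# Hypothetical helper (not part of the package): convert an ISO country code
# or an ISO 3166-2 subdivision code into the jurisdiction code format
# described above.
to_jurisdiction_code <- function(iso_code) {
  # Replace the ISO 3166-2 hyphen with an underscore, then lowercase
  tolower(gsub("-", "_", iso_code, fixed = TRUE))
}

codes <- to_jurisdiction_code(c("ES", "GB", "US-MI"))
print(codes)
# [1] "es"    "gb"    "us_mi"
```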
Example:
res <- get_companies_coredata(jurisdiction_code = "gb",
                              company_number = "00102498",
                              api_version = "0.4",
                              raw = FALSE)
Results:
> t(res$meta)
[,1]
name "BP P.L.C."
company_number "00102498"
jurisdiction_code "gb"
incorporate_date "1909-04-14"
dissolution_date NA
company_type "Public Limited Company"
registry_url "http://data.companieshouse.gov.uk/doc/company/00102498"
branch_status NA
inactive "FALSE"
current_status "Active"
created_at "2010-10-21T18:20:50+00:00"
updated_at "2015-12-19T21:38:27+00:00"
retrieved_at "2015-12-09T12:10:45+00:00"
opencorporates_url "https://opencorporates.com/companies/gb/00102498"
> head(res$data)
id title data_type description opencorporates_url
1 25248386 International Trademark Registration WipoTrademark https://opencorporates.com/data/25248386
2 34300047 International Trademark Registration WipoTrademark https://opencorporates.com/data/34300047
3 34300048 Company Address CompanyAddress 1 St James's Square, London SW1Y 4PD, GB https://opencorporates.com/data/34300048
4 2204579 International Trademark Registration WipoTrademark https://opencorporates.com/data/2204579
5 9788777 International Trademark Registration WipoTrademark https://opencorporates.com/data/9788777
6 1999276 International Trademark Registration WipoTrademark https://opencorporates.com/data/1999276
> head(res$filings)[,1:2]
id title
1 230543795 Appointment of director
2 230543791 Appointment of director
3 230543792 Notice of sale or transfer of treasury shares by a public limited company (PLC)
4 230543794 Notice of sale or transfer of treasury shares by a public limited company (PLC)
5 230543793 Termination of appointment of director
6 228615959 Return of allotment of shares
> head(res$officers)
id name position uid start_date end_date opencorporates_url occupation inactive current_status
1 206304786 HANNAH ASHDOWN secretary 2012-02-02 https://opencorporates.com/officers/206304786 FALSE
2 206304801 JENS BERTELSEN secretary 2012-02-02 https://opencorporates.com/officers/206304801 FALSE
3 206304814 PAULA JEAN CLAYTON secretary 1999-08-01 2001-07-01 https://opencorporates.com/officers/206304814 TRUE
4 206304831 RICHARD CHARLES GRAYSON secretary 1992-05-10 1994-10-01 https://opencorporates.com/officers/206304831 TRUE
5 206304849 JUDITH CHRISTINE HANRATTY secretary 1994-10-01 2003-07-24 https://opencorporates.com/officers/206304849 TRUE
6 206304870 DAVID JOHN JACKSON secretary 2003-07-24 https://opencorporates.com/officers/206304870 FALSE
get_companies_filings()
Description:
This returns the statutory filings for the given company.
Example:
res <- get_companies_filings(jurisdiction_code = "gb",
                             company_number = "00102498",
                             api_version = "0.4",
                             raw = FALSE)
head(res)[,1:2]
Results:
id title
1 230543795 Appointment of director
2 230543791 Appointment of director
3 230543792 Notice of sale or transfer of treasury shares by a public limited company (PLC)
4 230543794 Notice of sale or transfer of treasury shares by a public limited company (PLC)
5 230543793 Termination of appointment of director
6 228615959 Return of allotment of shares
get_companies_network()
Description:
(NOT COMPLETE) This returns the immediate 'computed corporate network' for the given company as a set of control relationships (i.e. one company is thought to control or influence another company). This is the same data you can see on a company's network page on the main OpenCorporates site.
Example:
res <- get_companies_network(jurisdiction_code = "gb",
                             company_number = "02263951",
                             api_version = "0.4",
                             raw = FALSE)
Results:
Note that OpenCorporates only has network data for a small proportion of the 50,000,000-plus companies currently in the OpenCorporates database.
get_companies_statements()
Description:
This returns the statements associated with each company. A statement is a purported 'statement of fact' from a source (a public record or a user). For example, a subsidiary statement may have been parsed from a filing at the US Securities And Exchange Commission, or a user may have made a statement that one company is a parent of another.
Example:
res <- get_companies_statements(jurisdiction_code = "gb",
                                company_number = "00102498",
                                api_version = "0.4",
                                raw = FALSE)
This produces a warning message.
# Warning message:
# In get_company_statements(jurisdiction_code = "gb", company_number = "00102498", :
# Without an opencorporates API key, only the companies on the first page of results are returned (30 records max).
# Number of pages remaining: 1337
# Number of results remaining: 40094
Note: The warning above indicates that more results were available, but only the first 30 were returned. Returning all the results (by looping through each page of results) is a "to-do" item.
Results:
> head(res)[1:4]
id data_type opencorporates_url start_date
1 16098371 subsidiary_relationship https://opencorporates.com/statements/16098371 2013-07-06
2 16098372 subsidiary_relationship https://opencorporates.com/statements/16098372 2013-07-06
3 16098375 subsidiary_relationship https://opencorporates.com/statements/16098375 2013-07-06
4 16098377 subsidiary_relationship https://opencorporates.com/statements/16098377 2013-07-06
5 16098381 subsidiary_relationship https://opencorporates.com/statements/16098381 2013-07-06
6 16098414 subsidiary_relationship https://opencorporates.com/statements/16098414 2013-07-06
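The "to-do" item of looping through every page of results could look roughly like the sketch below. It is not the package's code. `fetch_page()` is a hypothetical stand-in that fabricates pages locally instead of calling the API, and the `results` and `total_pages` fields are assumptions about what a parsed response would provide.

```r
# Hypothetical stand-in for one API request (not part of the package).
# A real version would request page `page` and parse the response; this mock
# generates ids locally so the loop can be run offline.
fetch_page <- function(page, per_page = 30, total = 95) {
  first <- (page - 1) * per_page + 1
  last <- min(page * per_page, total)
  list(
    results = data.frame(id = seq(first, last)),
    total_pages = ceiling(total / per_page)  # assumed response field
  )
}

# Request pages one by one until the last page (or max_pages) is reached,
# then bind the per-page data frames into one.
fetch_all_pages <- function(fetch, max_pages = Inf) {
  pages <- list()
  page <- 1
  repeat {
    res <- fetch(page)
    pages[[page]] <- res$results
    if (page >= res$total_pages || page >= max_pages) break
    page <- page + 1
  }
  do.call(rbind, pages)
}

all_results <- fetch_all_pages(fetch_page)
print(nrow(all_results))
# [1] 95
```

The `max_pages` argument is there because, with over a thousand pages remaining in the example above, an unbounded loop could run into the usage limits quickly.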
get_companies_data()
TO DO.
get_officers_search()
TO DO.
get_officers()
TO DO.
get_corporate_groupings()
TO DO.
get_corporate_groupings_search()
TO DO.
get_filings()
TO DO.
get_data()
TO DO.
get_statements()
TO DO.
get_placeholder()
TO DO.
get_placeholders_network()
TO DO.
get_placeholders_statements()
TO DO.
get_jurisdictions()
Description:
This returns the list of all the jurisdictions we know about (not all of which the opencorporates database has companies in), as well as the jurisdiction code for the jurisdiction.
Example:
res <- get_jurisdictions(api_version = "0.4")
head(res)
Results:
code name country full_name
1 ad Andorra Andorra Andorra
2 ae_az Abu Dhabi United Arab Emirates Abu Dhabi (United Arab Emirates)
3 ae_du Dubai United Arab Emirates Dubai (United Arab Emirates)
4 af Afghanistan Afghanistan Afghanistan
5 ag Antigua and Barbuda Antigua and Barbuda Antigua and Barbuda
6 ai Anguilla Anguilla Anguilla
get_jurisdictions_match()
TO DO.
get_industry_codes()
Description:
From v0.4 OpenCorporates has moved to a new way of representing industry codes (previously we only catered for UK SIC codes), and we can now handle a wide variety of different industry codes, including US NAICS codes and EU NACE codes (and their derivatives). Where a company register makes available the industry codes, we now store that code, together with the code scheme which it belongs to. For example, for this Belgian company, the industry code consists of the code scheme (in this case be_nace_2008, which represents the NACE-BEL 2008 code scheme) and the code 66191 (which in NACE-BEL 2008 is the code for 'Agenten en makelaars in bankdiensten'). This can be represented as a uid (in this case 'be_nace_2008-66191') to make searching by industry codes consistent and straightforward.
Example:
res <- get_industry_codes(api_version = "0.4")
head(res)[1:3]
Results:
id name jurisdiction_code
1 uk_sic_1992 UK SIC Classification 1992 gb
2 uk_sic_2003 UK SIC Classification 2003 gb
3 uk_sic_2007 UK SIC Classification 2007 gb
4 isic_4 UN ISIC Rev 4 <NA>
5 eu_nace_2 European Community NACE Rev 2 <NA>
6 eu_nace_11 European Community NACE Rev 1.1 <NA>
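The uid format mentioned in the description (code scheme id, a hyphen, then the code) can be sketched as a pair of small helpers. Both functions are hypothetical and not part of the package. The parser assumes that code scheme ids never contain a hyphen, which holds for the ids shown above but is not confirmed for every scheme.

```r
# Hypothetical helpers (not part of the package) for the industry code uid
# format "<code_scheme_id>-<code>" described above.
make_industry_uid <- function(code_scheme_id, code) {
  paste(code_scheme_id, code, sep = "-")
}

# Assumption: the code scheme id contains no hyphen, so the first hyphen
# separates the scheme from the code.
parse_industry_uid <- function(uid) {
  pos <- as.integer(regexpr("-", uid, fixed = TRUE))
  list(
    code_scheme_id = substr(uid, 1, pos - 1),
    code = substr(uid, pos + 1, nchar(uid))
  )
}

uid <- make_industry_uid("be_nace_2008", "66191")
print(uid)
# [1] "be_nace_2008-66191"
parsed <- parse_industry_uid(uid)
```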
get_industry_codes_scheme_id()
TO DO.
get_industry_codes_scheme_id_code()
TO DO.
get_account_status()
TO DO.
Hi Erin, is this project released? Can I download the library or code? Thanks!
Hi @ErickGB I never had time to release it, but I have the code somewhere. Send an email and I will send you what I have if you're still interested (sorry I just saw this note now...).
Hi @ledell, would you mind sharing the code with wheemur@gmail.com? Thanks!
Hi @ledell I'd also be interested in the code, if you don't mind sharing it: emmanuel.freudenthal@gmail.com thanks!
Hi @ledell, I would also like to get access to the code - do you mind sharing it? michele.castiglioni@eui.eu - Thanks a lot!
I started working on an R API for the OpenCorporates API, which is the "world's largest open database of companies." This includes all sorts of interesting metadata about companies, including company officers and network data about which companies own other companies.
I am not sure if this fits the definition of "open science data" and falls under the auspices of the rOpenSci project, but I thought I'd throw it up here anyway just in case. The API is well designed and well documented, so it's fairly straightforward to create an R wrapper for it.