mccgr / abn_lookup

Code for creating tables containing the ABNs of companies registered with the Australian Business Register (ABR), using data from the ABN Lookup website (https://abr.business.gov.au/).

ABN/ACN work: make table scraped from the ABR #3

Closed: bdcallen closed this issue 4 years ago

bdcallen commented 5 years ago

@iangow This issue is for building the table from the ABR data we downloaded yesterday, to complement the data scraped from ASIC.

iangow commented 5 years ago

Ideally, the code itself would download the data (even though we've already downloaded it), so that updates are easy to run.

bdcallen commented 5 years ago

@iangow I am going to close this issue, since we now have a new repository, abn_lookup, for this task. I will re-open the issue there.

iangow commented 5 years ago

@bdcallen I moved the issue here.

bdcallen commented 5 years ago

@iangow Just confirming that the program successfully scraped the initial tables:

bdcallen@igow-z640:~/abn_lookup$ Rscript process_abn_lookup_xml.R
Loading required package: bitops

Attaching package: ‘dplyr’

The following objects are masked from ‘package:stats’:

    filter, lag

The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union

Attaching package: ‘rvest’

The following object is masked from ‘package:readr’:

    guess_encoding

Attaching package: ‘purrr’

The following object is masked from ‘package:rvest’:

    pluck

Attaching package: ‘lubridate’

The following object is masked from ‘package:base’:

    date

[1] TRUE
[1] TRUE
[1] "Processing file 1"
[1] "Successfully processed 720900 entries into abn_lookup.abns"
[1] "Successfully processed 337078 entries into abn_lookup.trading_names"
[1] "Successfully processed 543 entries into abn_lookup.dgr"
[1] "Processing file 2"
[1] "Successfully processed 720900 entries into abn_lookup.abns"
[1] "Successfully processed 331121 entries into abn_lookup.trading_names"
[1] "Successfully processed 834 entries into abn_lookup.dgr"
[1] "Processing file 3"
[1] "Successfully processed 720900 entries into abn_lookup.abns"
[1] "Successfully processed 336632 entries into abn_lookup.trading_names"
[1] "Successfully processed 556 entries into abn_lookup.dgr"
[1] "Processing file 4"
[1] "Successfully processed 720900 entries into abn_lookup.abns"
[1] "Successfully processed 331431 entries into abn_lookup.trading_names"
[1] "Successfully processed 504 entries into abn_lookup.dgr"
[1] "Processing file 5"
[1] "Successfully processed 720900 entries into abn_lookup.abns"
[1] "Successfully processed 336659 entries into abn_lookup.trading_names"
[1] "Successfully processed 499 entries into abn_lookup.dgr"
[1] "Processing file 6"
[1] "Successfully processed 720900 entries into abn_lookup.abns"
[1] "Successfully processed 334497 entries into abn_lookup.trading_names"
[1] "Successfully processed 438 entries into abn_lookup.dgr"
[1] "Processing file 7"
[1] "Successfully processed 720900 entries into abn_lookup.abns"
[1] "Successfully processed 338265 entries into abn_lookup.trading_names"
[1] "Successfully processed 566 entries into abn_lookup.dgr"
[1] "Processing file 8"
[1] "Successfully processed 720900 entries into abn_lookup.abns"
[1] "Successfully processed 331791 entries into abn_lookup.trading_names"
[1] "Successfully processed 532 entries into abn_lookup.dgr"
[1] "Processing file 9"
[1] "Successfully processed 720900 entries into abn_lookup.abns"
[1] "Successfully processed 332811 entries into abn_lookup.trading_names"
[1] "Successfully processed 847 entries into abn_lookup.dgr"
[1] "Processing file 10"
[1] "Successfully processed 720900 entries into abn_lookup.abns"
[1] "Successfully processed 338042 entries into abn_lookup.trading_names"
[1] "Successfully processed 519 entries into abn_lookup.dgr"
[1] "Processing file 11"
[1] "Successfully processed 720900 entries into abn_lookup.abns"
[1] "Successfully processed 336824 entries into abn_lookup.trading_names"
[1] "Successfully processed 477 entries into abn_lookup.dgr"
[1] "Processing file 12"
[1] "Successfully processed 720900 entries into abn_lookup.abns"
[1] "Successfully processed 335303 entries into abn_lookup.trading_names"
[1] "Successfully processed 851 entries into abn_lookup.dgr"
[1] "Processing file 13"
[1] "Successfully processed 720900 entries into abn_lookup.abns"
[1] "Successfully processed 331809 entries into abn_lookup.trading_names"
[1] "Successfully processed 479 entries into abn_lookup.dgr"
[1] "Processing file 14"
[1] "Successfully processed 720900 entries into abn_lookup.abns"
[1] "Successfully processed 336988 entries into abn_lookup.trading_names"
[1] "Successfully processed 531 entries into abn_lookup.dgr"
[1] "Processing file 15"
[1] "Successfully processed 720900 entries into abn_lookup.abns"
[1] "Successfully processed 335452 entries into abn_lookup.trading_names"
[1] "Successfully processed 547 entries into abn_lookup.dgr"
[1] "Processing file 16"
[1] "Successfully processed 720900 entries into abn_lookup.abns"
[1] "Successfully processed 338900 entries into abn_lookup.trading_names"
[1] "Successfully processed 485 entries into abn_lookup.dgr"
[1] "Processing file 17"
[1] "Successfully processed 720900 entries into abn_lookup.abns"
[1] "Successfully processed 332144 entries into abn_lookup.trading_names"
[1] "Successfully processed 575 entries into abn_lookup.dgr"
[1] "Processing file 18"
[1] "Successfully processed 720900 entries into abn_lookup.abns"
[1] "Successfully processed 333580 entries into abn_lookup.trading_names"
[1] "Successfully processed 568 entries into abn_lookup.dgr"
[1] "Processing file 19"
[1] "Successfully processed 720900 entries into abn_lookup.abns"
[1] "Successfully processed 336553 entries into abn_lookup.trading_names"
[1] "Successfully processed 492 entries into abn_lookup.dgr"
[1] "Processing file 20"
[1] "Successfully processed 715761 entries into abn_lookup.abns"
[1] "Successfully processed 330657 entries into abn_lookup.trading_names"
[1] "Successfully processed 521 entries into abn_lookup.dgr"
[1] TRUE
bdcallen commented 4 years ago

@iangow Since we now have a program that has successfully downloaded the data into the database several times, taking around 20 minutes per run, I'm closing this.