Yvonne-Han closed this issue 4 years ago.
I've already added code (topic_run.py) for creating the kls_domain table in se_features here: a6072d2. It is followed by another commit that replaces topic_functions with the functions in the new package.
@iangow I've checked the code on a small sample (n=3 files) and it seems to work. However, I want to keep this issue open until we have run topic_run.py on all calls in StreetEvents (probably next weekend?).
Running topic_run.py now (2020-05-23 22:38:37 AEST).
@iangow I'm closing this issue now. See below for a preview of se_features.kls_domain.
library(dplyr, warn.conflicts = FALSE)
library(DBI)
library(reprex)
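# Connection details (host, user, database) come from the PG* environment variables / libpq defaults
pg <- dbConnect(RPostgres::Postgres())
# Look up unqualified table names in the se_features schema
rs <- dbExecute(pg, "SET search_path TO se_features")
# Give the session more memory for sorts and hashes (e.g. DISTINCT)
rs <- dbExecute(pg, "SET work_mem TO '5GB'")
# Lazy reference to the table; no data is pulled into R yet
kls_domain <- tbl(pg, "kls_domain")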
kls_domain
#> # Source: table<kls_domain> [?? x 28]
#> # Database: postgres [yanzih1@10.101.13.99:5432/crsp]
#> file_name last_update speaker_number context section market
#> <chr> <dttm> <int> <chr> <int> <lgl>
#> 1 3117755_T 2010-05-26 01:09:59 27 qa 1 FALSE
#> 2 3117755_T 2010-05-26 01:09:59 26 qa 1 FALSE
#> 3 11944816… 2018-11-08 13:30:45 72 qa 1 FALSE
#> 4 3117755_T 2010-05-26 01:09:59 25 qa 1 FALSE
#> 5 11944816… 2018-11-08 13:30:45 71 qa 1 FALSE
#> 6 11944816… 2018-11-08 13:30:45 70 qa 1 FALSE
#> 7 3117755_T 2010-05-26 01:09:59 24 qa 1 FALSE
#> 8 11944816… 2018-11-08 13:30:45 69 qa 1 FALSE
#> 9 3117755_T 2010-05-26 01:09:59 23 qa 1 TRUE
#> 10 11944816… 2018-11-08 13:30:45 68 qa 1 TRUE
#> # … with more rows, and 22 more variables: competition <lgl>,
#> # industry_structure <lgl>, strategic_intent <lgl>,
#> # innovation_and_r_d <lgl>, mode_of_entry <lgl>, business_model <lgl>,
#> # partnerships <lgl>, leadership <lgl>, management_quality <lgl>,
#> # governance <lgl>, disclosure <lgl>, measures <lgl>, customer <lgl>,
#> # brand <lgl>, media <lgl>, advertising <lgl>, corporate_image <lgl>,
#> # financial_performance <lgl>, forecasting <lgl>,
#> # insider_stock_transactions <lgl>, regulation <lgl>,
#> # special_interest_groups <lgl>
kls_domain %>%
select(file_name) %>%
distinct() %>%
count()
#> # Source: lazy query [?? x 1]
#> # Database: postgres [yanzih1@10.101.13.99:5432/crsp]
#> n
#> <int64>
#> 1 474207
Created on 2020-05-24 by the reprex package (v0.3.0)
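Because kls_domain appears to hold one row per speaker turn (the preview shows the same file_name repeated with different speaker_number values), counting rows would overcount calls. That is why the dplyr pipeline above deduplicates file_name before counting. A minimal sketch of the same logic in Python, assuming an in-memory SQLite table as a stand-in for the PostgreSQL table: the rows and file names are made up, only two of the logical domain columns are included, and the SQL is an equivalent formulation, not necessarily the exact query dbplyr generates.

```python
import sqlite3

# In-memory stand-in for se_features.kls_domain.
# Hypothetical toy rows; only two of the domain flag columns are included.
# SQLite has no BOOLEAN type, so the flags are stored as 0/1 integers.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE kls_domain (
           file_name TEXT,
           speaker_number INTEGER,
           context TEXT,
           section INTEGER,
           market INTEGER,
           competition INTEGER
       )"""
)
rows = [
    ("call_A", 23, "qa", 1, 1, 0),
    ("call_A", 24, "qa", 1, 0, 0),
    ("call_B", 68, "qa", 1, 1, 1),
    ("call_B", 69, "qa", 1, 0, 0),
    ("call_C", 5, "qa", 1, 0, 1),
]
conn.executemany("INSERT INTO kls_domain VALUES (?, ?, ?, ?, ?, ?)", rows)

# Counting rows counts speaker turns, not calls.
n_rows = conn.execute("SELECT COUNT(*) FROM kls_domain").fetchone()[0]

# Same idea as select(file_name) %>% distinct() %>% count():
# deduplicate file_name first, then count.
n_files = conn.execute(
    "SELECT COUNT(*) AS n FROM (SELECT DISTINCT file_name FROM kls_domain)"
).fetchone()[0]

# Shorter equivalent using COUNT(DISTINCT ...).
n_files_alt = conn.execute(
    "SELECT COUNT(DISTINCT file_name) FROM kls_domain"
).fetchone()[0]

print(n_rows, n_files, n_files_alt)
```

With these toy rows the script prints 5 rows but 3 distinct files; on the real table the same distinction separates speaker turns from the 474,207 calls reported above.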
This issue is for adding code to create the corresponding topic_measure tables. For testing the new functions in the new package, see the original post in #26.

Originally posted by @iangow in https://github.com/iangow/se_features/issues/26#issuecomment-629275167