bishax closed 2 years ago
@Juan-Mateos: the schema currently remains as before. Now would be the time to request any minor schema changes.
E.g. I noticed that `get_sector()` has a column `data_dump_date` signalling the latest of the three months of Companies House data (the same three months as the Glass data was collected) that the sector assignment was from - this is probably not necessary to know for this project?
Tested and all works. Good to merge with a couple of observations.
* In `get_name()`, what is `name_age_index` for? Worth dropping?
`n` is the company's name as of `n` renames ago (most companies have not renamed themselves).
* What does it mean for a company name to have an invalid date? Do we want to drop those that do? If not, I suppose we should remove the variable.
If you mean on `get_name()`, then the date is the date that the name became invalid (i.e. it changed from that name to something else).
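To illustrate the `name_age_index` semantics described above, here is a minimal sketch in plain Python. The rows are made up, and every column name other than `name_age_index` (`org_id`, `name`, `invalid_date`) is an assumption for illustration, not the actual `get_name()` schema. It also assumes that the current name has no invalid date.

```python
# Hypothetical rows in the shape get_name() might return.
# Only name_age_index is taken from the discussion; the other
# column names and all values are illustrative assumptions.
rows = [
    {"org_id": 1, "name": "Acme Ltd", "name_age_index": 0, "invalid_date": None},
    {"org_id": 1, "name": "Acme Widgets Ltd", "name_age_index": 1, "invalid_date": "2015-03-01"},
    {"org_id": 2, "name": "Bolt Ltd", "name_age_index": 0, "invalid_date": None},
]


def current_names(rows):
    """Keep only each company's current name, i.e. the name 0 renames ago."""
    return {r["org_id"]: r["name"] for r in rows if r["name_age_index"] == 0}


print(current_names(rows))
```

If only current names are needed downstream, a filter like this would make both `name_age_index` and the invalid date redundant after the filtering step.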
* Re your query about `data_dump_date` in sector: I wasn't sure I understood what explains the difference. I expect it would be fine to drop, since 99.8% of observations come from July 2020 in any case!
If they are in May and June but not July then that implies they have been removed from the register for some reason (e.g. dissolved).
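The check behind the two comments above could be sketched as follows. This is a rough illustration on made-up records, assuming a hypothetical `org_id` column alongside `data_dump_date` and ISO-formatted date strings. It is not the project's actual code or data.

```python
from collections import Counter

# Made-up sector records; org_id is a hypothetical identifier column and
# the dates are assumed to be ISO strings, which sort chronologically.
records = [
    {"org_id": 1, "data_dump_date": "2020-07-01"},
    {"org_id": 2, "data_dump_date": "2020-07-01"},
    {"org_id": 3, "data_dump_date": "2020-06-01"},
    {"org_id": 4, "data_dump_date": "2020-05-01"},
]


def dump_date_shares(records):
    """Share of observations whose sector assignment comes from each dump."""
    counts = Counter(r["data_dump_date"] for r in records)
    total = sum(counts.values())
    return {date: count / total for date, count in counts.items()}


def missing_from_latest(records):
    """Companies whose latest dump is earlier than the most recent one,
    i.e. those that appear to have left the register (e.g. dissolved)."""
    latest = max(r["data_dump_date"] for r in records)
    return sorted(r["org_id"] for r in records if r["data_dump_date"] < latest)


print(dump_date_shares(records))
print(missing_from_latest(records))
```

If the column is dropped, the information about which companies left the register between dumps goes with it, so a check like `missing_from_latest` would need to run before the drop.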
I'll add documentation around the above
All fine, gtm
Closes #3