Description
This PR fixes issue #69, which relates to installer certification dates (effective_from is the installer certification start date and effective_to is the installer certification end date):
When effective_from is non-missing and effective_to is missing, the certification hasn’t ended (that's how the raw data arrives). Before this PR, we filled these missing values with the date MCS sent us the data (our best guess). This processing lived in merge_proc_datasets.py (the script that creates an outer merge between EPC, MCS installations and MCS installers for the HPMT project); I have now moved it to process_historical_mcs_installers.py, where we process installer data, which makes more sense.
Additionally, I am enhancing effective_to: for each (installer, certification number) pair, we check whether effective_to falls before the last commissioning_date. If so, the last commissioning date is a better guess for effective_to than the date MCS sent us the installer data. This change is applied to all data, not only to rows where effective_to was initially missing, so in a small percentage of cases we change an effective_to date that wasn’t missing. Is it possible for a heat pump to be commissioned after the certification has ended? Might this be a typo in effective_to, or should we not be changing effective_to at all when the value is not missing? I could restrict this enhancement to originally missing values. Let me know what you think :)
closes #69
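To make the rule above easier to discuss, here is a minimal sketch of how I read it, in plain Python rather than the actual pipeline code. The function names, the tuple layout of the installation records and the only_if_missing flag are all hypothetical and do not come from process_historical_mcs_installers.py; the flag only illustrates the "originally missing values only" alternative raised in the question above.

```python
from datetime import date
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical key: (installer id, certification number).
PairKey = Tuple[str, int]


def last_commissioning_by_pair(
    installations: Iterable[Tuple[str, int, date]]
) -> Dict[PairKey, date]:
    """Return the latest commissioning date for each (installer, certification number) pair."""
    latest: Dict[PairKey, date] = {}
    for installer_id, certification_number, commissioning_date in installations:
        key = (installer_id, certification_number)
        if key not in latest or commissioning_date > latest[key]:
            latest[key] = commissioning_date
    return latest


def resolve_effective_to(
    effective_to: Optional[date],
    last_commissioning_date: Optional[date],
    data_sent_date: date,
    only_if_missing: bool = False,
) -> date:
    """Best guess for the certification end date (assumed reading of the PR description)."""
    originally_missing = effective_to is None

    # A missing end date means the certification had not ended when MCS sent the data,
    # so the date the data was sent is the first guess.
    guess = data_sent_date if originally_missing else effective_to

    # Hypothetical option: leave values that were present in the raw data untouched.
    if only_if_missing and not originally_missing:
        return guess

    # If the pair has an installation commissioned after the current guess,
    # the last commissioning date is the better guess.
    if last_commissioning_date is not None and last_commissioning_date > guess:
        return last_commissioning_date
    return guess
```

With only_if_missing=False this matches the behaviour described in this PR (all rows can change); with only_if_missing=True, a non-missing effective_to is never overwritten.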
Instructions for Reviewer(s)
Review
Dear @ch-williamson,
It would be great if you could check that these changes make sense and review the changes to these scripts:
asf_core_data/pipeline/mcs/process/process_historical_mcs_installers.py
asf_core_data/pipeline/mcs/generate_mcs_data.py
asf_core_data/pipeline/data_joining/merge_proc_datasets.py
@sqr00t / @Jack-Vines - tagging you FYI
Setup
In case you want/need to run anything:
Clone this repo: git clone git@github.com:nestauk/asf_core_data.git
Check out the correct branch: git checkout 69_improvements_installer_effective_to
Run make install
Run direnv allow
Activate the conda environment: conda activate asf_core_data
Set your API credentials as an environment variable, i.e. run export COMPANIES_HOUSE_API_KEY="ADD_YOUR_API_KEY_HERE" in the command line, replacing ADD_YOUR_API_KEY_HERE with your API key.
Checklist:
[ ] I have refactored my code out from notebooks/
[ ] I have checked the code runs
[ ] I have tested the code
[ ] I have run pre-commit and addressed any issues not automatically fixed
[ ] I have merged any new changes from dev
[ ] I have documented the code
[ ] Major functions have docstrings
[ ] Appropriate information has been added to READMEs
[ ] I have explained the feature in this PR or (better) in output/reports/