pyronear / pyro-risks

Data science for wildfire risk forecasting and monitoring
https://pyronear.github.io/pyro-risks
Apache License 2.0

feat: Add training pipeline life-cycle management (2/2) #56

Closed jsakv closed 3 years ago

jsakv commented 3 years ago

🚀 This PR introduces the second half of the modifications required to orchestrate and automate our training pipelines. It resolves issue #51:

Bonus:

codecov[bot] commented 3 years ago

Codecov Report

Merging #56 (4a1554c) into master (7752c5a) will increase coverage by 0.01%. The diff coverage is 100.00%.


@@            Coverage Diff             @@
##           master      #56      +/-   ##
==========================================
+ Coverage   95.92%   95.93%   +0.01%     
==========================================
  Files          27       27              
  Lines        1055     1058       +3     
==========================================
+ Hits         1012     1015       +3     
  Misses         43       43              
| Flag | Coverage Δ | |
|---|---|---|
| unittests | 95.93% <100.00%> (+0.01%) | :arrow_up: |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ |
|---|---|
| pyro_risks/config/datasets.py | 100.00% <100.00%> (ø) |
| pyro_risks/pipeline/evaluate.py | 100.00% <100.00%> (ø) |


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Powered by Codecov. Last update 7752c5a...4a1554c.

GHCamille commented 3 years ago

Thanks for the PR 😍 Quick question: how do we handle this Google Drive? Do we need personal credentials? How do we access it? I tried clicking on the URL, but no luck so far.

jsakv commented 3 years ago

> Thanks for the PR 😍 Quick question: how do we handle this Google Drive? Do we need personal credentials? How do we access it? I tried clicking on the URL, but no luck so far.

Thanks, @GHCamille. The Google Drive is handled via a distinct Gmail address with a

The files on the drive can be accessed directly, but DVC hashes them. Once the PR is merged, the process for accessing and auditing the files will be as follows:

Enter verification code: # <- enter resulting code

- Downloading the OAuth2 credentials and storing them outside of the repository ⚠️
- Declaring the credentials with:
```sh
dvc remote modify artifacts-registry gdrive_service_account_json_file_path path/to/file.json
```

Since the goal is to deploy the model to production continuously (see below), we shouldn't push the dataset directly to the drive, and we should not use the `dvc push` and `dvc update` commands 🚨
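
One way to make this rule harder to break by accident is a small shell wrapper around `dvc`. This is only a sketch: `safe_dvc` is a hypothetical helper name, not something that exists in the repository, and it blocks nothing beyond the two subcommands named above.

```sh
# Hypothetical helper (not part of pyro-risks): forwards everything to dvc
# except the two subcommands we agreed not to run against the shared drive.
safe_dvc() {
  case "$1" in
    push|update)
      echo "refusing 'dvc $1': do not write to the shared drive directly" >&2
      return 1
      ;;
  esac
  # Any other subcommand (pull, status, ...) is passed through unchanged.
  dvc "$@"
}
```

With this sourced in your shell, `safe_dvc pull -r registry` behaves like the plain command, while `safe_dvc push` exits with an error instead of uploading.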

GHCamille commented 3 years ago

Works like a charm, thanks. Let's just add here, for the record, that you first need to type `dvc remote add -d registry gdrive://1fyD6xyuWWhyjNPoCiCazKshp0yJ8xKs9` and then `dvc pull -r registry` in order to create the credentials.
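
For anyone landing here later, the two commands can be wrapped so that the step is safe to re-run. This is a rough sketch, not a project script: `first_time_pull` is a made-up name, and the existence check assumes `dvc remote list` prints one remote per line, starting with its name.

```sh
# Hypothetical convenience wrapper around the two commands quoted above.
first_time_pull() {
  # Only register the remote if it is not configured yet
  # (assumes each line of `dvc remote list` starts with the remote name).
  if ! dvc remote list | grep -q '^registry[[:space:]]'; then
    dvc remote add -d registry gdrive://1fyD6xyuWWhyjNPoCiCazKshp0yJ8xKs9 || return 1
  fi
  # Pulling triggers the Google authentication prompt on first use.
  dvc pull -r registry
}
```

Run `first_time_pull` from the repository root. On the first run, expect the "Enter verification code" prompt mentioned earlier in the thread.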