Open peterdudfield opened 2 years ago
Would be interesting to get your views on this @JackKelly and @jacobbieker?
My feeling would be to start simple, i.e. 2. Then, when we need to, move this to the database (1).
Yeah, I think easy first and then building up as we need it. I'd second your plan.
Agreed! A CSV sounds good to me! Thanks for thinking about the alternatives!
We need the following metadata for both training and prediction of the ML models. Essentially we need a map from (`provider`, `provider_id`) to (`ocf_id`, `capacity`).
We could do this in a number of different ways
1. Database + API
Add `ocf_id` as a column to the database (this could go in the `pv_system` table). We would need to make sure that the development and production databases have the same values. Then we could add an endpoint to the API that maps (`provider`, `provider_id`) --> (`ocf_id`, `capacity`). This information is not publicly sensitive, so we are OK with security here. We would perhaps want a wrapper function around the API endpoint that we can easily use (this could go in nowcasting_dataset).
This solution is good because:
This solution is bad because:
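The wrapper function around the API endpoint could look something like the sketch below. The endpoint path, query-parameter names, and response fields are all assumptions for illustration; the JSON-fetching callable is injected so the lookup logic can be exercised without a live API.

```python
from typing import Callable, Dict, Tuple

# A callable that performs the HTTP GET and returns the parsed JSON body.
# In production this would wrap e.g. requests.get(url, params=...).json();
# injecting it keeps the mapping logic easy to test and to stub out.
FetchJson = Callable[[str, Dict[str, str]], dict]


def get_ocf_metadata(
    provider: str,
    provider_id: str,
    fetch_json: FetchJson,
    base_url: str = "https://api.example.com",  # hypothetical base URL
) -> Tuple[str, float]:
    """Map (provider, provider_id) -> (ocf_id, capacity) via the API.

    The "/pv_system" path and the "ocf_id"/"capacity" response fields
    are assumptions, not the real API contract.
    """
    payload = fetch_json(
        f"{base_url}/pv_system",
        {"provider": provider, "provider_id": provider_id},
    )
    return payload["ocf_id"], float(payload["capacity"])
```

A stubbed `fetch_json` returning `{"ocf_id": ..., "capacity": ...}` is enough to use this in tests or offline development.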
2. CSV
We could add a CSV to either nowcasting_dataset or pv consumer with the following 4 columns: (`provider`, `provider_id`, `ocf_id`, `capacity`). Then we could write a function to go from (`provider`, `provider_id`) --> (`ocf_id`, `capacity`) very easily. I think this all sits quite well in nowcasting_dataset.

This is good because:
This solution is bad because:
3. Cloud CSV
Like 2, but the CSV could be in the cloud. This means we don't have to worry about installing any extra repos.
This is good because:
This solution is bad because:
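A nice property of this option is that `pandas.read_csv` can read straight from an HTTP(S) URL, and from cloud URLs such as `gs://...` or `s3://...` when fsspec and the matching filesystem package (gcsfs, s3fs) are installed. The bucket path below is hypothetical:

```python
import pandas as pd


def load_metadata_remote(path_or_url: str) -> pd.DataFrame:
    """Load the PV metadata CSV from a local path, HTTP(S) URL, or cloud URL.

    e.g. load_metadata_remote("gs://some-bucket/pv_metadata.csv")  # hypothetical
    Returns a DataFrame indexed by (provider, provider_id), so .loc gives
    the (ocf_id, capacity) row directly.
    """
    df = pd.read_csv(path_or_url)
    return df.set_index(["provider", "provider_id"])
```

The same function then covers options 2 and 3, since the argument can just as well be a path inside a checked-out repo.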
4. Hybrid
Like 2, but also add `ocf_id` to the database.