urbanbigdatacentre / ideamaps-models

Models of deprivation sub-domains for the IDEAMAPS data ecosystem project. This repo contains the source code used to run models and the model outputs. It also contains logic to upload model outputs to the IDEAMAPS platform.
MIT License

Write Source Code #14

Closed · andymithamclarke closed this issue 2 months ago

andymithamclarke commented 6 months ago

@Gtregon to coordinate the reference data team (@Adenikemie + Alex) in creating the training dataset for the new morphological informality model, based on the reference data created in #9.

The task involves using 3-point reference data to generate training data from the following datasets:

The datasets marked with ** are new to the modelling process and will require some time to become familiar with.

This task can be closed when we have a training dataset for the morphological informality model, and this dataset is referenced from within the GitHub repo and stored in an accessible place such as CRIB.

Gtregon commented 5 months ago

UPDATE:

The population, building density and road datasets will be used to generate reference data for the deep learning (DL) model. We no longer need to do any additional preparation for the DL model, as the GW team have formalised a workflow that allows any team member to input reference data and generate outputs using their model.

This issue can be closed when the reference data has been forwarded to the GW team (Ryan et al.).

Gtregon commented 5 months ago

UPDATE:

The WP2/3 team have now agreed that a rule-based model would be the best approach to generate initial high, medium and low classifications of morphological informality within our pilot cities.

@Gtregon will therefore develop source code to ingest the reference data developed in #15 #16 #17 and deploy a rule-based model.

This issue can be closed when source code has been developed to run a rule-based model using the reference datasets.
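
As an illustrative sketch only (not from this thread), a rule-based classifier over a covariate grid might look like the following. The column names, thresholds and file paths are all hypothetical; the real rules would come from the reference data in #15 #16 #17.

```python
import geopandas as gpd

# Illustrative rules only: column names and thresholds are hypothetical,
# not the project's actual rule set.
def classify_mi(row):
    """Assign a high/medium/low morphological informality class."""
    if row["building_density"] > 0.6 and row["road_density"] < 2.0:
        return "high"
    if row["building_density"] > 0.3:
        return "medium"
    return "low"

cells = gpd.read_file("covariates.gpkg")  # hypothetical input path
cells["mi_class"] = cells.apply(classify_mi, axis=1)
cells.to_file("mi_rule_based.gpkg", driver="GPKG")
```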

gielinkg commented 4 months ago

✅ Definition of Done

Gtregon commented 3 months ago

Update: writing the source code and running the model will be combined within the same issue, as these tasks will be completed simultaneously, i.e. the analysis/model runs will be performed as the source code is written.

Updated set of tasks to be completed within this issue:

  1. @Gtregon to combine all covariate data into one complete dataset. At the moment, the covariates exist in separate GeoPackage (.gpkg) files. A CSV file will be generated that joins all the relevant datasets together, so that all covariates sit in one dataset and can be used within the ML model (see the first sketch after this list).
  2. A Jupyter notebook will be used to develop the Python source code: NumPy and pandas/geopandas will process the data, scikit-learn will be used to deploy the random forest (RF) model, and SHAP values will be calculated to explain covariate contributions (see the second sketch after this list).
  3. Reference datasets will be used to generate training and testing datasets for high, medium and low MI. Kano will likely be modelled first due to its higher number of unique samples in the reference data (829 unique samples vs 202 in Lagos). 80% of the data will be used for training and 20% for testing. A 70-30 split is usually recommended, but due to the limited amount of reference data, a higher share will be kept for training.
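
A minimal sketch of task 1, assuming each covariate layer shares a common grid-cell ID column (here called `cell_id`, which is hypothetical, as are the file paths):

```python
from functools import reduce

import geopandas as gpd

# Hypothetical layer paths; the real covariates live in separate .gpkg files.
layers = ["population.gpkg", "building_density.gpkg", "road_density.gpkg"]

# Read each layer, drop the geometry, and join the attribute tables
# on the shared grid-cell ID so every covariate sits in one table.
frames = [gpd.read_file(path).drop(columns="geometry") for path in layers]
covariates = reduce(lambda left, right: left.merge(right, on="cell_id"), frames)
covariates.to_csv("covariates_combined.csv", index=False)
```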
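
And a minimal sketch of tasks 2 and 3, assuming a hypothetical label column `mi_class` with high/medium/low classes, and using the standard shap package alongside scikit-learn (the thread does not name the exact SHAP library):

```python
import pandas as pd
import shap
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

df = pd.read_csv("covariates_combined.csv")   # output of the join step above
X = df.drop(columns=["cell_id", "mi_class"])  # hypothetical column names
y = df["mi_class"]                            # high / medium / low labels

# 80/20 split: a larger share is kept for training given the limited
# reference data (cf. the usual 70-30 recommendation).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

rf = RandomForestClassifier(n_estimators=500, random_state=42)
rf.fit(X_train, y_train)
print(classification_report(y_test, rf.predict(X_test)))

# SHAP values quantify each covariate's contribution to the predictions.
explainer = shap.TreeExplainer(rf)
shap_values = explainer.shap_values(X_test)
```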
andymithamclarke commented 2 months ago

Closing as the source code is written: