open-innovations / true-north

Repo for the True North Data Microsite
https://open-innovations.github.io/true-north/

Update True North dashboard #37

Closed luke-strange closed 3 weeks ago

luke-strange commented 1 month ago

Questions I have for Brabners:

1) Are you planning to update the questionnaire’s company sizes to match standard definitions? If so, you’d either need to manually correct all the previous entries, or maybe ask all members to fill out the new questionnaire. Perhaps this is something we could do at the launch event on 25th June?

2) Can you export the data as a .csv file? These can still be opened / edited in Excel.

3) We have a Dropbox-style file share that we can set you up with an account on. We can connect this to the website with some processing code to work out the stats we show on the dashboard. The person who can do this is currently on holiday, but when they’re back I’ll ask them to set up an account for you.

4) What are the most important metrics you want to show? We discussed adding a “number of unique organisations” metric to show how many businesses True North members represent.
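A minimal sketch of how the “number of unique organisations” figure could be computed from a .csv export. The column name `organisation` and the sample rows are assumptions for illustration, not the real questionnaire fields.

```python
import csv
import io


def count_unique_organisations(rows, column="organisation"):
    """Count distinct organisation names, ignoring case and surrounding whitespace."""
    seen = set()
    for row in rows:
        name = (row.get(column) or "").strip().casefold()
        if name:  # skip blank entries
            seen.add(name)
    return len(seen)


# Hypothetical sample data standing in for the members export.
sample = io.StringIO(
    "name,organisation\n"
    "A,Acme Ltd\n"
    "B,acme ltd \n"
    "C,Beta CIC\n"
    "D,\n"
)
rows = list(csv.DictReader(sample))
unique_orgs = count_unique_organisations(rows)
print(unique_orgs)
```

Normalising case and whitespace before counting avoids treating “Acme Ltd” and “acme ltd ” as two different businesses.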

luke-strange commented 3 weeks ago

We are getting the xlsx file from them each month via the file share. I've written a script to clean it.

I now need to write another one to process it and produce stats/graphs.

Then add them to the site as visualisations.
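As a rough sketch of the clean-then-summarise flow described above (not the actual script): the rows are assumed to be already loaded as dictionaries, and the `company_size` and `industry` column names are hypothetical.

```python
from collections import Counter


def clean_rows(rows):
    """Strip whitespace from headers and cells, and drop rows that are entirely empty."""
    cleaned = []
    for row in rows:
        tidy = {key.strip(): (value or "").strip() for key, value in row.items()}
        if any(tidy.values()):
            cleaned.append(tidy)
    return cleaned


def summarise(rows, column):
    """Count how many rows fall into each non-blank value of a column."""
    return Counter(row[column] for row in rows if row.get(column))


# Hypothetical rows, as they might look straight out of the spreadsheet.
raw = [
    {" company_size ": " Small ", "industry": "Legal"},
    {" company_size ": "Small", "industry": " Tech "},
    {" company_size ": "", "industry": ""},
    {" company_size ": "Large", "industry": "Legal"},
]
cleaned = clean_rows(raw)
size_counts = summarise(cleaned, "company_size")
print(size_counts)
```

The resulting counts are the kind of table that could feed a bar chart on the dashboard.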

luke-strange commented 3 weeks ago

I need to go through what I've done in pipelines/truenorth/true-north.ipynb with @slowe tomorrow, then we can close this ticket.

luke-strange commented 3 weeks ago

This is now done, but a couple of notes:

1) There are 2 files, clean_members_list.ipynb and analyse_members_list.ipynb, which are sufficiently documented using markdown blocks in between the code blocks.

2) I chose the "industry" column rather than "sector". The sector column had 145 unique values, as opposed to 50 in industry. Many of the data points for this are missing and we may need to fill them in manually. However, when looking at the sector column, there were lots of near-duplicate values that differ only slightly, through capital letters or additional words. I am going to attempt some fuzzy matching to find the most common words, and will add this as a new ticket.
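One possible starting point for that fuzzy matching, sketched with Python's standard-library `difflib`. This is an illustration under assumptions, not the approach that was eventually used: the sample sector labels and the 0.8 similarity cutoff are made up, and the cutoff would need tuning against the real data.

```python
from collections import Counter
from difflib import get_close_matches


def normalise(label):
    """Lower-case a label and collapse runs of whitespace."""
    return " ".join(label.casefold().split())


def group_similar(labels, cutoff=0.8):
    """Map each normalised label to the most frequent label it closely resembles."""
    counts = Counter(normalise(label) for label in labels if label and label.strip())
    canonical = []  # labels chosen as group representatives so far
    mapping = {}
    # Visit the most frequent labels first so they become the representatives.
    for label, _ in counts.most_common():
        match = get_close_matches(label, canonical, n=1, cutoff=cutoff)
        if match:
            mapping[label] = match[0]
        else:
            canonical.append(label)
            mapping[label] = label
    return mapping


# Hypothetical sector values with the kinds of variation described above.
sectors = [
    "Financial Services",
    "financial services",
    "Financial Service",
    "Legal",
    "legal ",
    "Manufacturing",
]
mapping = group_similar(sectors)
for original, group in mapping.items():
    print(f"{original!r} -> {group!r}")
```

Normalising case and whitespace first removes the capital-letter differences outright, so the fuzzy step only has to deal with extra or slightly different words.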