Write GODI report with main findings

morchickit commented 7 years ago

We need to write a final report to justify what we did to achieve the outlined goals.

Final report: https://docs.google.com/document/d/1p38Y46tVxbP0TLFLAfv8qDdW2M1tqwiXR6SW-8kGJi4/edit#

Outtakes (contains relevant information): https://docs.google.com/document/d/1uNAln9QxLsOKoLv0SrU-OT2xfT1hXDQVD22V-YN15ss/edit

Folder with all relevant files here: https://drive.google.com/drive/folders/0B5j55T4ZyssBTlhsZEMzN2NJdU0

Raw data for "GODI RESULTS" section are here: https://docs.google.com/spreadsheets/d/17zmn4wgxPaJozY9EjETjoL4ZVsV_T4wu5EYBT4B5MHY/edit#gid=560413061

dannylammerhirt commented 7 years ago

The data characteristics document can be found at: https://github.com/okfn/opendatasurvey/issues/905

NOTES TO INCLUDE IN FINAL REPORT (AND METHODOLOGY PAGE?)

The data characteristics are our unit of analysis. In theory, every survey question needs to apply to the entire list of characteristics. For example, weather forecasts should be open for each of the various types of forecasts.

Problem: Governments publish data in different files. One file may be open but lack the relevant granularity (it does not meet all characteristics), while another is closed but contains the relevant information. Reviewers therefore find different datasets: some are openly licensed, others are not, and only some are in machine-readable formats. What should they do? We need a systematic approach for selecting our unit of analysis from among different datasets.

Solution: In these cases, the review can follow one of two approaches:

1) Reviewers use a reference dataset that contains all relevant characteristics and answer questions B2 and B4 to B8 with reference to this dataset (B9 should be commented on). The key requirement is that this dataset contains all data characteristics. If reviewers have to choose between two or more similar datasets, they choose the one most applicable to questions B2 and B4-B8 and document their choice in the comment section. If anything is unclear, reviewers consult the forum.

2) Reviewers cannot find a single reference dataset because the data is split across several datasets. In this case, they answer questions B2 and B4-B8 with reference to all of these datasets. GODI assumes that together these datasets represent the data characteristics, and ultimately we want to assess how open this key information is. Example: if one dataset contains votes on bills and is openly licensed, but another contains transcripts of parliamentary debates and is not openly licensed, then the answer to question B7 would be "No".
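As the parliament example implies, approach 2 behaves like a logical AND across datasets: a question is answered "Yes" only if every dataset meets it. A minimal Python sketch of that rule, assuming each dataset's answer is recorded as a boolean (the function and dataset names are hypothetical, not part of the survey tooling):

```python
# Minimal sketch, assuming per-dataset answers to one question (e.g. B7)
# are recorded as True/False. Dataset names below are hypothetical.

def aggregate_answer(answers: dict[str, bool]) -> str:
    """Return "Yes" only if every dataset satisfies the question, else "No"."""
    if not answers:
        raise ValueError("at least one dataset is required")
    return "Yes" if all(answers.values()) else "No"


# The parliament example: one openly licensed dataset, one that is not.
b7_answers = {
    "votes_on_bills": True,        # openly licensed
    "debate_transcripts": False,   # not openly licensed
}
print(aggregate_answer(b7_answers))  # prints "No"
```

Under approach 1 the same function applies trivially, since there is only one reference dataset.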

dannylammerhirt commented 7 years ago

Add a section on how to read our results, so that people understand our icons and can better interpret how they relate to one another. Prior discussions on the forum show that people see logical links between our questions where none necessarily exist. For example, some ask how data can be available in bulk if it is outdated. The assumption is that bulk data should cover a longer time span, but this is not the case.

dannylammerhirt commented 7 years ago

https://waffle.io/okfn/opendatasurvey/cards/5891ca022e70b12701f57827

dannylammerhirt commented 7 years ago

Analyse whether and how our new scoring influenced the ranking of the top 10 countries.

dannylammerhirt commented 7 years ago

Analysis of findability:

[ ] check how many URLs were changed by reviewers (how many URLs actually lead to "wrong" data?)
[ ] check how submitters looked for data: did they say "Yes, data was totally easy to find" but not tick all data characteristics?

dannylammerhirt commented 7 years ago

Country ranking (not done this year, but parked for next year)

[x] create a spreadsheet (here: https://docs.google.com/spreadsheets/d/1r1dq0NbVxhQ-MEhOTxUjLLWGsh1BjnILAXWxql_C2rI/edit#gid=1275606590)
[x] create a draft blogpost: https://docs.google.com/document/d/1bhWM37LZXQDSqcpaTQ2cYn3M5nSA7RCHoiYWzZQhoDM/edit#
[ ] Write up observations on how the ranking changes
[ ] Name all possible factors influencing the ranking
- [ ] scoring
- [ ] datasets
- [ ] URLs we assess
[ ] discuss top outliers (especially those that lost places dramatically)
- [ ] Visualize the ranking changes with http://labs.polsys.net/tools/rankflow/
[ ] Write short post: research questions, analysis, findings (https://docs.google.com/document/d/1bhWM37LZXQDSqcpaTQ2cYn3M5nSA7RCHoiYWzZQhoDM/edit)
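For writing up how the ranking changes and spotting the outliers that lost the most places, rank shifts can be computed as old rank minus new rank. A minimal Python sketch, assuming the ranks for two editions can be exported from the ranking spreadsheet as country-to-rank mappings; the country names and ranks below are placeholders, not GODI results:

```python
# Minimal sketch: compare ranks across two editions and list the biggest
# losers. Countries and ranks are placeholders, not real GODI data.

def rank_changes(old: dict[str, int], new: dict[str, int]) -> dict[str, int]:
    """Return places gained (positive) or lost (negative) per country.

    Only countries ranked in both editions are compared.
    """
    return {c: old[c] - new[c] for c in old.keys() & new.keys()}


def biggest_losers(changes: dict[str, int], n: int = 3) -> list[tuple[str, int]]:
    """Return up to n countries that lost the most places, worst first."""
    losers = [(c, d) for c, d in changes.items() if d < 0]
    # Sort by places lost, then by name so ties are deterministic.
    return sorted(losers, key=lambda item: (item[1], item[0]))[:n]


old_ranks = {"Country A": 1, "Country B": 2, "Country C": 3, "Country D": 4}
new_ranks = {"Country A": 4, "Country B": 1, "Country C": 2, "Country D": 3}

changes = rank_changes(old_ranks, new_ranks)
print(biggest_losers(changes))  # [('Country A', -3)]
```

The same per-country deltas could also feed a rank-flow visualisation; attributing each shift to scoring, datasets or assessed URLs still requires the manual analysis listed above.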

Findability

Two approaches: checking how different users perceive findability, and URL analysis (see below).

How users perceive findability (some options for assessing this)

[ ] how many submitters who consider themselves topical or open data experts say they found the data easily (possibly visualised with an alluvial chart)
[ ] check how many submitters did not find all characteristics but still said they found the data easily (this shows whether submitters search thoroughly for data)
[ ] total count of submitters who found all the data and said it was hard/easy to find (may still contain errors)
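The second and third checks reduce to simple counts over submissions. A minimal Python sketch, assuming each submission is exported as a record with hypothetical fields for the ease-of-finding answer and the number of characteristics ticked (the real survey export may name and encode these differently):

```python
from collections import Counter

# Minimal sketch. Field names are hypothetical:
#   "said_easy"             - submitter said the data was easy to find
#   "characteristics_found" - number of data characteristics ticked
#   "characteristics_total" - number of characteristics for that category


def easy_but_incomplete(submissions: list[dict]) -> int:
    """Count submitters who said data was easy to find but did not
    tick all data characteristics."""
    return sum(
        1
        for s in submissions
        if s["said_easy"] and s["characteristics_found"] < s["characteristics_total"]
    )


def findability_crosstab(submissions: list[dict]) -> Counter:
    """Count submissions by (found all characteristics, said easy)."""
    return Counter(
        (s["characteristics_found"] == s["characteristics_total"], s["said_easy"])
        for s in submissions
    )


submissions = [
    {"said_easy": True, "characteristics_found": 5, "characteristics_total": 5},
    {"said_easy": True, "characteristics_found": 3, "characteristics_total": 5},
    {"said_easy": False, "characteristics_found": 2, "characteristics_total": 5},
]
print(easy_but_incomplete(submissions))  # 1
```

Splitting the same counts by self-declared expertise would give the input for the alluvial chart mentioned in the first item.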

URL analysis

Question and assumption

[ ] Research question: Does our reference point change? Do people refer to different URLs over time?
[ ] Assumption: over time there should be a movement away from institutional websites towards data catalogues (which promise increased findability). Can we observe this?

Approach

[ ] create a comparative spreadsheet: https://docs.google.com/spreadsheets/d/12IGTyH7gb0G9NrZ4poa3RHswZ93KOtnRDqZFz_Z11IM/edit#gid=0
[ ] Retrieve all URLs where our submitters found data
[ ] remove countries that are not included in two or more years (check whether you are doing a two-year or a four-year comparison)
[ ] keep only URLs from data categories that remain similar over time
[ ] shorten URLs with the DMI Harvester tool so that only host names are returned (Purpose: URLs of individual pages on a host may change; more interesting is whether we assess different hosts, e.g. institutional websites or data catalogues). Tool: https://wiki.digitalmethods.net/Dmi/ToolHarvester
[ ] Analyse how many data.gov or data.gob hosts we find
[ ] How many hosts are the same over time (color-code 1)
[ ] How many hosts change over time (color-code 2)
[ ] How many alternative places can we find online?
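Several of these steps (reducing URLs to host names, keeping only comparable country/category pairs, counting same versus changed hosts, counting data.gov/data.gob hosts) can be sketched with Python's standard `urllib.parse`. This is an alternative to the DMI Harvester tool named above, not a description of how that tool works. It assumes URLs are grouped by hypothetical (country, data category) keys per year; all URLs are placeholders:

```python
from urllib.parse import urlparse

# Minimal sketch of the URL analysis. Keys are hypothetical
# (country, data category) pairs; URLs are placeholders.


def host(url: str) -> str:
    """Reduce a URL to its lowercase host name, dropping a leading "www."."""
    # urlparse only recognises a host after "//", so add it if the scheme is missing.
    parsed = urlparse(url if "//" in url else "//" + url)
    return (parsed.hostname or "").removeprefix("www.")


def is_data_portal(h: str) -> bool:
    """True for hosts starting with data.gov or data.gob (e.g. data.gov.uk)."""
    labels = h.split(".")
    return len(labels) >= 2 and labels[0] == "data" and labels[1] in ("gov", "gob")


def compare_hosts(year_a: dict, year_b: dict) -> tuple[int, int]:
    """Return (same, changed) host counts for keys present in both years.

    Intersecting the keys drops countries/categories not assessed in both years.
    """
    shared = year_a.keys() & year_b.keys()
    same = sum(1 for k in shared if host(year_a[k]) == host(year_b[k]))
    return same, len(shared) - same


urls_2015 = {
    ("Country A", "Budget"): "http://www.finance.example.org/budget/2015.xls",
    ("Country A", "Elections"): "https://elections.example.org/results",
    ("Country B", "Budget"): "https://data.gov.example/dataset/budget",
}
urls_2016 = {
    ("Country A", "Budget"): "https://data.gov.example/dataset/budget-2016",
    ("Country A", "Elections"): "https://elections.example.org/2016/results",
    ("Country C", "Budget"): "https://data.gob.example/presupuesto",
}

print(compare_hosts(urls_2015, urls_2016))  # (1, 1)
print(sum(is_data_portal(host(u)) for u in urls_2016.values()))  # 2
```

`str.removeprefix` requires Python 3.9 or later. Whether a changed host signals progress (e.g. a move to a data catalogue) still needs the interpretation discussed below.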

Interpretation & discussion

[ ] How to interpret the results: Is URL variability a problem? Is URL consistency good because people share the same reference point? Or does it mean that there has been no progress in how data is published?
[ ] write findings in short text: https://docs.google.com/document/d/1WClAYNGXOAafQ-sRkG4L61HQmEbyNOJlL47T6zFNUts/edit
[ ] Caveat: we do not always collect URLs (e.g. for submissions we rejected this year)
[ ] Get URLs for submissions we rejected (to do for @brew)

Submitter analysis (for @morchickit and @tlacoyodefrijol)

[ ] Analyse how we can scale our submitter base / increase engagement: How many unique submitters do we have? How many submitters are not part of the OK network or partners? How many new submitters submitted data for new countries?
[ ] Optional: timeline of submissions visualizing submission rates over time
[ ] How many open data experts do we attract? How many newcomers?
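The first set of submitter questions can be answered with set operations over submission records. A minimal Python sketch, assuming submissions are available as hypothetical (submitter, country) pairs and that lists of OK network members, previous submitters and previously covered countries can be assembled separately; all names are placeholders:

```python
# Minimal sketch. Inputs are hypothetical:
#   submissions         - (submitter_id, country) pairs for this edition
#   ok_network          - submitters who are part of the OK network or partners
#   previous_submitters - submitters from earlier editions
#   previous_countries  - countries covered in earlier editions


def submitter_stats(
    submissions: list[tuple[str, str]],
    ok_network: set[str],
    previous_submitters: set[str],
    previous_countries: set[str],
) -> dict[str, int]:
    unique = {sub for sub, _ in submissions}
    # New submitters who contributed data for countries not covered before.
    new_for_new_countries = {
        sub
        for sub, country in submissions
        if sub not in previous_submitters and country not in previous_countries
    }
    return {
        "unique_submitters": len(unique),
        "outside_ok_network": len(unique - ok_network),
        "new_submitters_new_countries": len(new_for_new_countries),
    }


stats = submitter_stats(
    submissions=[("alice", "Country A"), ("alice", "Country B"),
                 ("bob", "Country C"), ("carol", "Country C")],
    ok_network={"alice"},
    previous_submitters={"alice", "carol"},
    previous_countries={"Country A", "Country B"},
)
print(stats)
```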

Comparison of findings between 2015 and 2016

See issue: https://waffle.io/okfn/opendatasurvey/cards/58de2fbc378dc331012bff20
[ ] User self-assessment: https://drive.google.com/drive/folders/0B5j55T4ZyssBQ09XQ1dWTGtibVE

morchickit commented 7 years ago

@tlacoyodefrijol we need this for next week! The deadline is Tuesday.

dannylammerhirt commented 7 years ago

Add all ideas for future research to https://waffle.io/okfn/opendatasurvey/cards/58e60f0ff0f6b1a5001cf9b4

StephenAbbott commented 7 years ago

FinalreportTheStateofOpenGovernmentDatain2017.pdf

okfn / opendatasurvey

Write GODI report with main findings #904