status-im / ETHPrize-interviews

A repository for the ETHPrize website.
https://ethprize.io

Data analysis of current bounties #14

Closed: andytudhope closed this issue 6 years ago

andytudhope commented 6 years ago

We need someone with a background in NLP and data analysis to help us parse the massive amount of data we have gathered in the course of these interviews and produce all of the information-at-a-glance that will live on the home page of our Interview Report (for which there is a design bounty currently out).

We want to see word clouds, graphs, and any other visual tool that will make it easy for people to grasp the most salient points without needing to go through in detail every single interview.
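As a rough illustration of the first step behind a word cloud, the sketch below counts non-stopword tokens across interview texts. It assumes the transcripts are already loaded as plain Python strings; the tiny `STOPWORDS` set and the `word_frequencies` helper are hypothetical, and a real analysis would likely use a fuller stopword list and proper tokenization.

```python
import re
from collections import Counter

# Hypothetical, deliberately small stopword list (an assumption for this sketch).
STOPWORDS = {"the", "a", "an", "and", "or", "to", "of", "in", "is", "it",
             "we", "for", "with", "that", "this", "are", "be", "on"}

def word_frequencies(texts, top_n=20):
    """Return the top_n most common non-stopword tokens across all texts."""
    counts = Counter()
    for text in texts:
        # Lowercase and split into simple alphanumeric tokens.
        tokens = re.findall(r"[a-z0-9']+", text.lower())
        # Drop stopwords and very short tokens before counting.
        counts.update(t for t in tokens if t not in STOPWORDS and len(t) > 2)
    return counts.most_common(top_n)

if __name__ == "__main__":
    # Made-up example sentences, not real interview data.
    interviews = [
        "Debugging smart contracts is painful and the tooling is immature.",
        "We need better debugging and testing tools for smart contracts.",
    ]
    for word, n in word_frequencies(interviews, top_n=5):
        print(word, n)
```

The resulting counts could then be handed to a word-cloud renderer, for example the third-party `wordcloud` package's `WordCloud.generate_from_frequencies`, which accepts a word-to-frequency mapping.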

cc @jpbowen for some further potential requirements/ideas about how and what we should be looking to pull from the report.

As with the design bounty, if you are keen to do this, please let us know in a comment below with a short blurb about who you are and what qualifies you, and we will lock the bounty for the person we feel can do it to a world-class standard.

Both NVivo and Dedoose were designed for exactly this type of task. Both are proprietary, but both offer trial periods if someone wants to crank out an analysis in a few days. For a more secure workspace, NVivo is probably the way to go, since you download a local copy. Beyond that, the truly open-source route is to get your hands dirty with R/RStudio and a text analysis package such as RQDA.

CS76 commented 6 years ago

Is this the complete interview data? @andytudhope https://github.com/status-im/ETHPrize/tree/master/bounties_report/interviews

mkosowsk commented 6 years ago

@CS76 we are still processing a few of the newest interviews, but this is almost all the way up-to-date :) is this correct @andytudhope? 🤔

andytudhope commented 6 years ago

Yeah. And I have found a data scientist in France to help handle this, thanks. It would be awesome to have some help with building the actual report for web and stuff though @CS76

CS76 commented 6 years ago

Wonderful, @andytudhope! It would be great if we could have something like a Kaggle kernel (https://www.kaggle.com/kernels) so the community could extend it and make changes as needed :) Thanks

andytudhope commented 6 years ago

Ask and you shall receive @CS76 https://www.kaggle.com/cryptowanderer/ethprize-developer-interviews

andytudhope commented 6 years ago

@CS76 https://www.kaggle.com/mratsim/state-of-the-art-nlp-on-ethereum-dev-interviews

mratsim commented 6 years ago

Wow so fast, I didn't even have time to catch my breath!

CS76 commented 6 years ago

Awesome! Community-driven open-source innovation at its best.

mratsim commented 6 years ago

I've updated the kernel - https://www.kaggle.com/mratsim/state-of-the-art-nlp-on-ethereum-dev-interviews

Some thoughts on the automated processing:

In any case, I think automated processing is currently not good enough because some of the questions are very niche (Ethereum dev tools), so I will do more manual tagging instead, maybe splitting the work into an "automated" and a manual analysis.
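One way to picture the "automated" plus manual split is a simple keyword tagger that labels what it can and queues everything else for a human. The sketch below is purely illustrative: the `TOPIC_KEYWORDS` map, its topics and keywords, and both helper functions are hypothetical assumptions, not the taxonomy or method used for the ETHPrize report.

```python
import re

# Hypothetical topic -> keyword map (illustrative assumption only).
TOPIC_KEYWORDS = {
    "debugging": {"debug", "debugger", "debugging", "stack trace"},
    "testing": {"test", "tests", "testing", "truffle"},
    "documentation": {"docs", "documentation", "tutorial"},
}

def tag_answer(answer, topic_keywords=TOPIC_KEYWORDS):
    """Return the sorted topics whose keywords appear as whole words/phrases."""
    text = answer.lower()
    tags = []
    for topic, keywords in topic_keywords.items():
        # \b anchors avoid matching e.g. "debug" inside "debugger".
        if any(re.search(r"\b" + re.escape(k) + r"\b", text) for k in keywords):
            tags.append(topic)
    return sorted(tags)

def split_automated_manual(answers):
    """Auto-tag answers with keyword hits; queue the rest for manual tagging."""
    tagged, manual_queue = {}, []
    for i, answer in enumerate(answers):
        tags = tag_answer(answer)
        if tags:
            tagged[i] = tags
        else:
            manual_queue.append(i)
    return tagged, manual_queue

if __name__ == "__main__":
    # Made-up example answers, not real interview data.
    answers = [
        "The debugger kept crashing during tests.",
        "Gas costs make everything hard.",
    ]
    print(split_automated_manual(answers))
```

Anything left in the manual queue (here, the answer about gas costs) is exactly the kind of niche response a keyword list misses, which is why a human pass remains necessary.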

mratsim commented 6 years ago

For the final report, because of the complexity and accuracy issues of automated approaches, what is presented on the website is a classification/tagging by topic and project done by @andytudhope: ETHPrize report.

However, as the amount of data to analyze grows, it's good to have a set of NLP approaches that we can refine.

I've added some more approaches in status-im/datasets.