mozilla / overscripted

Repository for the Mozilla Overscripted Data Mining Challenge
Mozilla Public License 2.0

What's in the really large values? #22

Open birdsarah opened 5 years ago

birdsarah commented 5 years ago

The value column contains some really large items. What's in there?

An initial look by @dzeber found some CSV files containing football scores, but a systematic review hasn't been done.

Do any of the values indicate a potential privacy or information loss?

AlbionaHoti commented 5 years ago

Hi @birdsarah, I am an Outreachy applicant. Can I work on this?

Do you have any suggestions for a starting point?

birdsarah commented 5 years ago

Hi @AlbionaHoti, no need to ask. I look forward to your contribution. This is a very open-ended question, so what you make of it will be entirely your own.

This notebook starts the conversation about the value column: https://github.com/mozilla/overscripted/blob/master/analyses/hello_mozfest.ipynb

With that said, that notebook is not a template. There are many things to explore in this column. Overall, I'm interested in the broad question: what's in the value column? In very initial examinations I noticed that there are some very large values. This piqued my interest, but I didn't have a chance to dig into it further.

Please note that to work on this question you will need a dataset that has the "value" column and not just the "value_1000" column.
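
As a rough illustration of that check (a minimal sketch, not taken from the notebook above), the snippet below assumes each record is available as a plain dictionary of strings. The function name `largest_values`, the `sample_rows` data, and the reading of "value_1000" as a 1000-character truncation are all hypothetical, made up for the example.

```python
from typing import Iterable, List, Mapping, Tuple


def largest_values(
    rows: Iterable[Mapping[str, str]], top_n: int = 3
) -> List[Tuple[int, str]]:
    """Return (length, 40-character preview) pairs for the longest "value" entries."""
    sized = []
    for row in rows:
        # Fail loudly if the dataset only carries the shortened column.
        if "value" not in row:
            raise KeyError('row has no "value" column (only "value_1000"?)')
        value = row["value"] or ""
        sized.append((len(value), value[:40]))
    # Longest entries first.
    sized.sort(key=lambda pair: pair[0], reverse=True)
    return sized[:top_n]


# Hypothetical sample records, invented for this sketch.
sample_rows = [
    {"value": "a" * 5000, "value_1000": "a" * 1000},
    {"value": "short", "value_1000": "short"},
    {"value": "b" * 20, "value_1000": "b" * 20},
]
top = largest_values(sample_rows, top_n=2)
```

On the real dataset the same idea would apply to whatever dataframe library you load it with; the point is only to rank by the length of the full "value" column and then inspect the largest entries by hand.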

For a very open-ended question like this, I would encourage you to be iterative: look at the dataset, write up some findings, observations, and notes, and then post a PR with your notebook before the deadline. We can then review it and maybe riff on some new questions based on your findings.

Hope this all makes sense.

noahwalugembe commented 5 years ago

I would also like to work on this.

birdsarah commented 5 years ago

@noahwalugembe, there are no assigned issues. Everyone is free to work on any research question, and there are many possible analyses for a given research question. You are welcome to work on this one.

I encourage you to read through all the available information, including the full chat history from March 11 to today (https://gitter.im/overscripted-discuss/community), so that you have as much information as possible before starting any analysis work.

noahwalugembe commented 5 years ago

Thanks, I am now working on it.