w3c-webmob / installable-webapps

Use cases and requirements for installable web apps

Normalize mentions of the October data set #22

Open ernesto-jimenez opened 10 years ago

ernesto-jimenez commented 10 years ago

I think the best option would be to add an appendix about the October data set that describes the situation and the data set's caveats (e.g., the crawl did not use a mobile UA, so some websites might have done UA sniffing and served a desktop site without mobile-specific meta tags).

Then we link to that appendix rather than to the gist we link to right now.
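To make the UA caveat concrete, here is a minimal sketch of how a re-crawl could send a mobile User-Agent, using only Python's standard library. The UA string and the `build_request` / `fetch` helpers are illustrative assumptions, not what was used to collect the October data set.

```python
import urllib.request

# Illustrative mobile UA string (an assumption, not the crawler's actual UA).
MOBILE_UA = (
    "Mozilla/5.0 (iPhone; CPU iPhone OS 7_0 like Mac OS X) "
    "AppleWebKit/537.51.1 (KHTML, like Gecko) Version/7.0 "
    "Mobile/11A465 Safari/9537.53"
)


def build_request(url, user_agent=MOBILE_UA):
    """Build a request that identifies itself with the given User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url, user_agent=MOBILE_UA, timeout=10):
    """Fetch a page as the given UA and return its body as text."""
    with urllib.request.urlopen(build_request(url, user_agent), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

Fetching the same URL once with a desktop UA and once with `MOBILE_UA`, then comparing the meta tags found in each response, would give a rough idea of how many sites in the set do UA sniffing.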

marcoscaceres commented 10 years ago

Yes! absolutely.

We need to say whether the sample is probabilistic or non-probabilistic (it's non-probabilistic because we don't know how many web pages there are on the Web). Hence, we cannot generalize from it. However, the sample size, n=78k, is more than appropriate for an exploratory analysis (cf. [1]).

Selection bias: the pages were selected by Alexa's ranking algorithm, hence we need to understand how they ended up on this list and whether it's representative of "the world" (i.e., are all countries represented in the set, etc.). There may be language bias. We don't need to look into this, just acknowledge it.

We know some of the data may be bad if processed with grep. I think that's about it. Or good enough to start.
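As an illustration of why grep-style matching can produce bad data, here is a small sketch that compares a naive pattern with Python's built-in HTML parser. The sample markup, the meta tag chosen, and the helper names are assumptions for the example. They are not taken from the actual data set or its processing scripts.

```python
import re
from html.parser import HTMLParser

# Hypothetical page: the only real tag uses single quotes and a different
# attribute order; the double-quoted copy sits inside an HTML comment.
SAMPLE = """<html><head>
<!-- <meta name="apple-mobile-web-app-capable" content="yes"> -->
<meta content='yes' name='apple-mobile-web-app-capable'>
</head><body></body></html>"""

# Roughly what a naive grep pattern would look for.
GREP_LIKE = re.compile(r'<meta name="apple-mobile-web-app-capable"')


class MetaCollector(HTMLParser):
    """Collect the name attribute of every parsed <meta> start tag."""

    def __init__(self):
        super().__init__()
        self.names = []

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            name = dict(attrs).get("name")
            if name:
                self.names.append(name.lower())


def grep_hits(html):
    """Count matches of the naive pattern, comments included."""
    return len(GREP_LIKE.findall(html))


def parsed_names(html):
    """Return meta names seen by the parser, which skips comments."""
    collector = MetaCollector()
    collector.feed(html)
    collector.close()
    return collector.names
```

On this sample the pattern reports one hit, but it is the commented-out tag; the real tag is missed because of its quoting and attribute order. The parser finds only the real one. The appendix could mention both failure modes (false positives and false negatives).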

[1] Reference: Cochran, W. G. (1977). Sampling techniques (3rd ed.). New York: John Wiley & Sons.