rehamaltamimi / wikipedia-map-reduce

Automatically exported from code.google.com/p/wikipedia-map-reduce

Accessing ALL users, not just those with activity #9

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
One of the main issues with user-based analysis of Wikipedia is that it's hard 
to capture information about all users.  In our system, if users don't have IDs 
attached to them, we have to ignore them; furthermore, we can't get information 
on a registered user in the first place unless they edit a page.  We can't rely 
on User: namespace pages either, because not every user has a User: page.
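To make the limitation concrete, here is a minimal sketch, assuming the input is a MediaWiki XML export dump. In such dumps each `<revision>` has a `<contributor>` that carries either `<username>` plus `<id>` (registered) or `<ip>` (anonymous). The function name `registered_contributors` and the sample data are hypothetical and for illustration only. Collecting users this way keys on the contributor `<id>`, so anonymous edits are dropped and registered users who never edited never show up:

```python
import xml.etree.ElementTree as ET

# Tiny, made-up fragment in the shape of a MediaWiki XML export
# (real dumps also declare an xmlns, handled by local_name below).
SAMPLE_DUMP = """<mediawiki>
  <page>
    <title>Example</title>
    <ns>0</ns>
    <id>100</id>
    <revision>
      <id>1</id>
      <contributor><username>Alice</username><id>11</id></contributor>
    </revision>
    <revision>
      <id>2</id>
      <contributor><ip>192.0.2.7</ip></contributor>
    </revision>
    <revision>
      <id>3</id>
      <contributor><username>Bob</username><id>22</id></contributor>
    </revision>
  </page>
</mediawiki>"""


def local_name(tag):
    """Drop any '{namespace}' prefix ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def child_text(elem, name):
    """Return the text of the first direct child called `name`, or None."""
    for child in elem:
        if local_name(child.tag) == name:
            return child.text
    return None


def registered_contributors(xml_text):
    """Map user ID -> username for every contributor that has an <id>.

    Anonymous edits (an <ip> element instead of <id>) are skipped, and
    registered users who never edited simply never appear.
    """
    users = {}
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        if local_name(elem.tag) != "contributor":
            continue
        user_id = child_text(elem, "id")
        if user_id is None:
            continue  # anonymous/IP edit: no user ID to key on
        users[int(user_id)] = child_text(elem, "username")
    return users


print(registered_contributors(SAMPLE_DUMP))  # {11: 'Alice', 22: 'Bob'}
```

The IP edit disappears entirely, which is exactly the gap this issue is about.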

This issue is a first gathering place for thoughts on how best to handle this 
problem.  If a consensus starts to form, we'll move everything over to a Wiki 
page.

Original issue reported on code.google.com by colin.t....@gmail.com on 24 Jun 2010 at 7:23

GoogleCodeExporter commented 8 years ago

Original comment by colin.t....@gmail.com on 24 Jun 2010 at 7:24

GoogleCodeExporter commented 8 years ago
Are the Wikipedia IDs for User: pages the same as the user IDs that appear in 
the edits of other pages?

Original comment by shi...@gmail.com on 30 Jun 2010 at 1:54

GoogleCodeExporter commented 8 years ago
No, Wikipedia has two different ID sets, one for articles (pages) and one for 
users, and User: pages use the article ID set.  Any edits performed by the 
page's owner are recorded with their ID from the user ID set.

So User:Shilad has an article ID, but Shilad, as a user, has a user ID.
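Because the two ID sets are unrelated, a User: page can't be linked to its owner by comparing numeric IDs. A minimal sketch, assuming a MediaWiki XML dump with the xmlns already stripped, links them by name instead: take the title of each page in namespace 2 (User:) and match it against contributor usernames. The IDs below are made up, and the function names `user_pages` and `contributor_ids` are hypothetical. Matching on the title also assumes the user hasn't been renamed since the page was created, and this sketch skips subpages such as `User:Shilad/sandbox`:

```python
import xml.etree.ElementTree as ET

# Made-up example: the page User:Shilad has page ID 500, while the
# account "Shilad" has user ID 7. The numbers are illustrative only.
# Real dumps declare an xmlns; strip it or qualify the tags before use.
SAMPLE_DUMP = """<mediawiki>
  <page>
    <title>User:Shilad</title>
    <ns>2</ns>
    <id>500</id>
    <revision>
      <id>9001</id>
      <contributor><username>Shilad</username><id>7</id></contributor>
    </revision>
  </page>
</mediawiki>"""

USER_NAMESPACE = 2  # MediaWiki's User: namespace number


def user_pages(xml_text):
    """Map username -> page ID for top-level pages in the User: namespace.

    The key comes from the title, because a page ID lives in the article
    ID space and cannot be compared with a contributor's user ID.
    """
    pages = {}
    root = ET.fromstring(xml_text)
    for page in root.iter("page"):
        if int(page.findtext("ns")) != USER_NAMESPACE:
            continue
        name = page.findtext("title").split(":", 1)[1]
        if "/" in name:
            continue  # subpage such as User:Name/sandbox
        pages[name] = int(page.findtext("id"))
    return pages


def contributor_ids(xml_text):
    """Map username -> user ID from registered <contributor> elements."""
    ids = {}
    root = ET.fromstring(xml_text)
    for contrib in root.iter("contributor"):
        uid = contrib.findtext("id")
        name = contrib.findtext("username")
        if uid is not None and name is not None:
            ids[name] = int(uid)
    return ids


pages = user_pages(SAMPLE_DUMP)
ids = contributor_ids(SAMPLE_DUMP)
# Join on the username, never on the numeric IDs.
for name, page_id in pages.items():
    print(name, "page ID:", page_id, "user ID:", ids.get(name))
```

This prints `Shilad page ID: 500 user ID: 7`. It only finds the user ID if the owner has at least one edit in the dump, so it doesn't solve the "users without activity" problem on its own.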

I'll look for an example next time I'm on my Linux side.

Original comment by colin.t....@gmail.com on 30 Jun 2010 at 5:40