Open GoogleCodeExporter opened 8 years ago
User information by namespace:
* Default - none from <article> tags, name/id from <revision>'s <contributor> tag.
* Talk - none from <article> tags, name/id from <revision>'s <contributor> tag.
* User - owner's name from <article> tags, owner's name/id from <revision>'s <contributor> tag ONLY IF the user has edited their User: page.
* User talk - owner's name from <article> tags; owner's name/id from <revision>'s <contributor> tag ONLY IF the user has edited their User talk: page; other editors' name/id from <revision>'s <contributor> tag.
To summarize, we're guaranteed complete information only about editors, not
owners of User: and User talk: pages. We can only get complete information on
those users if they also edited those pages. Furthermore, the limited
information we have on those who do not edit their own pages is not sufficient
for identification, since users can change their names.
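
To make the gap concrete, here is a rough sketch in Python. The element layout is an assumption, not the project's confirmed schema: it supposes each `<article>` carries a `title` attribute and each `<contributor>` has `<username>` and `<id>` children. The helper names `owner_name` and `collect_users` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the real dump layout may differ.
SAMPLE = """<article title="User talk:Alice">
  <revision>
    <contributor><username>Bob</username><id>42</id></contributor>
  </revision>
  <revision>
    <contributor><username>Robert</username><id>42</id></contributor>
  </revision>
</article>"""


def owner_name(title):
    """Return the owner's name for User:/User talk: titles, else None."""
    for prefix in ("User talk:", "User:"):
        if title.startswith(prefix):
            # Drop any subpage suffix such as "Alice/Archive 1".
            return title[len(prefix):].split("/")[0]
    return None


def collect_users(xml_text):
    """Map each contributor ID to the set of names seen with that ID."""
    article = ET.fromstring(xml_text)
    users = {}
    for contributor in article.iter("contributor"):
        name = contributor.findtext("username")
        user_id = contributor.findtext("id")
        if user_id is not None:
            users.setdefault(user_id, set()).add(name)
    owner = owner_name(article.get("title", ""))
    # The owner has an ID only if they appear among the page's editors.
    owner_known = owner is not None and any(
        owner in names for names in users.values()
    )
    return users, owner, owner_known


users, owner, owner_known = collect_users(SAMPLE)
```

In this sample the owner "Alice" never edits her own talk page, so only her name is available, while ID 42 shows up under two different names. That is why a name alone cannot serve as a stable key.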
Original comment by colin.t....@gmail.com
on 24 Jun 2010 at 7:16
One possible solution is to ignore all users without complete information, that
is, without BOTH a name and an ID. Another option is to ignore only those who
have a name but no ID.
Either way, this causes user-based analysis to be less reliable, since not all
users who have activity in the dataset will be in the resulting graph.
I'm making the decision to ignore users without IDs, and we can change it back
later if it comes to it.
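
A minimal sketch of that filter, assuming users are held as simple records with an optional name and an optional ID. The `User` type and the `keep_identifiable` function are hypothetical, not taken from the project's code. The `require_name` flag switches to the stricter option of requiring both fields.

```python
from typing import List, NamedTuple, Optional


class User(NamedTuple):
    """Hypothetical user record; either field may be missing."""
    name: Optional[str]
    user_id: Optional[int]


def keep_identifiable(users: List[User], require_name: bool = False) -> List[User]:
    """Drop users without an ID; optionally also drop users without a name."""
    kept = []
    for user in users:
        if user.user_id is None:
            continue  # name-only users cannot be reliably identified
        if require_name and not user.name:
            continue  # stricter option: require BOTH a name and an ID
        kept.append(user)
    return kept


sample = [User("Alice", None), User("Bob", 42), User(None, 7)]
by_id = keep_identifiable(sample)
strict = keep_identifiable(sample, require_name=True)
```

Keeping the stricter rule behind a flag makes it cheap to switch policies later without touching the callers.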
Original comment by colin.t....@gmail.com
on 24 Jun 2010 at 7:20
Original issue reported on code.google.com by
colin.t....@gmail.com
on 4 Jun 2010 at 11:08