vitosamson / modlogs

A service for allowing subreddits to publish their moderator logs
https://modlogs.fyi

Requesting access to database #17

Open kakeith opened 5 years ago

kakeith commented 5 years ago

Hi Vito,

I am a graduate research student in computer science (natural language processing) at the University of Massachusetts Amherst. I am hoping to use the Reddit public moderation logs in a future study of "Causal Effects of Online Moderation." Just a few questions:

(1) Just want to double-check to make sure the moderation logs are indeed public and able to be used for research purposes (not violating any terms of service or consent of the users or moderators).

(2) I could scrape the webpages directly from https://modlogs.fyi. However, it would be much cleaner if I were able to access the database of modlogs directly. Is there any way you can help me out with this?

Please email me: kkeith [at] cs [dot] umass [dot] edu

Thank you for your time! Katie

vitosamson commented 5 years ago

Hey Katie,

The logs are indeed public; however, the information displayed on modlogs.fyi is not necessarily the full data contained in the database. Mods are able to provide a modlogs config that controls what gets displayed on the site, such as which mod executed the action, the user that received the action, a link back to the original reddit content, etc.

Can you give me an idea of which data points you need for your research? I don't really have any terms of service, and I'm not sure if I should assume implicit consent through use of the service. For your reference, here is the documentation on the config file that controls what's displayed on the site: https://github.com/vitosamson/modlogs/blob/master/ModeratorInstructions.md#configuration.

Thanks for reaching out. I'm certainly interested in your research, as the two subreddits I moderate (r/NeutralPolitics and r/NeutralNews) employ heavy moderation, and I'm happy to help where I can.

Vito

kakeith commented 5 years ago

Hi Vito!

Thanks for getting back to me so quickly! I did notice that only some (I think ~4) of the subreddits contain data about which mod executed the action, but the rest do not. For my research, there are really important per-moderator effects that we would have to take into account in a causal model.

I totally understand if the mods do not want to make this information public, but do you think they (or specifically you for your subreddits r/NeutralPolitics and r/NeutralNews) would be willing to share with me mod logs that had moderator ids? These could be completely anonymized (on your end or mine, e.g. "mod1" instead of the actual moderator id), and I would agree to not release any of the data publicly without explicit consent.
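The "mod1" style of anonymization mentioned above can be sketched in a few lines of Python. This is only an illustration, not the script used for modlogs: the field name `mod`, the sample entries, and the helper `pseudonymize` are all hypothetical. Each distinct moderator name is mapped to a stable label, so per-moderator effects remain analyzable without exposing the real names.

```python
from typing import Dict, Iterable, List


def pseudonymize(entries: Iterable[dict], field: str = "mod", prefix: str = "mod") -> List[dict]:
    """Replace each distinct value of `field` with a stable label (mod1, mod2, ...).

    The field name and the label prefix are assumptions made for this sketch.
    """
    mapping: Dict[str, str] = {}
    result = []
    for entry in entries:
        original = entry[field]
        # Assign the next label the first time a name is seen, then reuse it.
        if original not in mapping:
            mapping[original] = f"{prefix}{len(mapping) + 1}"
        # Copy the entry so the caller's data is left untouched.
        result.append({**entry, field: mapping[original]})
    return result


# Hypothetical log entries; real modlog documents may use different fields.
logs = [
    {"mod": "alice", "action": "removecomment"},
    {"mod": "bob", "action": "approvelink"},
    {"mod": "alice", "action": "banuser"},
]
anonymized = pseudonymize(logs)
```

Because the mapping is kept only in memory, discarding it after the export means the labels cannot be reversed from the shared data alone.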

I'm also in the process of figuring out whether or not I need an IRB approval for this line of research, or if having an IRB approval would make moderators more comfortable sharing their data with me. Please let me know if this is in any way possible. I do expect our findings could be extremely valuable to the moderator community at large.

Best, Katie

vitosamson commented 5 years ago

Let me run this by the other mods in my subreddits and see what they think. I'll get back to you in a little bit.

kakeith commented 5 years ago

Ok! Thank you!

vitosamson commented 5 years ago

Hey Katie, sorry for the wait, school has been keeping me really busy.

I can provide you with the full logs, but with usernames randomized. I'll need just a little more time to put together a script to accomplish that. I use mongodb for the database, so if it's alright with you I'll provide the data in mongodump format. You should be able to load that up into your own mongodb database and then from there put it into whatever format you need.
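As a rough sketch of the "put it into whatever format you need" step: once the dump has been restored (for example with `mongorestore`) and the documents have been read into Python dictionaries, they could be flattened to CSV with the standard library alone. The field names `subreddit`, `mod`, and `action`, the sample documents, and the helper `to_csv` are assumptions for illustration, not the actual modlogs schema.

```python
import csv
import io
from typing import Iterable, List


def to_csv(documents: Iterable[dict], columns: List[str]) -> str:
    """Write the chosen columns of each document as CSV text.

    Missing fields become empty cells; fields not listed in `columns` are dropped.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    for doc in documents:
        writer.writerow({col: doc.get(col, "") for col in columns})
    return buffer.getvalue()


# Hypothetical documents standing in for records read from the restored database.
sample = [
    {"_id": 1, "subreddit": "NeutralPolitics", "mod": "mod1", "action": "removecomment"},
    {"_id": 2, "subreddit": "NeutralNews", "mod": "mod2", "action": "approvelink"},
]
csv_text = to_csv(sample, ["subreddit", "mod", "action"])
```

Selecting columns explicitly keeps database-internal fields such as `_id` out of the exported file.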

The only requirement I have is that the data I provide to you only be used specifically for the stated purpose of this research project. It cannot be shared or distributed to others for any other purpose.

Does all that sound ok?

kakeith commented 5 years ago

Great! Thank you! Yes, the mongodump format will be fine.

Yes, your consent and privacy are among my utmost concerns. So I will guarantee: (1) There will be no public or private sharing of your moderation logs. (2) I will not mention any specific moderator or user names in published papers or presentations. The results will be in the aggregate.

Again, my email is kkeith@cs.umass.edu if that is the easiest way to share.