jwzimmer-zz / tv-tropening

ethics/transparency audit #3

Open jwzimmer-zz opened 3 years ago

jwzimmer-zz commented 3 years ago

Alpha version of checklist at: https://www.overleaf.com/read/vrqgnmmysrbc

jwzimmer-zz commented 3 years ago

Rough list of potential items for checklist

  1. Ethics as part of a project's preliminary Needs Assessment
    • Identify tools needed - any issues with procuring tools, or resources or knowledge required for tools?
    • Identify data needed - any issues with procuring data, or resources or knowledge related to data?
    • Any issue with publishing data? Should it be anonymized? How easy would it be to de-anonymize?
  2. Since this project passively analyzes data that already exists (no gathering new data or participants), what is the analogue of informed consent for the people who created the data?
  3. How is the outcome of this project going to be disseminated? Who will see it? Who can see it? Who should see it?
  4. What is the potential impact of this project?
  5. What are the potential harms of this project?
  6. What will happen to the data and other work-product generated during this project, once the project is over?
  7. For transparency and usability: create a "birth certificate" summary of project
  8. Recurring audit for transparency, ethics, etc.
  9. Make changes in response to audit
  10. Donate money to (1) a social justice cause and (2) an environmental cause

jwzimmer-zz commented 3 years ago

First pass

  1. Ethics as part of a project's preliminary Needs Assessment
    • Identify tools needed - any issues with procuring tools, or resources or knowledge required for tools?
    • Identify data needed - any issues with procuring data, or resources or knowledge related to data?
    • Any issue with publishing data? Should it be anonymized? How easy would it be to de-anonymize?

Since all of the tvtropes site content I have access to is already public, and there is no plan to focus on individual users in any way, I don't think the project creates any new risks to those users. I think the tools needed will mostly be open-source, semi-open-source, or provided by UVM.

  2. Since this project passively analyzes data that already exists (no gathering new data or participants), what is the analogue of informed consent for the people who created the data?

I think their participation in the site is sufficient consent.

  3. How is the outcome of this project going to be disseminated? Who will see it? Who can see it? Who should see it?

I don't know.

  4. What is the potential impact of this project?

I don't know.

  5. What are the potential harms of this project?

Scraping could put a burden on the tvtropes site; hopefully that is completely mitigated by the rate limit on the wget process. We could reinforce narratives and tropes by discussing them. We could portray the participants of the site reductively.
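
For anyone reading along, here is a minimal sketch of what rate-limited fetching means in practice. It is not the wget command this project used (that isn't recorded in this thread); it's a Python illustration of the same idea. The function name `fetch_politely`, the 2-second default interval, and the injectable `fetch`/`sleep`/`clock` parameters are all assumptions made up for this example.

```python
import time
import urllib.request
from typing import Callable, Iterable, List


def _default_fetch(url: str) -> bytes:
    # Plain HTTP GET using only the standard library.
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def fetch_politely(
    urls: Iterable[str],
    min_interval: float = 2.0,  # assumed value, not the project's setting
    fetch: Callable[[str], bytes] = _default_fetch,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> List[bytes]:
    """Fetch URLs one at a time, leaving at least `min_interval`
    seconds between the start of consecutive requests."""
    pages: List[bytes] = []
    last_start = None
    for url in urls:
        if last_start is not None:
            # Wait out whatever is left of the interval since the last request began.
            remaining = min_interval - (clock() - last_start)
            if remaining > 0:
                sleep(remaining)
        last_start = clock()
        pages.append(fetch(url))
    return pages
```

Requests are strictly sequential, so the site never sees more than one request per `min_interval` seconds from this process. `fetch`, `sleep`, and `clock` are parameters only so the pacing can be checked without touching the network.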

  6. What will happen to the data and other work-product generated during this project, once the project is over?

I think it will stay on GitHub indefinitely. I don't think that introduces any additional risk.

jwzimmer-zz commented 3 years ago

Going over the questions in the Datasheets for Datasets paper and answering some of them in the interest of transparency... these aren't the most thorough and careful answers ever, but I'd rather have something than nothing as far as describing the repo in an organized fashion. I think reading over this at least gives you an idea of what this is.

Questions below are from https://arxiv.org/abs/1803.09010:

Subjects: Databases (cs.DB); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as: arXiv:1803.09010 [cs.DB] (or arXiv:1803.09010v7 [cs.DB] for this version)

Re https://github.com/jwzimmer/tv-tropening & https://github.com/jwzimmer/tv-tropes

Section 3.1 Motivations

Section 3.2 Composition

Section 3.3 Collection process

Section 3.4 Preprocessing/cleaning/labeling

Section 3.5 Uses

Section 3.6 Distribution

Section 3.7 Maintenance

jwzimmer-zz commented 3 years ago

This is still relevant and should be revisited. What we wrote above was mainly about the tv tropes data, not the character space data, so we should revisit this issue in that context.