viswajithiii / gendermeme


Data Labelling #3

Open viswajithiii opened 7 years ago

viswajithiii commented 7 years ago

As a first pass, we will do the following labelling.

Bookkeeping spreadsheet: here. Make sure to add a row to it every time you add a new article, assigning the article a new id and recording its link.

For each article, create a text file named article_id.txt with the following format:

Line 1: article_id (a001, a002, ...)
Line 2: URL
Line 3: Headline
Line 4: Byline (if multiple authors, separate by semicolons: "Poorna Kumar; Viswajith Venugopal")
Line 5 onwards: Body
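To make the layout concrete, here is a minimal sketch of a reader for these article files. The `Article` class and `parse_article` function are hypothetical names, not part of the repository, and the sketch assumes the body runs from line 5 to the end of the file.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Article:
    article_id: str
    url: str
    headline: str
    authors: List[str]
    body: str


def parse_article(text: str) -> Article:
    """Parse the contents of an article_id.txt file (hypothetical helper)."""
    lines = text.splitlines()
    if len(lines) < 4:
        raise ValueError("expected at least 4 header lines (id, URL, headline, byline)")
    # Line 4 is the byline; multiple authors are separated by semicolons.
    authors = [name.strip() for name in lines[3].split(";") if name.strip()]
    # Everything from line 5 onwards is the body.
    body = "\n".join(lines[4:]).strip()
    return Article(
        article_id=lines[0].strip(),
        url=lines[1].strip(),
        headline=lines[2].strip(),
        authors=authors,
        body=body,
    )
```

Splitting the byline on semicolons rather than commas leaves room for names that themselves contain commas.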

(For TSVs, article_id can be a001p, a002p, ... for Poorna's and a001v, a002v, ... for Viswa's.)

The annotation goes in a text file named article_id.tsv, in the following format (one line per person mentioned):

Full Name, Gender, Number of times mentioned (only by part or full name, NOT by pronoun), Says something (yes/no), Number of words quoted, Source/subject (src/sub), Adjectives (a comma separated list), Expert/non-expert, Profession/Role(s)
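The column order above could be read with something like the sketch below. It rests on assumptions not stated in this thread: fields are tab-separated, there is no header row, the adjectives sit together in one tab-delimited cell, and the two count columns hold plain integers. `COLUMNS` and `parse_annotations` are hypothetical names.

```python
import csv
import io
from typing import Dict, List

# Assumed column order, taken from the list above (names are illustrative).
COLUMNS = [
    "full_name", "gender", "num_mentions", "says_something",
    "num_words_quoted", "source_or_subject", "adjectives",
    "expertise", "roles",
]


def parse_annotations(tsv_text: str) -> List[Dict[str, object]]:
    """Parse the contents of an article_id.tsv file, one dict per person."""
    rows: List[Dict[str, object]] = []
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for fields in reader:
        if not any(f.strip() for f in fields):
            continue  # skip blank lines
        if len(fields) < len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} columns, got {len(fields)}: {fields}")
        row: Dict[str, object] = dict(zip(COLUMNS, (f.strip() for f in fields)))
        row["num_mentions"] = int(fields[2])
        row["num_words_quoted"] = int(fields[4])
        row["says_something"] = fields[3].strip().lower() == "yes"
        # Adjectives are a comma-separated list inside a single cell.
        row["adjectives"] = [a.strip() for a in fields[6].split(",") if a.strip()]
        # Any trailing columns are kept untouched rather than interpreted.
        row["extra"] = fields[len(COLUMNS):]
        rows.append(row)
    return rows
```

Raising an error on short rows surfaces labelling slips (for example, a missing tab) early rather than silently shifting columns.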

UPDATE (Week of March 13th): As per Maneesh's instructions, I'm also adding a 'Quotes' column at the end for new articles I annotate. This column contains the raw quotes attributed to the person. Different quotes from the same person are delimited by the special token ''.

poorna-kumar commented 7 years ago

List of small problems, to be looked at by Viswa if possible: