montera34 / pageonex

PageOneX. Analyzing front pages
http://pageonex.com
GNU Affero General Public License v3.0

Extra fields when classifying an area #206

Closed numeroteca closed 6 years ago

numeroteca commented 6 years ago

TLDR: This new feature would add the capability to add extra information to drawn areas. A highlighted area could have multiple taxonomies and open fields.

This feature was requested some time ago (see the thread on the mailing list by Ariadna and #82).

A drawn area is currently tied to a single topic. An area is defined by its dimensions, its location on the front page, the user who created it, the newspaper, the date and the topic.

We want to associate it with other taxonomies like "frame:positive" (or "frame:negative" or "frame:neutral") or "corruption_type:administration".
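
To make the idea concrete, here is a minimal sketch of an area that keeps its single topic and carries any number of extra taxonomies. The class and field names are hypothetical and chosen only for illustration; they are not the PageOneX schema.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Area:
    """One highlighted area on a front page (hypothetical field names)."""
    x: float
    y: float
    width: float
    height: float
    user: str
    newspaper: str
    date: str
    topic: str
    # Extra fields: taxonomy name -> selected option, e.g. "frame" -> "negative".
    taxonomies: Dict[str, str] = field(default_factory=dict)


area = Area(x=10, y=20, width=300, height=88, user="numeroteca",
            newspaper="El País", date="2017-04-27", topic="PP")
area.taxonomies["frame"] = "negative"
area.taxonomies["corruption_type"] = "administration"
```

Storing the extra fields as a mapping means a thread with no taxonomies enabled simply has an empty dictionary, and adding a new taxonomy does not change the shape of the area itself.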

Basic version

This version would have predefined taxonomies when enabling this "Extra fields" feature.

Update/create thread

When editing/creating a thread there would be a section "advanced features" that can be enabled. Once enabled (check box marked) the user would be able to classify the areas by these extra taxonomies while coding (drawing areas).

To start (and test) we can create a taxonomy called "frame" with the predefined options:

The taxonomy terms for these new taxonomies would be editable by the administrator of PageOneX in a special page, accessible through the menu under the Admin section.

Drawing areas

When editing an area, the extra field will be displayed in a secondary dropdown menu (see image):

Extra field for pageonex coding

You can select the category and then the frame.

Export

RAW (all areas dRAWn)

Add a new way of exporting data in JSON (or CSV, whichever is easier) that we will call "raw" and that includes all the highlighted areas. I think a flat object in CSV would be easier to handle.

In the exported file all the information associated with an area should be included:

I am thinking of post-processing this file with R.

To complete the information needed for the post-processing, we also need a file listing all the front pages that are not available. I am still thinking about what this should look like. Working on it.

Display

For the moment this feature will not affect the display.

Enhanced version

In a future enhancement the user would be able to create such new fields/taxonomies on the fly when creating/editing a thread.

It will be possible to juxtapose the different taxonomies used in a single visualization (following this example http://numeroteca.org/2013/02/06/3-steps-to-measure-the-corruption-coverage-in-spain/ or this one http://numeroteca.org/2016/06/12/dieta-mediatica-en-precampana-26j/) and calculate.

Export

ODS

In the spreadsheet every tab belongs to one newspaper. Example for the data in tab for "The Guardian" newspaper, current status:

Date   Topic:1  Topic:2  Topic:3
42826  0        0.12     0.44
42827  0.13     0.22     0.14
42828  0        0        0.24

The value is the share of the front page area (as a fraction between 0 and 1) dedicated to a particular topic in one newspaper on a particular day (the date is written in the spreadsheet's numeric format).
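
For post-processing, the numeric dates can be turned back into calendar dates. This is a minimal sketch that assumes the numbers are spreadsheet serial dates counted in days from 1899-12-30, which is the default in LibreOffice Calc; if the export uses a different null date, the constant needs to change. The function name is hypothetical.

```python
from datetime import date, timedelta

# Assumption: day 0 of the spreadsheet is 1899-12-30 (LibreOffice Calc default).
SPREADSHEET_EPOCH = date(1899, 12, 30)


def serial_to_date(serial: int) -> date:
    """Convert a spreadsheet serial day number into a calendar date."""
    return SPREADSHEET_EPOCH + timedelta(days=serial)


print(serial_to_date(42826).isoformat())  # prints 2017-04-01
```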

After adding an extra field we need to create extra columns:

Date   Topic:1  Topic:2  Topic1:positive  Topic1:negative  Topic2:positive  Topic2:negative
42826  0        0.12     0                0                0                0.12
42827  0.13     0.22     0                0.13             0                0.22
42828  0.3      0        0.1              0.2              0                0

Adding one extra taxonomy would make the spreadsheet very complicated, which makes me think that we will need an extra feature to export all the areas.
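
The growth in columns can be sketched as follows. This is only an illustration, assuming each area is a dict with the hypothetical keys `topic`, `frame` and `share` (the fraction of the front page it covers); it builds one spreadsheet row for one front page and counts how many value columns a tab would need.

```python
from collections import defaultdict


def spreadsheet_row(areas):
    """Sum area shares per topic and per topic:frame pair for one front page."""
    row = defaultdict(float)
    for a in areas:
        row[a["topic"]] += a["share"]
        row[f'{a["topic"]}:{a["frame"]}'] += a["share"]
    return dict(row)


def column_count(topics, options):
    """One column per topic plus one per topic/option combination."""
    return len(topics) * (1 + len(options))


# Hypothetical areas reproducing the 42828 row of the example above.
areas = [
    {"topic": "Topic1", "frame": "positive", "share": 0.1},
    {"topic": "Topic1", "frame": "negative", "share": 0.2},
]
row = spreadsheet_row(areas)
columns = column_count(["Topic1", "Topic2"], ["positive", "negative"])
```

With two topics and two frame options this already gives six value columns, as in the table above. If a second taxonomy has to be crossed with the first, the count multiplies again, which is the argument for a raw export with one record per area.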

JSON

The main JSON structure has all the data organized as follows: Date > Newspaper > Topic:Percentage

{
  "2017-04-27":{
    "El País":{
      "PP":0.18,
      "PSOE":0.2},
    "El Mundo":{
      "PP":0.16,
      "PSOE":0.1},
    "Total":{
      "PP":0.17,
      "PSOE":0.15}
  },
  "2017-04-28":{
    "El País":{
      "PP":0.1,
      "PSOE":0.1},
    "El Mundo":{
      "PP":0,
      "PSOE":0.1},
    "Total":{
      "PP":0.05,
      "PSOE":0.1}
  }
}
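
In the example, each "Total" value matches the unweighted mean of the newspapers' values for that topic, for instance (0.18 + 0.16) / 2 = 0.17. The sketch below assumes that this is how "Total" is meant to be computed; that is inferred from the numbers, not confirmed in the thread, and the function name is hypothetical.

```python
def total_by_topic(day):
    """Average each topic's share over the newspapers of one day."""
    papers = {name: topics for name, topics in day.items() if name != "Total"}
    topics = sorted({t for values in papers.values() for t in values})
    return {
        # A newspaper without the topic contributes 0 to the mean.
        t: sum(values.get(t, 0) for values in papers.values()) / len(papers)
        for t in topics
    }


day = {
    "El País": {"PP": 0.18, "PSOE": 0.2},
    "El Mundo": {"PP": 0.16, "PSOE": 0.1},
}
total = total_by_topic(day)
```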

Work in progress.

Date > Newspaper > Area id > {Area:Percentage, Position:(x,y center coordinates in %),Topic:topic1, Frame:frame1}

"2017-04-27":{
  "El País":{
    "id2839842304":{
      "area":0.18,
      "position":[0.34,0.55],
      "topic":"PP",
      ...
      "taxonomies":["neutral/negative","has_foto","no_animals"]}
  }
}

numeroteca commented 6 years ago

It works. It's wonderful!

Two questions for @xuanxu (maybe I should open new issues):

It's crucial to be able to calculate the percentage of area in the available newspapers, both for one front page and for any given particular day.

The result would be something similar to this:

{"areas":
 [
  {"areas_id":106,"user_name":"numeroteca", ... ,"area_height":88},
  {"areas_id":107,"user_name":"numeroteca", ... ,"area_height":190}
 ],
"no_image":
 [
  {"publication_date":"2016-10-04","media_name":"Clarín","media_country":"Argentina"},
  {"publication_date":"2016-10-06","media_name":"Crónica","media_country":"Argentina"}
 ],
"image_size":
 [
  {"media_name":"Clarín","media_country":"Argentina","width":750,"height":1041},
  {"media_name":"Crónica","media_country":"Argentina","width":750,"height":1041}
 ]
}

We should assume that, for a particular media outlet, the front page image has the same size every day.
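
A rough sketch of the calculation this structure is meant to support: the share of each front page covered by areas, and the mean share on a given day over the newspapers that are actually available. The area keys `publication_date`, `media_name` and `area_width` are assumptions (the area records are elided above), `height` is assumed as the spelling of the image height key, and the function names are hypothetical.

```python
def page_shares(export):
    """Fraction of each front page covered by areas, keyed by (date, media)."""
    sizes = {s["media_name"]: s["width"] * s["height"] for s in export["image_size"]}
    shares = {}
    for a in export["areas"]:
        key = (a["publication_date"], a["media_name"])
        pixels = a["area_width"] * a["area_height"]
        shares[key] = shares.get(key, 0.0) + pixels / sizes[a["media_name"]]
    return shares


def daily_mean(export, day):
    """Mean covered share on one day, ignoring front pages listed in no_image."""
    missing = {(m["publication_date"], m["media_name"]) for m in export["no_image"]}
    shares = page_shares(export)
    available = [s["media_name"] for s in export["image_size"]
                 if (day, s["media_name"]) not in missing]
    if not available:
        return None  # no front page available that day
    # An available front page without areas counts as 0.
    return sum(shares.get((day, m), 0.0) for m in available) / len(available)


export = {
    "areas": [{"publication_date": "2016-10-04", "media_name": "Crónica",
               "area_width": 375, "area_height": 88}],
    "no_image": [{"publication_date": "2016-10-04", "media_name": "Clarín"}],
    "image_size": [{"media_name": "Clarín", "width": 750, "height": 1041},
                   {"media_name": "Crónica", "width": 750, "height": 1041}],
}
mean_share = daily_mean(export, "2016-10-04")
```

In this example Clarín is missing on 2016-10-04, so the daily mean is computed over Crónica alone instead of being halved by a front page that does not exist.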

xuanxu commented 6 years ago

@numeroteca:

numeroteca commented 6 years ago

As @xuanxu suggests, it would be better to use the following structure for image information:

"images":
  [
    {"publication_date":"2018-07-02","media_name":"Clarín","media_country":"Argentina","image_size":"750x1041","missing":false},
    {"publication_date":"2018-07-02","media_name":"Le Figaro","media_country":"France","image_size":"750x1103","missing":false},
    {"publication_date":"2018-07-03","media_name":"Clarín","media_country":"Argentina","image_size":"750x1041","missing":true},
    {"publication_date":"2018-07-03","media_name":"Le Figaro","media_country":"France","image_size":"750x1103","missing":false},
    {"publication_date":"2018-07-04","media_name":"Clarín","media_country":"Argentina","image_size":"750x1038","missing":false},
    {"publication_date":"2018-07-04","media_name":"Le Figaro","media_country":"France","image_size":"750x1114","missing":false}
  ]

It would also help to add the URL of the original image. That way all the information in the thread is available to reconstruct it elsewhere.
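
With one record per date and newspaper, the list of available front pages and their sizes can be read directly from the "images" list. A minimal sketch, assuming `image_size` is always a "WIDTHxHEIGHT" string as in the example; the function name is hypothetical.

```python
def available_pages(images):
    """Map each date to {media_name: pixel area} for front pages that are not missing."""
    pages = {}
    for img in images:
        if img["missing"]:
            continue
        width, height = (int(n) for n in img["image_size"].split("x"))
        pages.setdefault(img["publication_date"], {})[img["media_name"]] = width * height
    return pages


images = [
    {"publication_date": "2018-07-02", "media_name": "Clarín",
     "image_size": "750x1041", "missing": False},
    {"publication_date": "2018-07-03", "media_name": "Clarín",
     "image_size": "750x1041", "missing": True},
    {"publication_date": "2018-07-03", "media_name": "Le Figaro",
     "image_size": "750x1103", "missing": False},
]
pages = available_pages(images)
```

Because the size is stored per date, this no longer depends on every front page of a newspaper having the same dimensions.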

numeroteca commented 6 years ago

Everything works great, @xuanxu. One last related request:

As planned, for now, only administrators can create taxonomies and taxonomy options and use them in threads. Once we go to production, to make the new feature available to other users, we would need to create an extra field for every user so that the admin can grant them access to select taxonomies in their own threads, similar to the checkbox we have for admins:

Image of user editing in PageOneX

We would need something like "Taxonomist" that would enable these users to select taxonomies and use them in threads (select which taxonomies to use). For the moment, as a basic feature, only the admin would be able to create taxonomies and taxonomy options.

Would this be easy to implement?

xuanxu commented 6 years ago

@numeroteca I'm not sure I understand. Right now any user can add taxonomies to their threads. Admins are the only ones allowed to create taxonomies and options, but once created, a taxonomy is available to every user.

To make use of the taxonomies they have to activate them in the create/edit form:

screenshot 2018-07-16 at 13 51 12

numeroteca commented 6 years ago

Oops, you are right! So, what I was asking for is not needed. The feature is already available for all :-D

What I came across now: when forking, the active taxonomies and the taxonomy data associated with areas need to be copied to the forked thread as well. I think this will be the last thing before closing this issue and PR.

xuanxu commented 6 years ago

@numeroteca done! PR ready.