montera34 / pageonex

PageOneX. Analyzing front pages
http://pageonex.com
GNU Affero General Public License v3.0

When using multiple taxonomies, the order in the json changes #238

Open numeroteca opened 2 years ago

numeroteca commented 2 years ago

When using multiple taxonomies, the order of the keys in the JSON changes from one area to another. This might not be an issue when the JSON file is read by key, but when the JSON is flattened, as I do in R, the order matters.

For example:

"taxonomy_values":{
"Corruption_cases":"Other",
"Frame attack-defense":"Attack"
}

And:

"taxonomy_values":{
"Frame attack-defense":"Attack",
"Corruption_cases":"Gürtel-Valencia"
}

So, before processing them in R, we have to make sure they are in the same order.

Load data:

library(rjson) # assumption: fromJSON(file = ...) matches the rjson package
dataorigen <- fromJSON(file="raw-areas.json")
areas <- dataorigen$areas
areas <- lapply(areas, function(x) {
  x[sapply(x, is.null)] <- NA
  unlist(x)
})
# Important: this line flattens the JSON and mixes up the taxonomies
pox <- as.data.frame(do.call("rbind", areas))

So we need to find the column order of the first row and apply it to all the others.
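
To see why the order matters, here is a minimal sketch that reproduces the mix-up with the two example areas above. The `id` field is a hypothetical addition for illustration; the rest mirrors the flattening code.

```r
# Two areas with the same taxonomy keys in a different order.
# "id" is a hypothetical field added for illustration.
a1 <- list(id = 1, taxonomy_values = list(
  "Corruption_cases" = "Other",
  "Frame attack-defense" = "Attack"))
a2 <- list(id = 2, taxonomy_values = list(
  "Frame attack-defense" = "Attack",
  "Corruption_cases" = "Gürtel-Valencia"))

# unlist() flattens each area into a named character vector
flat <- lapply(list(a1, a2), unlist)

# rbind() stacks the vectors by position and labels the columns
# with the names of the first vector only
m <- do.call("rbind", flat)

# The "Attack" value of the second area lands in the Corruption_cases column
m[2, "taxonomy_values.Corruption_cases"]
```

The names of the second vector are silently ignored, so nothing warns that the columns no longer line up.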

numeroteca commented 2 years ago

This is a solution:

# load packages (assumption: rjson for fromJSON, dplyr for %>% and select, tibble for add_row)
library(rjson)
library(dplyr)
library(tibble)

# read data
dataorigen <- fromJSON(file="raw-areas.json")
areas <- dataorigen$areas

# Example with one element
# Converts the first area element to a data frame
data.frame(areas[[1]])
# stores the column order of the first area
col_order <- names( data.frame(areas[[1]]) )
# applies that column order to the fifth area
data.frame(areas[[5]]) %>% select(all_of(col_order))

# iterates through all the areas
for( i in seq_along(areas) ) {
  if( i == 1 ) {
    # starts the result with the first area,
    # converted to a data frame with its columns in the reference order
    pox <- data.frame(areas[[i]]) %>% select(all_of(col_order))
  } else {
    # reorders the columns of area i and appends it as a new row
    areas_tmp <- data.frame(areas[[i]]) %>% select(all_of(col_order))
    pox <- pox %>% add_row(areas_tmp)
  }
}
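
A loop-free variant of the same idea, as a rough sketch in base R: index every flattened area by the names of the first one before binding. `align_to_first` is a hypothetical helper, not part of pageonexR, and it assumes every area contains the same set of keys.

```r
# Hypothetical helper: reorder every flattened area to match the first one.
# Assumes all areas contain the same set of keys.
align_to_first <- function(flat) {
  ref <- names(flat[[1]])
  # indexing a named vector by name returns its values in the order of ref
  lapply(flat, function(x) x[ref])
}

# Flattened areas as produced by unlist(); "id" is a hypothetical field
a1 <- c(id = "1",
        "taxonomy_values.Corruption_cases" = "Other",
        "taxonomy_values.Frame attack-defense" = "Attack")
a2 <- c(id = "2",
        "taxonomy_values.Frame attack-defense" = "Attack",
        "taxonomy_values.Corruption_cases" = "Gürtel-Valencia")

aligned <- align_to_first(list(a1, a2))
pox <- as.data.frame(do.call("rbind", aligned), stringsAsFactors = FALSE)
```

Because the alignment happens on the named vectors, the positional `rbind` afterwards is safe.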

This issue should be duplicated in the pageonexR repository.

numeroteca commented 2 years ago

Still pending: forcing the JSON to use the same key order in all the raw area items, to prevent this problem from happening.
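
Until the export guarantees a fixed order, one possible workaround on the R side is to sort the keys of every area before flattening. This is only a sketch: `sort_keys` is a hypothetical helper, the `id` field is made up for illustration, and it does not change the JSON that PageOneX produces.

```r
# Hypothetical helper: recursively sort the keys of a parsed JSON object
sort_keys <- function(x) {
  if (!is.list(x)) return(x)
  if (!is.null(names(x))) {
    # method = "radix" sorts in the C locale, independent of the session locale
    x <- x[order(names(x), method = "radix")]
  }
  lapply(x, sort_keys)
}

# Two areas with the same keys in a different order ("id" is hypothetical)
a1 <- list(id = 1, taxonomy_values = list(
  "Corruption_cases" = "Other",
  "Frame attack-defense" = "Attack"))
a2 <- list(id = 2, taxonomy_values = list(
  "Frame attack-defense" = "Attack",
  "Corruption_cases" = "Gürtel-Valencia"))

# After sorting, flattening yields the same column order for every area
flat <- lapply(list(a1, a2), function(a) unlist(sort_keys(a)))
pox <- as.data.frame(do.call("rbind", flat), stringsAsFactors = FALSE)
```

Sorting does not depend on which area happens to come first, so it also works when areas are processed in separate batches.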