ropensci / ruODK

ruODK: An R Client for the ODK Central API
GNU General Public License v3.0

How to get labels and hints #77

Closed: dmenne closed this issue 3 years ago

dmenne commented 4 years ago

A snippet to get labels and hints into the form_schema. It needs to be improved for forms with translations. The "guidance" field may need a second look, and an XPath expert could simplify the paths.

Feel free to use it or not.


library(ruODK)
library(xml2)
library(tidyverse)

ru_setup(
  svc = "https:xxxxx",
  un = "yy",
  pw = "xx",
  tz = Sys.timezone()
)

f_xml = as_xml_document(form_xml())
ff_xml = tibble(
  # Remove leading /data
  path = str_sub(xml_text(xml_find_all(f_xml, "//translation/text/@id")), 6),
  label = xml_text(xml_find_all(f_xml, "//translation/text"))
) %>%
  separate(path, sep = ":", into = c("path", "type")) %>%
  pivot_wider(names_from = type, values_from = label)

fs_extended = form_schema(flatten = FALSE) %>%
  left_join(ff_xml, by = "path")

Here is an example from the XML file:

        <translation default="true()" lang="Deutsch (de)">
          <text id="/data/pat_gruppe:label">
          <text id="/data/pat_gruppe/pat_no_barcode:label">
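
As a rough illustration of what the reshaping step does, here is a minimal sketch that runs without an ODK Central server. The ids follow the pattern in the XML excerpt above, but the third id and all label and hint texts are made up for the example.

```r
library(dplyr)
library(tibble)
library(tidyr)
library(stringr)

# Hypothetical itext ids, shaped like "/data/<path>:<type>"
ids <- c(
  "/data/pat_gruppe:label",
  "/data/pat_gruppe/pat_no_barcode:label",
  "/data/pat_gruppe/pat_no_barcode:hint"
)
# Made-up texts, one per id
texts <- c("Patient", "Patient number", "Scan or type the number")

labels_wide <- tibble(
  path = str_sub(ids, 6),  # drop the leading "/data"
  label = texts
) %>%
  # split "/pat_gruppe:label" into path and type
  separate(path, sep = ":", into = c("path", "type")) %>%
  # one row per path, one column per type (label, hint)
  pivot_wider(names_from = type, values_from = label)

print(labels_wide)
```

With several translations the same id occurs once per language, so `pivot_wider()` no longer finds a unique value per cell. This is the case the snippet does not handle yet.
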
florianm commented 4 years ago

Thanks Dieter! I'll add this to form_schema_parse (used up to odkc v7) and submit a feature request for the new form_schema (direct JSON from odkc) to include these.

dmenne commented 4 years ago

Better not yet. I will try more forms and update you. This was a first try.

matthew-white commented 4 years ago

Part of me thinks that it could be useful to add something like this to the Central API: there may be enough use cases where a user needs these labels that the best approach would be for Central to provide that information. Feel free to create a topic in the Features category of the ODK forum if something along those lines would be helpful!

florianm commented 4 years ago

@matthew-white it would be awesome to have labels and hints included in the form schema fields response ('-individual-form/getting-form-schema-fields), e.g. as a nested list:

    {
      "name": "age",
      "path": "/age",
      "type": "int",
      "label": {"en": "Age", "de": "Alter"},
      "hint": {"en": "...", "de": "..."}
    }

Crosslink: Feature request on the ODK Forum
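
If the schema came back in that shape, a client could flatten it in one step. A minimal sketch, assuming a hypothetical response (Central does not return labels or hints at the time of this thread) and using `jsonlite`:

```r
library(jsonlite)

# Hypothetical response body, following the nested-list proposal above
json <- '[
  {
    "name": "age",
    "path": "/age",
    "type": "int",
    "label": {"en": "Age", "de": "Alter"},
    "hint": {"en": "...", "de": "..."}
  }
]'

# flatten = TRUE turns the nested label/hint objects into
# separate columns such as label.en and label.de
fields <- fromJSON(json, flatten = TRUE)

print(names(fields))
```
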

mtyszler commented 3 years ago

I submitted an issue at Central (

mtyszler commented 3 years ago


This is a good solution:

ru_setup(
  svc = "https:xxxxx",
  un = "yy",
  pw = "xx",
  tz = Sys.timezone()
)

f_xml = as_xml_document(form_xml())
ff_xml = tibble(
  path = str_sub(xml_text(xml_find_all(f_xml, "//translation/text/@id")), 6),
  label = xml_text(xml_find_all(f_xml, "//translation/text"))
) %>%
  separate(path, sep = ":", into = c("path", "type")) %>%
  pivot_wider(names_from = type, values_from = label)

fs_extended = form_schema(flatten = FALSE) %>%
  left_join(ff_xml, by = "path")

but please be aware that this works only when there are multiple translations. If there is a single language, labels are stored in a different way (there's no translation element in the XML tree).
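
To make the difference concrete, here is a minimal sketch. The two XML fragments are hypothetical, namespace-free stand-ins for a form body, and `has_itext_label()` is a made-up helper name. It assumes that a translated label carries a `ref` attribute pointing into the translation block, while a single-language label holds its text inline.

```r
library(xml2)

# Hypothetical single-language question: the label text is inline
single_lang <- read_xml('<input ref="/data/age"><label>Age</label></input>')

# Hypothetical multi-language question: the label only references
# an entry in the translation block
multi_lang <- read_xml(
  '<input ref="/data/age"><label ref="jr:itext(\'/data/age:label\')"/></input>'
)

# TRUE when the first label of the node points into the translations
has_itext_label <- function(node) {
  xml_has_attr(xml_find_first(node, ".//label"), "ref")
}

print(has_itext_label(single_lang))
print(has_itext_label(multi_lang))
print(xml_text(xml_find_first(single_lang, ".//label")))
```

A parser that wants to cover both kinds of form could branch on this check: read the inline text directly, or look up the referenced id among the translations.
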

I'll submit a suggestion soon.

mtyszler commented 3 years ago

I prepared this function:


# the function below uses the exact function signature as form_schema()
# in that sense, you could replace any call to form_schema by form_schema_ext
# it gets in addition to the form_schema columns, the common label, and the multilanguage labels if available
# it gets also the choice list and labels, in multilanguage if existing

library(ruODK)
library(xml2)
library(dplyr)  # for %>% and left_join()

form_schema_ext <- function(flatten = FALSE, odata = FALSE, parse = TRUE, pid = get_default_pid(),
                            fid = get_default_fid(), url = get_default_url(), un = get_default_un(),
                            pw = get_default_pw(), odkc_version = get_default_odkc_version(),
                            retries = get_retries(), verbose = get_ru_verbose()) {

  # gets basic schema
  frm_schema <- form_schema(flatten, odata, parse, pid,
                            fid, url, un,
                            pw, odkc_version,
                            retries, verbose)

  # gets xml representation
  # (the last argument was cut off in the original post; `retries` is assumed)
  frm_xml <- as_xml_document(form_xml(parse, pid, fid,
                                      url, un, pw,
                                      retries))

  ### parse translations:
  all_translations <- xml_find_all(frm_xml, "//text")

  # initialize dataframe
  extension <- data.frame(path = character(0), label = character(0),
                          stringsAsFactors = FALSE)

  ### PART 1: parse labels:
  raw_labels <- xml_find_all(frm_xml, "//label")

  # iterate through labels (seq_along() is safe when there are none)
  for (i in seq_along(raw_labels)) {

    ## path
    # gets ref from parent, without leading "/data"
    this_path <- sub("/data", "",
                     xml_attr(xml_parent(raw_labels[i]), "ref"))

    # ensure this is a valid path
    # (the condition was cut off in the original post; an NA check is assumed)
    if (!is.na(this_path)) {

      # adds new empty row:
      extension[nrow(extension) + 1, ] <- rep(NA, ncol(extension))

      # adds path
      extension[nrow(extension), 'path'] <- this_path

      ## reads label
      this_rawlabel <- raw_labels[i]

      # first checks if it is multi-language label
      multi_lang <- xml_has_attr(this_rawlabel, "ref")

      if (multi_lang) {
        # if multi-language, finds all translations related to this path:
        id <- paste0("/data", this_path, ":label")
        translations <- all_translations[xml_attr(all_translations, "id") == id]

        # iterate through translations
        for (j in seq_along(translations)) {

          # first check this is a regular text label. Questions in ODK can have video, image and audio "labels",
          # which will be skipped. This is identified by the presence of the 'form' attribute:
          is_regular_label <- !xml_has_attr(xml_find_first(translations[j], "./value"), "form")

          if (is_regular_label) {
            # reads the parent node to identify language:
            translation_parent <- xml_parent(translations[j])
            this_lang <- gsub(" ", "_", tolower(xml_attr(translation_parent, "lang")))

            # decide if 'default' language or specific language
            if (this_lang == "default") {
              # if 'default' language, save under column 'label':
              extension[nrow(extension), 'label'] <- xml_text(xml_find_first(translations[j], "./value"))
            } else {
              # check if language already exists in the dataframe
              if (!(paste0("label_", this_lang) %in% colnames(extension))) {

                # if not, create new column
                extension <- cbind(extension, data.frame(new_lang = rep(NA, nrow(extension))))
                colnames(extension)[ncol(extension)] <- paste0("label_", this_lang)
              }

              # adds the first value content of the translation
              extension[nrow(extension), paste0("label_", this_lang)] <- xml_text(xml_find_first(translations[j], "./value"))
            }
          }
        }
      } else {
        # extract content
        extension[nrow(extension), 'label'] <- xml_text(this_rawlabel)
      }

      ### PART 1.1: parse choice labels
      ## checks existence of choice list:
      choice_items <- xml_find_all(xml_parent(this_rawlabel), "./item")
      if (length(choice_items) > 0) {

        # check if 'choices' column already exists
        if (!('choices' %in% colnames(extension))) {

          # if not, create new column
          extension <- cbind(extension, data.frame(choices = rep(NA, nrow(extension))))
        }

        # initialize lists
        choice_values <- list()
        choice_labels <- list()

        # iterate through choice list:
        for (jj in seq_along(choice_items)) {

          this_choicevalue <- xml_text(xml_find_first(choice_items[jj], "./value"))

          # collect the choice value so it ends up in the output
          choice_values[jj] <- this_choicevalue

          # raw label
          this_rawchoicelabel <- xml_find_first(choice_items[jj], "./label")

          # first checks if it is multi-language choice label
          multi_lang_choice <- xml_has_attr(this_rawchoicelabel, "ref")

          if (multi_lang_choice) {
            id_choice <- paste0("/data", this_path, "/", this_choicevalue, ":label")
            choice_translations <- all_translations[xml_attr(all_translations, "id") == id_choice]

            # iterate through choice translations
            for (kk in seq_along(choice_translations)) {

              # first check this is a regular text label. Questions in ODK can have video, image and audio "labels",
              # which will be skipped. This is identified by the presence of the 'form' attribute:
              is_regular_choicelabel <- !xml_has_attr(xml_find_first(choice_translations[kk], "./value"), "form")

              if (is_regular_choicelabel) {
                # reads the parent node to identify language:
                choice_translation_parent <- xml_parent(choice_translations[kk])
                this_choicelang <- gsub(" ", "_", tolower(xml_attr(choice_translation_parent, "lang")))

                # decide if 'default' language or specific language
                if (this_choicelang == "default") {
                  # if 'default' language, save under 'choice':
                  choice_labels[['base']][jj] <- xml_text(xml_find_first(choice_translations[kk], "./value"))
                } else {
                  # check if language already exists in the dataframe
                  if (!(paste0("choices_", this_choicelang) %in% colnames(extension))) {

                    # if not, create new column
                    extension <- cbind(extension, data.frame(new_choicelang = rep(NA, nrow(extension))))
                    colnames(extension)[ncol(extension)] <- paste0("choices_", this_choicelang)
                  }

                  # adds the first value content of the translation
                  choice_labels[[paste0("choices_", this_choicelang)]][jj] <- xml_text(xml_find_first(choice_translations[kk], "./value"))
                }
              }
            }
          } else {

            choice_labels[['base']][jj] <- xml_text(this_rawchoicelabel)
          }
        }

        # add to the extended table:
        for (this_choicelang in names(choice_labels)) {
          these_choicelabels <- choice_labels[[this_choicelang]]

          if (this_choicelang == "base") {
            this_choicelang_colname <- "choices"
          } else {
            this_choicelang_colname <- this_choicelang
          }

          extension[nrow(extension), this_choicelang_colname] <- list(list(list(values = unlist(choice_values),
                                                                                labels = unlist(these_choicelabels))))
        }
      }
    }
  }

  # join and return the extended schema:
  fs_ext <- frm_schema %>% dplyr::left_join(extension, by = "path")
  fs_ext
}


Beyond what the snippet from @dmenne does, this also provides choice lists and handles multiple languages.

Here is an example output from a form with multi-language labels and a single-language choice list:

| path | name | type | ruodk_name | label | label_english_(en) | label_french_(fr) | choices |
|---|---|---|---|---|---|---|---|
| /some_text | some_text | string | some_text | NA | This is a basic fill in the blank question. | (FRENCH) This is a basic fill in the blank question. | NA |
| /text_image_audio_video_test | text_image_audio_video_test | string | text_image_audio_video_test | NA | This question shows how to use translations and media types. | This question shows how to use translations and media types. | NA |
| /a_integer | a_integer | int | a_integer | NA | Enter a integer: | Enter a integer: | NA |
| /a_decimal | a_decimal | decimal | a_decimal | NA | Enter a decimal: | Enter a decimal: | NA |
| /calculate | calculate | string | calculate | NA | NA | NA | NULL |
| /calculate_test_output | calculate_test_output | string | calculate_test_output | NA | The sum of the integer and decimal: | The sum of the integer and decimal: | NA |
| /test_yn | test_yn | string | test_yn | NA | What do you think? | Ça va? | list(values = (0, 1, 99), labels = ("Yes", "No", "Maybe")) |
| /meta | meta | structure | meta | NA | NA | NA | NULL |
| /meta/instanceID | instanceID | string | meta_instance_id | NA | NA | NA | NULL |
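
The language-specific column names in this output are derived from the `lang` attribute of each translation. A minimal sketch of that naming rule, with `lang_to_column()` as a made-up helper name that mirrors the `gsub()`/`tolower()` call in the function:

```r
# Hypothetical helper: builds a column name from a translation's lang attribute
lang_to_column <- function(lang, prefix = "label_") {
  # lower-case the language name and replace spaces with underscores
  paste0(prefix, gsub(" ", "_", tolower(lang)))
}

print(lang_to_column("English (en)"))
print(lang_to_column("French (fr)", prefix = "choices_"))
```

Because the resulting names contain parentheses, they are not syntactic R names and need backticks when used with `$`, e.g. fs_ext$`label_english_(en)`.
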

I haven't stress-tested it, but I'll try to turn it into a pull request once I have the time.

Feel free to test and comment.

florianm commented 3 years ago

Nice work! This would warrant a new test form. Unit tests could run against that form, and also against the current forms without translations.

dmenne commented 3 years ago

Great! I had already noticed that the function sometimes failed, but did not have the time to find out why. You saved my day!