pbs-assess / rosettafish

An R package for translating fish- and fisheries-related terms

Option to use custom terms file #18

Closed: SOLV-Code closed this issue 4 years ago

SOLV-Code commented 4 years ago

Thank you for a great package.

One thing I'd like to see is the option to feed a custom terms file into the trans() function, to quickly deal with project-specific terms, or build language extensions, or just set up preliminary plots with rough translations.

Do you have any plans to include this? If not, and I adapt the trans() function, how should I cite it? Maybe something like "function adapted from rosettafish package (Anderson et al. 2019)" ?

Cheers, Gottfried Pestal

seananderson commented 4 years ago

That seems like a reasonable addition. We really want to encourage people to submit their translations to the .csv file, since the more complete that file is, the more useful it is for everybody. However, I can see the utility in the use cases you suggested.

How do you envision the interface looking? We tend to use the en2fr() (or fr2en()) functions. Maybe an optional argument to trans() named temporary_terms or custom_terms that takes a data frame with the columns english and french? Internally, trans() would bind that to the end of rosetta_terms and remove any duplicates. The en2fr() and fr2en() functions could pass it through via the ... argument.
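A minimal base-R sketch of the internal step proposed above. combine_terms() and the toy builtin table are hypothetical stand-ins, not rosettafish's code or its rosetta_terms data. Because duplicated() flags later rows and match() returns the first hit, binding the custom terms to the end means a built-in translation wins over a custom one for the same English term; the hypothetical prefer_custom flag flips that order.

```r
# Hypothetical stand-in for a built-in dictionary (not the real rosetta_terms)
builtin <- data.frame(
  english = c("Year", "of"),
  french  = c("Ann\u00e9e", "de"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: bind custom terms to the built-in table, then drop
# duplicated English terms (the first occurrence is kept)
combine_terms <- function(builtin, custom_terms = NULL, prefer_custom = FALSE) {
  if (is.null(custom_terms)) return(builtin)
  custom_terms <- custom_terms[, names(builtin), drop = FALSE]  # align columns
  combined <- if (prefer_custom) {
    rbind(custom_terms, builtin)  # custom rows first, so they win
  } else {
    rbind(builtin, custom_terms)  # appended at the end, so built-in rows win
  }
  combined <- combined[!duplicated(combined$english), , drop = FALSE]
  rownames(combined) <- NULL
  combined
}

# Toy custom table with a conflicting translation for "of"
custom <- data.frame(
  english = c("of", "Fishway"),
  french  = c("du", "Passe \u00e0 Poisson"),
  stringsAsFactors = FALSE
)

terms_a <- combine_terms(builtin, custom)
terms_a$french[match("of", terms_a$english)]  # "de" (built-in kept)

terms_b <- combine_terms(builtin, custom, prefer_custom = TRUE)
terms_b$french[match("of", terms_b$english)]  # "du" (custom kept)
```

Silently dropping the later duplicate is simple, but it hides conflicts; a warning listing the dropped rows would be one way to surface them.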

For citation, use citation("rosettafish"), although I still need to populate the DESCRIPTION file for that to be useful. Will do.

SOLV-Code commented 4 years ago

Thank you. An argument to trans() makes the most sense for the way I'm using it. If you are appending to rosetta_terms, would you build in a check for duplicate terms with alternative translations? For my purposes a simple either/or set-up would work: use built-in if custom_terms is NULL or use custom_terms if feeding in a data frame.
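The two options raised here, a check for conflicting duplicates and a simple either/or switch, could look roughly like the following. choose_terms() and find_conflicts() are hypothetical helpers written for illustration, not rosettafish functions, and the two small tables are toy data.

```r
# Hypothetical helper: either/or, with no merging of the two tables
choose_terms <- function(builtin, custom_terms = NULL) {
  if (is.null(custom_terms)) builtin else custom_terms
}

# Hypothetical helper: English terms present in both tables whose French differs
find_conflicts <- function(builtin, custom_terms) {
  both <- merge(builtin, custom_terms, by = "english",
                suffixes = c("_builtin", "_custom"))
  both[both$french_builtin != both$french_custom, , drop = FALSE]
}

# Toy tables for illustration
builtin <- data.frame(english = c("Year", "of"),
                      french = c("Ann\u00e9e", "de"),
                      stringsAsFactors = FALSE)
custom <- data.frame(english = c("of", "Fishway"),
                     french = c("du", "Passe \u00e0 Poisson"),
                     stringsAsFactors = FALSE)

terms.use <- choose_terms(builtin, custom)  # custom table only
find_conflicts(builtin, custom)             # one row: "of" is "de" vs "du"
```

The either/or version never needs the conflict check, since only one table is consulted; find_conflicts() matters only if the tables are merged.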

Below is what I'm using for now, but I'll switch over to the package version when the feature is included. It's not urgent, though, because what I've got now works fine. Also below is the table of preliminary translations I'm feeding in. This way I can set everything up, and then just revise the lookup table when the proper translations come back from the CSAS office. You can see that some of these are so case-specific that they really don't belong in a generic master dictionary...

```r
translate <- function(x, terms, from = "french", to = "english", sep = "; ",
                      allow_missing = FALSE) {

  # adapted from function trans() in the rosettafish package
  # https://github.com/pbs-assess/rosettafish
  # this version uses a local custom file of terms, rather than the built-in vocab from the package

  from.vec <- as.character(terms[[from]])            # terms in the source language
  to.df <- as.data.frame(terms[, to, drop = FALSE])  # target-language column(s)

  j <- match(x, from.vec)  # row of each term in the lookup table (NA if absent)

  if (!allow_missing && any(is.na(j))) {
    if (sum(is.na(j)) == 1L)
      stop("The following term is not in the translation database: ", x[is.na(j)],
        call. = FALSE)
    else
      stop("The following terms are not in the translation database: ",
        paste(x[is.na(j)], collapse = ", "), call. = FALSE)
  }

  # one string per term; multiple target columns are joined with `sep`
  v <- as.character(apply(to.df[j, , drop = FALSE], 1,
                          function(row) paste0(row, collapse = sep)))
  # fall back to the untranslated term where there is no match
  v[is.na(j)] <- x[is.na(j)]
  v
}

# terms.use: data frame of preliminary translations (see table below)
# lang.use: target language column, e.g. "french"
x.lab.use <- paste0(
  translate("Productivity Change", terms = terms.use, from = "english", to = lang.use),
  " (% ",
  translate("of", terms = terms.use, from = "english", to = lang.use),
  " ",
  translate("Median Alpha", terms = terms.use, from = "english", to = lang.use),
  ")"
)
```
| english | french |
|---|---|
| Exploitation Rate | Taux d'Exploitation |
| Year | Année |
| Productivity Change | Changement de Productivité |
| of | de |
| Median Alpha | Alpha Médian |
| Recovery Goal | But de Récupération |
| No Big Bar | Pas de Big Bar |
| 1 year blocked passage | 1 an de passage bloqué |
| 2 years blocked passage | 2 ans de passage bloqué |
| 3 years blocked passage | 3 ans de passage bloqué |
| 4 years blocked passage | 4 ans de passage bloqué |
| Fishway | Passe à Poisson |
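The issue title asks for a custom terms file. A minimal base-R sketch of keeping the table above in a CSV and loading it back as a data frame with english and french columns; only three rows are included, and the file is a temporary one created for the example. Writing and reading with fileEncoding = "UTF-8" is an assumption aimed at keeping accented terms such as "Année" intact.

```r
# A few rows of the lookup table above (accented rows work the same way)
terms.use <- data.frame(
  english = c("Exploitation Rate", "of", "No Big Bar"),
  french  = c("Taux d'Exploitation", "de", "Pas de Big Bar"),
  stringsAsFactors = FALSE
)

# Write it out and read it back as a project-specific terms file
path <- tempfile(fileext = ".csv")
write.csv(terms.use, path, row.names = FALSE, fileEncoding = "UTF-8")
terms.file <- read.csv(path, stringsAsFactors = FALSE, fileEncoding = "UTF-8")

# Exact-match lookup, the same idea translate() uses internally
terms.file$french[match("of", terms.file$english)]  # "de"
```

The resulting terms.file can be passed as terms to translate(), and revising translations then only means editing the CSV.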
seananderson commented 4 years ago

This commit added a custom_terms argument. For example:

```r
library(rosettafish)
df <- data.frame(english = c("aaa"), french = c("bbb"))
en2fr("aaa", custom_terms = df)
# [1] "bbb"
```
SOLV-Code commented 4 years ago

Awesome, thanks!