trinker / termco

Regular Expression Counts of Terms and Substrings
Other
25 stars 5 forks source link

read_term_list function #51

Closed trinker closed 7 years ago

trinker commented 7 years ago

Function to read in external data or objects and handle formatting and warnings/errors.

This will make forming a term_list easier and can be used inside the term_count function (others as well?? token_count) to handle the term_lists checking.

trinker commented 7 years ago

Misplaced the last commit reference to issue https://github.com/trinker/termco/issues/10:

https://github.com/trinker/termco/commit/80a57852df546329d09803e7ae9e9a550e00be8d

trinker commented 7 years ago

summary function and a source_term_list to pull term list but not check (used to get individual regexes for testing.

trinker commented 7 years ago

Close (the check collapses and removes dupes. Maybe separate these 2 tasks??):

source_term_list <- function(file, indices = NULL, ...){

    ## ensure file exists
    stopifnot(file.exists(file))

    ## read in categories file
    cats <- source(file)[[1]]

    ## determine if hierarchical
    type <- ifelse(
        is.list(cats[[1]]) && length(cats) > 1 && all(sapply(cats, is.list)),
        'termco_nested',
        'termco_unnested'
    )

    switch(type,
        termco_nested = {

            ## check for empty tiers
            cats <- term_lister_empty_hierarchy_check(cats)

            ## grab the warnings for later printing
            first_pass <- lapply(cats, function(x) {
                tryCatch(term_lister_check(x, G = obj), warning=function(w) w)
            })

            locs <- unlist(lapply(first_pass, inherits, what = 'simpleWarning'))
            main <- 'The categories within the following hierarchies (tiers) contained no regular expressions and were dropped:\n\n'

            if (any(locs)) {

                tiers <- which(locs)

                messages <- paste(paste0(
                    'Tier ', tiers, ": ",
                     paste0('\n', gsub('^.+?\\n\\n|\\n\\n$', '', unlist(lapply(first_pass[locs], `[[`, 'message')))),
                     '\n'
                ), collapse = '\n')

                warning(paste0(main, messages))

            }

        }

    )

    if (!is.null(indices)) {
        warning('`indices` set: dropping `term_list` elements...\n\nUse `indices = NULL` to keep all elements')
        cats <- cats[indices]
    }

    return(cats)
}