Dear all,

I would like to use qdap::syllable_sum to count syllables in a large dataset (>450,000 rows). The text is stored in a column of a tibble. I want to count the syllables for each row and store the sum in a separate column.

syllable_sum works fine with expected input, but throws an error when it encounters unexpected input, e.g. "____" or "12345". Instead of storing an NA value and moving on, the whole call stops with the error message "Subset out of bounds".

Is there a way to force the function to spit out an NA for those rows where the function cannot produce a syllable sum?

Here's a reproducible example with some 'good' and 'bad' input. I would like the function to create a new column syls_y with the sums (3 and 4) for the first two rows, and NA for the last three.

Thank you in advance for your tips!

# qdap syllable separation reprex

# Load packages
library(tidyverse)
library(qdap)

# Create data frame
data <- tibble(
  x = 1:5,
  y = c("A few words", "a few more words", "____", "1235", "+#$")
)

# Syllable separation of each row in data$y
data_syls <- data %>%
  mutate(syls_y = qdap::syllable_sum(y))
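
To make the desired behaviour concrete, here is a minimal sketch of a per-row wrapper built on base R's tryCatch(), which turns any error into NA. The names na_on_error() and toy_count() are hypothetical, and toy_count() is only a stand-in that fails on text without letters so the example runs without qdap; it is not the real syllable algorithm, and its counts will differ from qdap's.

```r
# Hypothetical helper: wrap a function so that any error becomes NA.
na_on_error <- function(f) {
  function(x) {
    tryCatch(
      as.integer(f(x)),
      error = function(e) NA_integer_
    )
  }
}

# Stand-in for qdap::syllable_sum (assumption: NOT the real algorithm).
# It counts groups of vowels and stops on input without any letters.
toy_count <- function(text) {
  if (!grepl("[A-Za-z]", text)) stop("Subset out of bounds")
  groups <- gregexpr("[aeiouy]+", tolower(text))[[1]]
  sum(groups > 0)
}

y <- c("A few words", "a few more words", "____", "1235", "+#$")

# Call the wrapped function once per row so one bad row cannot stop the rest.
safe_count <- na_on_error(toy_count)
syls_y <- vapply(y, safe_count, integer(1), USE.NAMES = FALSE)
```

Assuming qdap::syllable_sum accepts a single string and returns one number, na_on_error(qdap::syllable_sum) could presumably be used the same way inside mutate(). Calling it row by row is likely slower than one vectorised call on 450,000 rows, so this is a trade-off rather than a fix.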

Related GitHub issue: trinker/qdap #262, "syllable_sum error message: Subset out of bounds"