jacob-long / panelr

Regression models and utilities for repeated measures and panel data

What to do if `label_location` is in the middle of some variable names #26

Closed atanasj closed 1 year ago

atanasj commented 4 years ago

I have a wide data set in which the time-varying labels within the psychometric measures take the form QuestionnaireTime_Item#. For example, in dass1_1, dass is the questionnaire, 1_ is the time at which the questionnaire was administered, and the final 1 is the item number within that questionnaire.

Additionally, I have variables denoting the date and session number of administration, e.g., date1 and session1. For these variables, the time label is at the end of the variable name.

Time is denoted by numerals in most cases (1 to 15) and by the word 'final' (e.g., datefinal, sessionfinal, dassfinal_1) for the last session.

Participants have repeated measures; however, the number of data points varies by participant, and some measures are not repeated.

Is it possible to use your package to convert from wide to long without manually changing the variable names so that the time label is consistent and either at the beginning or the end of the variable name?

Please let me know if you need any more info. Thanks in advance for your help.

jacob-long commented 1 year ago

Sorry for the very long delay on this. I think at least some of this should be possible using the match argument to long_panel(), which can be written as a regular expression if you set use.regex = TRUE. Whatever you match with the match argument, the function will look for the wave label before or after it, depending on what you give label_location.

In your specific case, it will be a bit of a complex regex (at least for my capabilities for such things), since you will need to look for alphabetical characters that precede one or more digits without matching the digits themselves. Additionally, if you do not want to change final to a number, you will have to look for but not match "final." I do believe this is possible, though, and it would just involve setting up a lookbehind/lookahead.
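To give a rough idea of the lookbehind/lookahead approach, here is a minimal base R sketch that pulls the wave label out of names like the ones in the question. The variable names and the pattern are my own illustration, and I have not checked how long_panel() combines a pattern like this with match internally, so treat it as a way to prototype the regex rather than something to paste into match as-is.

```r
# Hypothetical variable names modeled on the ones described in the question
vars <- c("dass1_1", "dass15_21", "dassfinal_1", "date1", "sessionfinal", "age")

# Wave label = digits or "final" that
#   - are preceded by a letter (lookbehind, not part of the match), and
#   - are followed by "_" or the end of the name (lookahead, not part of the match)
wave_pattern <- "(?<=[a-z])(\\d+|final)(?=_|$)"

# perl = TRUE is required for lookbehind/lookahead in base R
m <- regexpr(wave_pattern, vars, perl = TRUE)

# regmatches() drops non-matching elements, so fill a vector of NAs instead
waves <- rep(NA_character_, length(vars))
waves[m > 0] <- regmatches(vars, m)

print(data.frame(variable = vars, wave = waves))
```

Names with no wave label (such as age here) come back as NA, which is a quick way to see which variables are not time-varying.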

Besides the difficulty of setting up the match argument, you can tell long_panel() about the slightly unorthodox "numbering" by providing the wave names yourself to the periods argument, e.g. periods = c(1:15, "final").
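One thing to keep in mind is that c(1:15, "final") is a character vector, since R coerces the numbers once they are mixed with a string. A small base R sketch, with made-up column names, of using that vector to check which wave-suffixed columns actually exist for the variables whose label is at the end:

```r
# Wave names as suggested above; mixing numbers and "final" yields characters
periods <- c(1:15, "final")

# Hypothetical stems for variables whose wave label sits at the end of the name
stems <- c("date", "session")

# All stem + wave combinations, e.g. "date1", "session1", ..., "sessionfinal"
expected <- as.vector(outer(stems, periods, paste0))

# Hypothetical column names from a wide data set with unevenly observed waves
wide_names <- c("id", "date1", "date2", "datefinal", "session1", "sessionfinal")

# Which of the expected wave columns are actually present
present <- intersect(expected, wide_names)
print(present)
```

This does not call long_panel() at all; it is only a sanity check on the names before handing periods to it.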

I suspect your need for this response has long since passed, but I am sharing so at least the next person with this probably-common issue may see it.