tidyverse / haven

Read SPSS, Stata and SAS files from R
https://haven.tidyverse.org

Add SPSS variable name validation #660

Closed by gorcha 2 years ago

gorcha commented 2 years ago

This PR adds checks for SPSS variable name format and length, plus a case-insensitive check for duplicate variable names (with thanks to @juansebastianl's work in #643), closing #641.
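The length and duplicate checks can be illustrated with a minimal Python sketch. It is not haven's implementation (haven is an R package). The 64-byte limit is an assumption about SPSS's maximum name length, and the function names are hypothetical. Length is measured in encoded bytes rather than characters, and duplicates are detected by comparing lowercased names, so `age` and `Age` collide.

```python
from collections import defaultdict

# Assumption: SPSS variable names may be at most 64 bytes long.
MAX_NAME_BYTES = 64


def check_lengths(names, encoding="utf-8"):
    """Hypothetical helper: return names whose encoded length exceeds the limit."""
    return [n for n in names if len(n.encode(encoding)) > MAX_NAME_BYTES]


def find_case_insensitive_duplicates(names):
    """Hypothetical helper: group names that collide when case is ignored."""
    groups = defaultdict(list)
    for name in names:
        groups[name.lower()].append(name)
    # Keep only the groups where more than one name maps to the same key.
    return [group for group in groups.values() if len(group) > 1]


print(find_case_insensitive_duplicates(["age", "Age", "income"]))
print(check_lengths(["é" * 33, "short"]))
```

Counting bytes rather than characters matters for non-ASCII names: `"é" * 33` is only 33 characters but 66 bytes in UTF-8, so the sketch flags it.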

gorcha commented 2 years ago

Regarding the note about the variable name check: I think I only checked the first character of the name because the SPSS manual is vague about which characters may follow it, referring only to "nonpunctuation characters":

Subsequent characters can be any combination of letters, numbers, nonpunctuation characters, and a period (.)

The PSPP manual is a bit more direct:

The remaining characters in the identifier must be letters, digits, or one of the following special characters: . _ $ # @

Unless there are any objections, I'm going to tighten the regex for valid variable names so that it checks the whole name, not just the first character.
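The difference between checking only the first character and checking the whole name can be shown with a minimal Python sketch. It is not haven's actual regex. Two assumptions are built in: a name must start with a letter or `@`, and letters are restricted to ASCII for simplicity, although SPSS also accepts non-ASCII letters. The allowed subsequent characters follow the PSPP wording quoted above. All names in the code are hypothetical.

```python
import re

# Hypothetical "before" check: only the first character is validated
# (assumed here to be an ASCII letter or @).
FIRST_CHAR_ONLY = re.compile(r"[A-Za-z@]")

# Hypothetical "after" check: the whole name must match. Following the PSPP
# wording, later characters may be letters, digits, or one of . _ $ # @
# (letters restricted to ASCII for simplicity).
WHOLE_NAME = re.compile(r"[A-Za-z@][A-Za-z0-9._$#@]*")


def valid_first_char_only(name):
    # re.match anchors only at the start, so trailing characters are ignored.
    return FIRST_CHAR_ONLY.match(name) is not None


def valid_whole_name(name):
    # re.fullmatch anchors at both ends, so every character must be allowed.
    return WHOLE_NAME.fullmatch(name) is not None


for name in ["income", "q1.a", "@flag", "1st", "has space", "a-b"]:
    print(name, valid_first_char_only(name), valid_whole_name(name))
```

Names such as `has space` and `a-b` pass the first-character check but fail the whole-name check, which is the gap that tightening the regex closes.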

hadley commented 2 years ago

Sounds good to me. Feel free to merge once done; I don't need to review again.