Closed christopher-beckham closed 10 years ago
Thanks for the PR! I'll review this in the next day or two and get back to you. Cheers.
Thanks. Whoops, just realized my second commit is also showing up in this pull request, didn't think that would happen. You can choose to ignore that if you want, I added a method to test for duplicate column names, which I assume hasn't been written yet.
@chrispy645 No worries. FYI, if you keep adding commits before a PR is merged, it will be added to the PR.
Hi @chrispy645 Thanks for this PR. I'll accept the first edit (makes sense) but I'll skip the second one since R will automatically deal with duplicate names (or at least much of that functionality already exists in base).
Example: This csv:
x,x,x
1,2,3
1,2,3
1,2,3
when read becomes:
read.csv('test_data.csv')
> data
x x.1 x.2
1 1 2 3
2 1 2 3
3 1 2 3
Cool, thanks! Yeah, although it is true R will deal with duplicate column names when reading in a CSV, it's actually possible to change the column names after it's been loaded, e.g:
colnames(data)[1] = "x"
colnames(data)[2] = "x"
And it turns out, you can actually then save this to an .RData file, load it back in, and the duplicate column names will remain (at least when I do this through RStudio, anyway). Hence why I added the test. So do you still think it would be useful, or does base check this somehow?
(I think this very issue was in the second assignment of Jeff's data analysis course, which is where I got the idea from.)
Cheers
it's actually possible to change the column names after it's been loaded, e.g:
Hi @chrispy645, yes I'm aware. The rationale behind this package is that researchers load data from spreadsheets (xls
) or csv
that don't necessarily check for these issues. Once read into R, a user will fix them and then store the data back as flat files or as temporary Rdata
files. Data never begin this way. So we're hoping to help researchers quickly identify these issues as the data are read in from the raw format.
Someone introducing problems back into data before storing as intermediate .Rdata
files is an odd use case. So let me mull over it and if it turns out to still be useful, I'll retrieve it from your earlier commit.
Cheers
No worries. It is indeed an odd use case, so I understand. :)
Hi there,
First Git pull request - hopefully I didn't mess this up. I noticed that
test_utf8
failed on some data frames, giving this error:The error is pretty intuitive, although I still couldn't find what was causing this. All the column names are strings so I've no idea what's happening there...
In my fork of the repo I added a file in
inst/extdata
calledminimal.csv
, if you load this CSV into R and runtest_utf8
on it you will get the error. There was also another dataset I used, which worked fine, but when I reduced its dimensions to be the same asminimal.csv
, I got the same error.Despite not quite finding what the problem was, I decided to re-write one of the lines in
test_utf8.R
. Also no warnings need to be suppressed for this new line. :)Cheers