Closed jwhendy closed 1 year ago
Hard to say and this doesn't immediately tweak my spidey sense about being connected to some known issue.
Have you looked at that specific line in the raw file or its general neighborhood to see if there's anything "interesting" about it?
@jennybc thanks for the skim, and I figured this would probably be a hard one to say much about. I wish I could just attach my file! I didn't see anything interesting. The file was too big to open directly in LibreOffice, so I had to do the trick mentioned:
I read this in again, but used skip=1600000 and then saved it out. The first error is on row 47868, and when I look, everything seems as it should. a_string shown in the output of problems() is in Column G.
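For anyone wanting to look at the raw neighborhood of a reported row without opening the whole file, here is a minimal sketch using `readr::read_lines()` with `skip` and `n_max`. The file, `problem_row`, and `context` are hypothetical stand-ins, not values from this issue. Whether the row reported by `problems()` counts the header line may depend on the readr version, so the sketch pulls a small window rather than a single line.

```r
library(readr)

# Hypothetical stand-in for the large CSV: a header plus 10 data rows.
path <- tempfile(fileext = ".csv")
writeLines(c("id,value", paste0(1:10, ",", letters[1:10])), path)

problem_row <- 7  # hypothetical data row to inspect
context <- 2      # raw lines to show on either side

# Data row n sits on file line n + 1 because of the header, so skipping
# (problem_row - context) lines starts the window `context` rows before it.
window <- read_lines(path, skip = problem_row - context, n_max = 2 * context + 1)
print(window)
```

Printing raw lines this way shows quoting, stray delimiters, and odd characters that a spreadsheet view can hide.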
Could we start simple since I'm new to vroom? These are all just various ways of asking the same thing just to make sure I actually understand the error output correctly:
- […] `\t` or `\n` in a cell or some goofy escape character)
- If I run `dat[problem_row, ]`, should I see the value it says is there in column 20?
- If I run `unique(dat[, problem_col])`, should I see the value it says in the vector?
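A small sketch with fake data may help frame these questions. It assumes a two-column file where the second column is declared as integer but contains one non-numeric value; the file contents and column names are made up for illustration. A value that fails to parse is stored as `NA` in the data frame, so the original text is expected to show up only in the `actual` column of `problems()`, not in `dat` itself.

```r
library(readr)

# Fake data: the second data row has a non-integer in an integer column.
path <- tempfile(fileext = ".csv")
writeLines(c("name,count", "a,1", "b,oops", "c,3"), path)

# "ci" declares the columns as character and integer.
dat <- suppressWarnings(read_csv(path, col_types = "ci"))

# One row per parsing failure, with the expected type and the actual text.
probs <- problems(dat)
print(probs)

# The unparseable value became NA in the parsed data.
print(dat[is.na(dat$count), ])
```

Comparing `probs$actual` against the raw line in the file, rather than against `dat`, is therefore the more direct check.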
I can't use my original data as it's confidential, but wanted to give a general idea as perhaps someone can clarify if I'm doing something wrong. If not, give me some time to try and reproduce via fake data.
At the moment... this is work related, I'm new to this package, and I'd like to understand if I'm missing something silly.
I was running a script which reads in a CSV, and saw an error:
From the docs, I see that `problems()` returns:
In my script, I pass `col_types`, and so I recreated the read (currently with `readr::read_csv()`):
Now I do `problems(dat)` and see the first row (data obscured):
However, column 20 is indeed defined as `i`, and I find:
I recognize the format of `a_string`, which would put it in column 7, not 20. This is a pretty massive file. Is this somehow about delimiters or, e.g., newlines causing something to bump onto the next line?
I read this in again, but used `skip=1600000` and then saved it out. The first error is on row 47868, and when I look, everything seems as it should. `a_string` shown in the output of `problems()` is in Column G.
Let me know if something stands out I should check into further. It seems like a false alarm, but I wanted to try and investigate to make sure I wasn't missing something which would affect my results. Thanks!
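One way to test the delimiter or newline hypothesis is to count the fields on each record and flag any that differ from the header. This is only a sketch on fake data, using `readr::count_fields()` with `tokenizer_csv()`; the file below is invented, and a real file with a different delimiter or quoting style would need a matching tokenizer.

```r
library(readr)

# Fake data: the second data record is missing a field.
path <- tempfile(fileext = ".csv")
writeLines(c("a,b,c", "1,2,3", "4,5", "7,8,9"), path)

# Number of fields on each record, header included.
n_fields <- count_fields(path, tokenizer = tokenizer_csv())

# Treat the header's field count as the expected width.
expected <- n_fields[1]
bad_lines <- which(n_fields != expected)
print(bad_lines)
```

If every record has the expected number of fields, a shifted-column explanation becomes less likely and the mismatch is more likely in how the problem's location is reported.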