Open GoogleCodeExporter opened 9 years ago
Upon further investigation, using xdebug to trace, it looks like the real
problem
comes from _check_count()'s comparison of the passed $depth (number of lines
that
were supposed to be checked) to the length of $array (number of lines which
$char
appeared in). Commenting out the logic results in the expected results, albeit
with
blank lines in the output table.
I remember reading that this logic is supposed to be a nearly foolproof method
of
determining delimiter, but it is short-circuited by the the parser's ignorance
of
blank lines. Obviously, the quick fix is to rewrite the auto() routine to
decrement
$depth if it walks a blank line; however, it may be more beneficial to the
project it
implement logic to deal with blank lines on a global scope. RFC4180
(http://tools.ietf.org/html/rfc4180) has no particular stance on how to deal
with
blank lines and having an option to skip them is common in many other CSV
packages.
Because I both do not know how active this project is and am interested in
finding a
solution myself, I'll post whatever changes I work out that produce output to
my liking.
Original comment by teh.a...@gmail.com
on 29 Jun 2008 at 7:36
I have to admit, that blank lines was something I never expected the parser to
come across... lol
Thanks for pointing this issue out. I'll obviosly implement the blank line fix
on a global scale. I've always kept
the main parser and the code in the auto() method handling input identically.
Basically, my plan is to have the parser treat single or multiple linefeeds the
same way, initiating a new row
parsing when it stumbles onto the first non-linefeed/whitespace character. I
should have time to look into
this tonight, and post some more details of my fix. At the moment I'm just
going from memory as I'm away
from home on my iPhone :P
Thanks again for pointing this issue out, and I'd be glad to see any solutions
you come up with :)
Also, you seem to know very well what you're talking about, so I'd be happy to
hear what you think in
general about parseCSV and if you have any suggestions or such :)
Original comment by zynode
on 29 Jun 2008 at 2:40
I haven't delved into the main parser since blank lines are easier to work
around
when looping through the data, so I stopped after getting the auto() function
to work
properly. If the methodology between parse_string() and auto() are identical as
you
say, a fix for parse_string() may be likewise identical. These are my changes:
-295| } elseif ( ($ch == "\n" && $pch != "\r" || $ch = "\r") && !$enclosed ) {
+295| } elseif ((/*win*/$pch.$ch == "\r\n" || /*nix*/$ch == "\n" || /*mac*/($ch
==
"\r" && $nch != "\n")) && !$enclosed) {
+296| $nnch = isset($data{$i+2}) ? $data{$i+2} : false;
+297| if (($nch == "\r" && $nnch == "\n") || $nch == "\n" || ($nch == "\r" &&
$nnch
!= "\n")) $n--;
The logic behind this is to properly detect the three employed flavors of line
breaks
and then to determine if another is to follow. Since it reads ahead an
additional
character beyond $nch, it should even handle mixed linefeeds (eg, \r\n\n\r\r\n).
Sounds like you have a good handle on how to approach blank lines, though you
may
want to consider having the facility to *not* ignore blank lines; wthey could,
perceivably, have some sort of significance. For example, in the CSV that
triggered
this bug for me, a blank line separates unique values in the first column.
While I
have no need to take advantage of this, it would be prudent to include a
$skip_blanks
option, or something of that sort.
Original comment by teh.a...@gmail.com
on 30 Jun 2008 at 7:20
I tried to make the modifications suggested above but it didn't work. I did not
fully understand the changes. Thus if a new version of the script could be
uploaded or at least have the partial code to change, it would be great.
Thank you in advance.
Original comment by romlep...@gmail.com
on 15 Jul 2013 at 2:56
Obviously, it's been quite some time since I made my original post (and it
seems like the author never followed up). I no longer have this code (I'm
fairly certain, anyway), so I can't help much, but I would recommend reviewing
the functions I mentioned in comment #1. It looks like the diff I provided in
comment #3 is the realization of the logic I described.
Original comment by teh.a...@gmail.com
on 15 Jul 2013 at 3:02
Original issue reported on code.google.com by
teh.a...@gmail.com
on 29 Jun 2008 at 7:14