mageddo / javascript-csv

Automatically exported from code.google.com/p/jquery-csv
MIT License

Mac end of lines #27

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?
1. Make a CSV file.
2. Save it using Mac formatted ends of lines.
3. Load it up.

What is the expected output? What do you see instead?
I expect it to work. Everything works great when using windows or linux 
formatted CSV files. When using Mac files, I get CSVDataError: Illegal state 
[Row:1]. 

What version of the product are you using? On what operating system?
This library has always acted this way. I hadn't gotten around to filing this 
particular bug report yet. But I would like it fixed if possible, so that I 
don't have to make my users worry about end of line formatting. So even in 
0.71, this happens. 

Please provide any additional information below.

Maybe the weird mac formatting is not in the official spec for CSV, but it 
would be nice if we could use it anyways. Why create the extra work for my 
end-users? 

This test.csv will fail to load. test2.csv is identical, except that it 
uses Windows end-of-line formatting. A good text editor doesn't show the 
difference. The actual difference is that the Mac format doesn't use line feed 
characters (it ends each line with a bare carriage return), and this causes 
jquery-csv to fail. This also messes up Notepad.exe. 

Original issue reported on code.google.com by Chad.R.B...@gmail.com on 30 May 2013 at 1:06

Attachments:

GoogleCodeExporter commented 8 years ago
Technically, 'carriage return' (CR) line endings are no longer a valid form of 
line ending on any platform. Since the release of OS X, Mac line endings have 
followed the Unix convention (i.e. newline, LF).

Excel for OSX is the only application I know of that still (incorrectly) 
explicitly uses CR line endings.

To keep the parser as simple and efficient as possible, it silently ignores all 
CR characters, making both CRLF and LF line endings read the same.

There are two approaches to fix this issue:

1. Run the CSV data through a pre-processor function that converts all CR and 
CRLF line endings to LF.

2. Change the parser code to add special cases for both CRLF and CR line 
endings.

I strongly suggest the former. Hacking on the parser code is no simple matter. 
The lexer is written to be as slim and efficient as possible; adding more edge 
cases will slow down the parser for everything. You may see a lot of CSV with 
CR characters, but in the greater scope of things, CSV with CR line endings is 
not a common occurrence.
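
A minimal sketch of the pre-processor approach (option 1) might look like the following. The helper name `normalizeLineEndings` is hypothetical and not part of jquery-csv. It assumes the whole file is already in memory as a string, and it also rewrites any CR that sits inside a quoted field.

```javascript
// Hypothetical helper (not part of jquery-csv): convert CRLF and lone CR
// line endings to LF before the text is handed to the parser.
function normalizeLineEndings(text) {
  // "\r\n" is listed first so a CRLF pair becomes one "\n" rather than two.
  return text.replace(/\r\n|\r/g, "\n");
}

var macCsv = "a,b\r1,2\r";         // CR only (old Mac / Excel for Mac)
var windowsCsv = "a,b\r\n1,2\r\n"; // CRLF

console.log(JSON.stringify(normalizeLineEndings(macCsv)));     // "a,b\n1,2\n"
console.log(JSON.stringify(normalizeLineEndings(windowsCsv))); // "a,b\n1,2\n"
```

The normalized string can then be passed to the parser unchanged, so the lexer itself needs no extra cases.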

If you'd like to contribute a patch, or need help working out a pre-processor 
I'll try to help. Due to personal/work circumstances I don't have the time to 
focus on further development right now.

Original comment by evanpla...@gmail.com on 31 May 2013 at 4:40

GoogleCodeExporter commented 8 years ago
I know you don't want to add this, but man I would totally donate beer money to 
get it working. Having to tell people to open their files in Excel just to 
save-as a different csv isn't a great user experience.

Is there any javascript out there that normalizes line endings in a csv?

Original comment by jpsi...@gmail.com on 4 Sep 2013 at 2:28

GoogleCodeExporter commented 8 years ago
Can we just use regular expressions or something to convert all CR to LF? 

Original comment by Chad.R.B...@gmail.com on 9 Sep 2013 at 10:33

GoogleCodeExporter commented 8 years ago
it would be awesome if this was just built in as a failsafe, i run in to this 
issue constantly

Original comment by jpsi...@gmail.com on 10 Sep 2013 at 4:26

GoogleCodeExporter commented 8 years ago
I was able to get the parser to work by treating the "^\r$" case as end of line 
in all 3 states of the parser. I don't know how this will affect cases other 
than the Unix/Excel exported ones. 

Original comment by dan.bo...@gmail.com on 10 Sep 2013 at 8:18

GoogleCodeExporter commented 8 years ago
I would agree with the comment that "it would be awesome if it was just built 
in". Since part of the idea of jQuery is cross-platform functionality, it seems 
that the library should handle it, rather than callers having to special-case 
their use of the library.

Original comment by dan.bo...@gmail.com on 10 Sep 2013 at 8:21

GoogleCodeExporter commented 8 years ago
[deleted comment]
GoogleCodeExporter commented 8 years ago
Here is a copy of the parser as I mentioned in comment #5

Original comment by dan.bo...@gmail.com on 10 Sep 2013 at 8:38

Attachments:

GoogleCodeExporter commented 8 years ago
Dan, what exactly did you change in this? whatever it is fixes my issue

Original comment by jpsi...@gmail.com on 10 Sep 2013 at 9:39

GoogleCodeExporter commented 8 years ago
wait nevermind, i spoke too soon.  this just switches it up, it works for the 
mac one but breaks the normal working ones

Original comment by jpsi...@gmail.com on 10 Sep 2013 at 9:41

GoogleCodeExporter commented 8 years ago
I only have the Mac Excel files. If you can send me a working test file, 
I'll take a look at it.

Original comment by dan.bo...@gmail.com on 10 Sep 2013 at 9:47

GoogleCodeExporter commented 8 years ago
Attached is one that is totally messed up. It works fine if I open it in excel, 
save-as, and change the format to "windows comma separated (.csv)".   If I do 
that there are no issues with the parser.

Original comment by jpsi...@gmail.com on 10 Sep 2013 at 11:28

Attachments:

GoogleCodeExporter commented 8 years ago
That "openflash.csv" is one that doesn't work with your fix. it only works if i 
resave it in the other format

Here is one that works with your fix, and not with the normal jquery-csv:
http://datazap.me/sites/default/files/datalogs/admin/bad-datalog.csv

Original comment by jpsi...@gmail.com on 10 Sep 2013 at 11:31

GoogleCodeExporter commented 8 years ago
Ok I took a guess at what was happening and I think I figured it out. 
I changed the regular expression to include |\r\n| as one of the options and 
put it first so that it matches before the single-character cases. Now all 3 
cases are treated as a newline with no "phantoms". It appears to work in limited 
testing with your file and the Mac cases. I will play with it some more, but 
here is my latest version.
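
To illustrate why the order of the alternatives matters, here is a small sketch. The two patterns are assumptions made up for the demonstration, not the actual expressions inside jquery-csv, and `splitRows` is a hypothetical helper.

```javascript
// Illustration only: these patterns are assumptions, not the real
// jquery-csv lexer expressions.
var CRLF_FIRST = /\r\n|\r|\n/; // the two-character ending is tried first
var CRLF_LAST = /\r|\n|\r\n/;  // "\r" matches first, so a CRLF pair is split in two

// Hypothetical helper: split raw text into rows on the given line-ending pattern.
function splitRows(text, lineEnding) {
  return text.split(lineEnding);
}

var mixed = "a,b\r\nc,d\re,f\ng,h"; // CRLF, CR and LF in one input

console.log(JSON.stringify(splitRows(mixed, CRLF_FIRST)));
// ["a,b","c,d","e,f","g,h"]
console.log(JSON.stringify(splitRows(mixed, CRLF_LAST)));
// ["a,b","","c,d","e,f","g,h"]  (the empty string is a "phantom" row)
```

With the CRLF alternative last, the CR and the LF of a Windows line ending each count as a separate newline, which produces the empty row.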

Original comment by dan.bo...@gmail.com on 10 Sep 2013 at 11:46

Attachments:

GoogleCodeExporter commented 8 years ago
awesome!! the "bad-datalog.csv" file works with your latest fix. the crazy 
"openflash" case is exceptionally messed up. That said, if excel can open and 
resave it and have it work, there's gotta be something that can make it work 
with this library

if you have paypal please let me send you a few $

Original comment by jpsi...@gmail.com on 11 Sep 2013 at 12:08

GoogleCodeExporter commented 8 years ago
You're welcome.   I need this to work as much as you do, so don't worry about  
any  payment.  Just glad I could help.

Original comment by dan.bo...@gmail.com on 11 Sep 2013 at 12:38

GoogleCodeExporter commented 8 years ago
Any ideas how to make that other file work? hopefully a quick fix like your 
other fix?

Original comment by jpsi...@gmail.com on 11 Sep 2013 at 1:12

GoogleCodeExporter commented 8 years ago
Sorry I totally missed that file and then life got in the way.
I'll take a quick look in the morning

Original comment by dan.bo...@gmail.com on 11 Sep 2013 at 3:56

GoogleCodeExporter commented 8 years ago
OK, I need some information about the openflash.csv file. It appears to be 
Unicode or some other character set, so I need the character set information in 
order to set my file reader up correctly.
It appears to be ";" separated, but I want to confirm that.  
Lastly, do you know what the delimiter is? The data appears to have quotes in 
the middle of column values, which is what is causing the problem. I'm not sure 
what the delimiter should be.

Also, there is a bunch of font-related and, I believe, column formatting stuff 
at the beginning of that file. I am not sure whether it is causing a problem, 
but it could be. 

Note:  My version of Excel won't even open this file.

Original comment by dan.bo...@gmail.com on 11 Sep 2013 at 3:41

GoogleCodeExporter commented 8 years ago
i wrote to the guy who makes the device that generates these logs. i'll keep 
you posted

Original comment by jpsi...@gmail.com on 11 Sep 2013 at 11:51

GoogleCodeExporter commented 8 years ago
For folks looking for a code snippet that fixes the problem: Here's what I use 
to make all the new line characters consistent. One line scrubs the whole input 
before sending it to CSV parser.... 

// Normalize new lines: CRLF and lone CR become LF.
// "\r\n" comes first so a CRLF pair is replaced by a single "\n".
result = result.replace(/\r\n|\r/g, "\n"); 

// Parse the CSV to a 2D array
Selfservice.csvData = $.csv.toArrays(result);

Original comment by DJu...@gmail.com on 1 Nov 2013 at 10:25

GoogleCodeExporter commented 8 years ago
Issue 31 has been merged into this issue.

Original comment by evanpla...@gmail.com on 9 Dec 2013 at 11:21

GoogleCodeExporter commented 8 years ago
[deleted comment]
GoogleCodeExporter commented 8 years ago
openflash.csv is Microsoft Compound File Binary Format (type=CFBF ext=.cfb)
The header 20 D0 CF 11  E0 A1 B1 1A really gives it away.
it is not CSV, or even a text file!
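
A check like this could be run before parsing, to reject such files early. This is a rough sketch: `looksLikeCfbf` is a hypothetical helper, and the signature bytes D0 CF 11 E0 A1 B1 1A E1 are the documented CFBF magic number. Because the dump quoted above starts with an extra 20 byte, the sketch also accepts the signature at offset 1, which is an assumption about this particular file.

```javascript
// Documented Compound File Binary Format signature.
var CFBF_SIGNATURE = [0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1];

// Hypothetical helper: returns true if the signature starts at any offset
// from 0 to maxOffset (default 1, an assumption to tolerate one leading byte).
function looksLikeCfbf(bytes, maxOffset) {
  var limit = maxOffset === undefined ? 1 : maxOffset;
  for (var start = 0; start <= limit; start++) {
    var matched = true;
    for (var i = 0; i < CFBF_SIGNATURE.length; i++) {
      // Reading past the end yields undefined, which never equals a byte value.
      if (bytes[start + i] !== CFBF_SIGNATURE[i]) {
        matched = false;
        break;
      }
    }
    if (matched) {
      return true;
    }
  }
  return false;
}

var binaryHeader = new Uint8Array([0x20, 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1]);
var textHeader = new Uint8Array([0x61, 0x2C, 0x62, 0x0A]); // "a,b\n"

console.log(looksLikeCfbf(binaryHeader)); // true
console.log(looksLikeCfbf(textHeader));   // false
```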

Original comment by crazy_l...@netspace.net.au on 12 Apr 2014 at 1:06

GoogleCodeExporter commented 8 years ago
Suggestion from post 21 - DJu...
that was amazingly helpful. Thank you!

Original comment by Gitelman...@gmail.com on 29 Apr 2014 at 6:07

GoogleCodeExporter commented 8 years ago
The #21 solution had a problem with large files for me.  

The update in #14 appears to have fixed the issue for me.
Is this going to roll into a release?

Original comment by jerryga...@yahoo.com on 22 May 2014 at 9:02

GoogleCodeExporter commented 8 years ago
Fix from #14 worked for me, too. Makes me worried when the first csv file I try 
to parse using this library didn't work. How maintained is this library?

Original comment by barl...@gmail.com on 6 Jun 2014 at 4:47

GoogleCodeExporter commented 8 years ago
If you clone the source repository the fix should already be included. 

@mirlord has been maintaining a fork on GitHub in my absence. I recently moved 
the upstream repo over to GitHub too. As soon as I finish some work on the test 
runner I plan to push out another release.

Most/all of the remaining issues have been addressed. The only major feature 
missing is the ability to process very large data sets.

Original comment by evanpla...@gmail.com on 6 Jun 2014 at 4:56

GoogleCodeExporter commented 8 years ago
Fix #14 and #21 together worked a charm! Thanks guys

Original comment by djmatt...@gmail.com on 8 Jul 2014 at 10:54

GoogleCodeExporter commented 8 years ago
@evanpla...@gmail.com: Could you please share the URL of the upstream repo on 
GitHub?

@mirlord's GH doesn't work for me, says 'csv' is undefined. Thanks.

Original comment by jonatan....@gmail.com on 10 Jul 2014 at 5:34