Open davidroeca opened 6 years ago
I think the problem is related to how input files are handled. Probably there is some empty line added there.
I've parsed the same file on node which produces the correct behaviour:
You can try it with the following command:
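const fs = require('fs'); // already available in the Node REPL, needed in a script
const Papa = require('./papaparse.js'); // load the local file from the current directory
Papa.parse(fs.createReadStream('/tmp/mozilla_sergi0/broken.txt'), {delimiter: ',', header: true, complete: function (results) { console.log(results); }});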
Run this in a Node console started from the directory that contains the papaparse.js file.
The demo code is available on the gh-pages branch of this repository. Maybe that is what needs to be fixed.
@pokoli the same issue exists in my app, with version 4.3.6, so I'm pretty sure this isn't just an issue with the demo on the site.
EDIT: This works in node with 4.3.6 as well, just as the text on the demo site works flawlessly. I'm trying to find a good way to emulate a file-like object from the browser in Node.js, but I believe the problem boils down to the browser's file object implementation.
I've tested this on Firefox and Chrome, and it seems that the FileReader API's readAsText method always adds a line break to the end of a file that doesn't originally have one.
There are very few hints in the standard that mention these assumptions other than the notion of converting line endings to native, which may explain a similar algorithm to how each line is handled in the implementation (strangely, default blobs are transparent and not native).
I'm not sure if there's a graceful way to handle this case, but if there's some way to trim the last trailing newline, that could be one approach.
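As a rough illustration of the "trim the last trailing newline" idea, here is a minimal sketch. The helper name `stripFinalLineBreak` is hypothetical and not part of Papa Parse; it assumes the file content is already available as a string before it is handed to the parser.

```javascript
// Hypothetical helper, not part of Papa Parse: remove at most ONE trailing
// line break (CRLF, LF, or CR) so the parser does not see an empty final row.
function stripFinalLineBreak(text) {
  return text.replace(/(\r\n|\n|\r)$/, '');
}

// Only the final line break is removed; earlier blank lines are kept.
const sample = 'a,b\n1,2\n';
console.log(JSON.stringify(stripFinalLineBreak(sample)));
```

Unlike skipEmptyLines, this leaves blank lines in the middle of the file alone, which may or may not be what an application wants.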
Yes, I had the feeling there would be some weird browser behaviour involved. Thanks for confirming.
I don't like the idea of handling this behaviour in the library, as it will probably end up as untested code that may become obsolete when the browsers change their behaviour.
I think the easiest solution is to use skipEmptyLines. We should probably add a note to the docs (they are in the gh-pages branch of this repository) explaining the behaviour of the browsers and recommending the skipEmptyLines flag when reading files through the FileReader API.
What do you think?
I agree, especially since this behavior isn't even part of a standard so it may change. It probably makes sense to document, possibly something to highlight in the demo portion as well.
I added an additional config in a fork which only skips empty lines at the end of the file, though skipEmptyLines might be enough here.
https://github.com/mholt/PapaParse/pull/446 should be merged first in either case
I'm waiting for the new parameter before merging #446.
I don't think we should have a flag to skip the last row on base.
I'm working on the new parameter in #446 right now. Should be open for PR soon.
The skipEmptyLines option does not work on a Macintosh .csv. The carriage return will come through as:
11: ["↵"]
@shamess
Excel saves an extra line at the end of the file (this new line is in the source file itself, I haven't seen that new line behaviour replicated when using File.readText, though the API could have changed that behaviour already)
Tested by manually removing in notepad then opening in Excel and re-saving as .csv
Papa Parse appears to try to parse the newline as the first field of a new row, which resolves to a null row and produces an error referring to that "row" (the blank new line):
code: "TooFewFields"
message: "Too few fields: expected n fields but parsed 1"
row: m
type: "FieldMismatch"
Checked by using Notepad++ with Show Symbol > Show All Characters turned on
CR
LF
For Mac, does your file use the same line character standard, consistently? (so the config option would help here)
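To check whether a file mixes line-ending styles without opening it in an editor, something like the following sketch could help. The function name `countLineEndings` is made up for this example, and it assumes the raw file text is available as a string. If the config option meant above is Papa Parse's `newline` setting (an assumption on my part), the counts would show which value to pass.

```javascript
// Hypothetical helper: count each line-ending style found in a string.
function countLineEndings(text) {
  const count = (pattern) => (text.match(pattern) || []).length;
  const crlf = count(/\r\n/g);
  // Every CRLF pair contains one CR and one LF, so subtract the pairs
  // to get the lone CRs (classic Mac) and the lone LFs (Unix).
  const cr = count(/\r/g) - crlf;
  const lf = count(/\n/g) - crlf;
  return { crlf, cr, lf };
}

console.log(countLineEndings('a,b\r1,2\r'));
```

A file that reports more than one non-zero count is using line endings inconsistently.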
I just stumbled over this. I create RFC 4180 CSV files with the help of csv-writer. In my end-to-end tests, I am using papaparse to parse and evaluate the result. According to RFC 4180, each line (record) must be terminated by a line break (including the last one). My interpretation is that every CSV file must end with an empty line (which is pretty much best practice in most text-based formats, source files, etc. as well).
Thus, papaparse should not add an empty record at the end which the developer needs to suppress by manually enabling skipEmptyLines.
skipEmptyLines: true
Hello could you specify where exactly I can add this code?
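The flag goes in the config object passed as the second argument to Papa.parse, next to options such as header and complete. A minimal sketch follows; the wrapper name `parseSkippingEmptyLines` is hypothetical, and `Papa` is passed in as a parameter only so the snippet does not depend on how the library is loaded in your project.

```javascript
// Hypothetical wrapper: `Papa` is whatever object your app loaded Papa Parse
// into, `input` is the File (or string) to parse.
function parseSkippingEmptyLines(Papa, input, onComplete) {
  const config = {
    header: true,
    skipEmptyLines: true, // the flag sits alongside the other config options
    complete: onComplete,
  };
  Papa.parse(input, config);
  return config;
}
```

In a browser this might be called as `parseSkippingEmptyLines(Papa, fileInput.files[0], (results) => console.log(results))`, where `fileInput` is assumed to be a file input element.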
Thanks for such a great library! Just pointing out a minor issue I've found both in the application I'm working with and in the demo--when parsing a local file, an additional (blank) row is added.
Try the following file: broken.txt (file copy of the string example) at http://papaparse.com/demo to see what I mean.
header: true gives the explicit error where it says the final line is missing all columns after column 1, which is recorded as the empty string. Not a huge issue as I can add skipEmptyLines: true as a work-around, but the file itself doesn't have this final line.
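If blank lines in the middle of a file need to be preserved, another option is to post-process the result and drop only the final blank row together with its error. This is a sketch under assumptions: the helper name `dropTrailingBlankRow` is invented here, and the result is assumed to have `data` and `errors` arrays where each error carries a `row` index, as in the error reported in this thread.

```javascript
// Hypothetical post-processing step, not part of Papa Parse: remove the last
// row if it is blank, along with any error that points at that row.
function dropTrailingBlankRow(results) {
  const lastIndex = results.data.length - 1;
  if (lastIndex < 0) return results;

  const last = results.data[lastIndex];
  // Rows are objects with header: true and arrays otherwise; handle both,
  // and treat a null row as blank.
  const values = last == null ? [] : Array.isArray(last) ? last : Object.values(last);
  const isBlank = values.every((value) => value === '' || value == null);
  if (!isBlank) return results;

  return {
    ...results,
    data: results.data.slice(0, lastIndex),
    errors: (results.errors || []).filter((error) => error.row !== lastIndex),
  };
}
```

It would typically be called inside the complete callback, before the rows are used.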