mholt / PapaParse

Fast and powerful CSV (delimited text) parser that gracefully handles large files and malformed input
http://PapaParse.com
MIT License

First and Last CSV column issues #855

Open douglasrcjames opened 3 years ago

douglasrcjames commented 3 years ago

When parsing a CSV from a URL through a data stream, I am getting undefined from chunk['<field name>'] for fields that are the first or last column in the file.

For example, I had the 'Order ID' field as the first column, and it took me forever to find out that if I put another column, one that I am not gathering for this dataset, at the beginning of the file, chunk['Order ID'] returns the proper value.

I also had the problem where the header of the last column had a trailing carriage return \r in it, which I could only work around by referencing the column with that character included, like so: status: chunk['Status\r']. For now I have found it easier to tell the client to place an empty, ungathered column at the end of the file.

Ideally, I'd like my client to be able to upload any file with matching header names and have the system read in the data without any caveats. Does anyone have a better fix for this? Am I doing something wrong with Papa Parse, or are my CSV files not saved properly? Anything helps, thanks!

Code:

const orders: any = [];

// Parse through uploaded CSV
const options = {
    download: true,
    header: true,
    worker: true
};

const parseStream = await Papa.parse(Papa.NODE_STREAM_INPUT, options);
allPromises.push(parseStream);

const dataStream = await request.get(newValue.fileUrl).pipe(parseStream);
allPromises.push(dataStream);

allPromises.push(
    await parseStream.on("data", async (chunk: any) => {
        // Check these values are defined
        if(chunk['Order ID'] && chunk['Order ID'] !== undefined && chunk['Order ID'] !== null){ 
            orders.push({
                id: chunk['Order ID'],
                tracking: chunk['Tracking #'],
                carrier: chunk['Carrier'],
                method: chunk['Method'],
                date: chunk['Shipment Date'],
                status: chunk['Status'],
            })
        } else {
            console.error("Order ID column was probably not defined properly.")
        }
    })
);

jjspace commented 2 years ago

This is a rather old issue, but I just ran into the same problem. What was worse was that in my situation, when logging the parsed objects, Object.entries and Object.keys both showed the first column property as I expected it (for example 'id'). However, trying to access row.id or row["id"] returned undefined. I finally figured out, using Object.keys(row)[0].length, that the key's length was 3, not 2, which made me assume there was a hidden Unicode character. Deleting the file and re-creating it solved my issue, so maybe that'll help someone in the future.
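
A minimal sketch of that kind of check, for anyone who wants to see the hidden character rather than infer it from the length. It lists every key with its length and code points, so an invisible U+FEFF (byte order mark) or U+000D (\r) shows up. The helper name describeKeys and the sample row are hypothetical; the row is built by hand to imitate what a file saved with a BOM and CRLF line endings might produce, not taken from a real parse.

```typescript
// Hypothetical helper: describe each key of a parsed row with its length
// and Unicode code points, so invisible characters become visible.
function describeKeys(row: Record<string, unknown>): string[] {
  return Object.keys(row).map((key) => {
    const codePoints = Array.from(key).map(
      (ch) => "U+" + ch.codePointAt(0)!.toString(16).toUpperCase().padStart(4, "0")
    );
    return `${JSON.stringify(key)} length=${key.length} [${codePoints.join(" ")}]`;
  });
}

// Hand-built sample row (an assumption, not real Papa Parse output):
// the first key starts with a BOM, the last key ends with a carriage return.
const sampleRow = { "\uFEFFid": "1001", "Status\r": "Shipped" };

const described = describeKeys(sampleRow);
console.log(described.join("\n"));
```

If the first key reports a leading U+FEFF, or the last key a trailing U+000D, then row.id and row["Status"] cannot match, even though the printed key names look right.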

douglasrcjames commented 2 years ago

I will give this a shot when I get around to it. Thanks for the suggestion; I am still having the issue lol!

elsheraey commented 1 year ago

I've been facing a similar issue and fixed it by adding transformHeader: header => header.trim() to my options, following a hint from this link. I don't quite understand what it is trimming, though, since I get the correct key when I use Object.keys, which is very confusing.
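
A possible explanation, offered as a sketch rather than a confirmed diagnosis of the files in this thread: String.prototype.trim() removes line terminators such as \r, and it also treats U+FEFF (the byte order mark) as whitespace. Both are invisible when a key is printed, which would explain why Object.keys looks correct. The helper name normalizeHeader and the sample headers below are hypothetical; the explicit BOM replace is redundant with trim() and is kept only to make the intent obvious.

```typescript
// Hypothetical helper, intended to be passed as the transformHeader option.
function normalizeHeader(header: string): string {
  // Drop a leading BOM explicitly, then trim whitespace and line terminators.
  return header.replace(/^\uFEFF/, "").trim();
}

// Hand-built sample headers (an assumption): a BOM on the first column
// and a trailing carriage return on the last one.
const rawHeaders = ["\uFEFFOrder ID", "Tracking #", "Status\r"];
const cleanHeaders = rawHeaders.map((header) => normalizeHeader(header));

console.log(rawHeaders.map((header) => header.length));
console.log(cleanHeaders.map((header) => header.length));
```

With Papa Parse this would be wired in as { header: true, transformHeader: normalizeHeader }, after which chunk['Order ID'] and chunk['Status'] should resolve without padding columns, assuming a BOM or a stray \r really is the cause.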