mholt / PapaParse

Fast and powerful CSV (delimited text) parser that gracefully handles large files and malformed input
http://PapaParse.com
MIT License
12.47k stars 1.14k forks

transformHeader called twice #1029

Open kiocosta opened 11 months ago

kiocosta commented 11 months ago

I have this sample code

let counter = 0;
let csv = "First name,Last name\nJohn,Doe"
let json = Papa.parse(csv, {
    header: true,
    transformHeader: (header, a) => {
        counter++;
        console.log(a);
        return header;
    },
 });
console.log({ counter })

This is the result from running it: [image]

It's interesting that the second argument starts as a string and then becomes a number. I think this suggests that transformHeader is being called from a different line of code each time.

Logging "header" prints this: "First name" "Last name" "First name" "Last name"

It seems like transformHeader is executed twice for the first row. This is causing a bug in my application and I don't know why this behavior is happening.

I'd really appreciate some help with it. Thanks in advance!!

kiocosta commented 11 months ago

I guess the example I've posted is working as expected. In my application a bug was happening, but that was because I had duplicate headers (some headers ended up becoming just "" after my transformation). This was causing unexpected behaviors. After fixing this, the problem ceased to occur.

samy-mssi commented 8 months ago

Hi, I am sorry to open this issue again, but I am facing the same issue after upgrading my whole React project, and you are the only one that got the same behavior.

I have an array "headers", and I push each header on it.

      transformHeader: (header, index: number) => {
        console.log('headers', headers, index);
        const headerValue = () => {
          return header
            ? header
            : t('grid.header.auto-generated-label', {
                columIndex: index + 1,
              });
        };
        const name = fileWithHeader ? headerValue() : `${index + 1}`;
        headers.push(name);
        headers = utils.getDuplicatesHeader(headers);
        return name;
      },

And here is what I got in my console: [image]

I checked my csv and of course I have only 8 columns...

kiocosta commented 8 months ago

Hey @samy-mssi! At any point, are you returning the same value for the name variable in the transformHeader function for more than one column? In my case, I also didn't have duplicate columns in my CSV, but I had columns that were mapping to the same value in my transformHeader, and this ended up being the problem. Example: "custom_field_1,custom_field_2,custom_field_3" were all mapping to "". After I mapped each column name to a specific and unique value, the problem no longer happened.
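To illustrate what I mean by mapping each column to a unique value, here is a minimal sketch. The helper name uniqueHeaderNames and the "column_N" fallback are hypothetical, not part of PapaParse; the function only works on a plain array of header strings and does not call the library at all.

```javascript
// Hypothetical helper, not part of PapaParse: gives every column a distinct name.
function uniqueHeaderNames(rawHeaders) {
  const seen = new Map(); // name -> how many times it has been used so far
  return rawHeaders.map((raw, index) => {
    // Fall back to a positional name when the header is empty after trimming.
    const base = raw.trim() || `column_${index + 1}`;
    const count = seen.get(base) || 0;
    seen.set(base, count + 1);
    // The first occurrence keeps its name; later ones get a numeric suffix.
    return count === 0 ? base : `${base}_${count}`;
  });
}

const names = uniqueHeaderNames(["id", "", "", "id"]);
console.log(names.join(", ")); // id, column_2, column_3, id_1
```

One limitation of this sketch: a suffixed name such as "id_1" could still collide with a column that is literally named "id_1" in the file, so a real implementation would probably need an extra check.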

samy-mssi commented 8 months ago

Hey, thanks for the quick reply. I worked around it by adding a simple condition:

          if (!headers.includes(name)) {
            headers.push(name);
          }

It works in my case, but it seems weird that I only have this issue after upgrading my project. I still think there is something to fix in PapaParse.
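For reference, the same idea can be sketched with a Set, so that recording a header is harmless no matter how many times the callback runs. The factory name makeHeaderRecorder is hypothetical and not a PapaParse API, and the loop only simulates the callback being invoked twice over one header row; it does not call the library.

```javascript
// Hypothetical factory, not a PapaParse API: builds a transformHeader-style
// callback that records each resulting name at most once.
function makeHeaderRecorder() {
  const seen = new Set();
  const transformHeader = (header) => {
    const name = header.trim();
    seen.add(name); // Set.add ignores values that are already present
    return name;
  };
  return { transformHeader, getHeaders: () => [...seen] };
}

// Simulate the callback being invoked twice over the same header row.
const recorder = makeHeaderRecorder();
for (let pass = 0; pass < 2; pass++) {
  ["First name", " Last name "].forEach((h) => recorder.transformHeader(h));
}
console.log(recorder.getHeaders().length); // 2
```

Note that this would also hide columns that genuinely end up with the same name after the transformation, so it is only a workaround for the repeated calls, not a check for duplicate headers.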

kraighamady commented 8 months ago

I have the same situation with just a simple header.trim() being run on each value. I also tried pushing into an array and was getting duplicate values. I debugged by adding console.log(header) and got a complete listing of the headers followed by another complete listing. I tested everything to make sure I wasn't calling parse more than once, and it seems to be run twice inside PapaParse itself.

For anyone else seeing this issue, I was able to work around it by using a Set on the output array I was creating:

let headerList = [... new Set(headerOutputList)];

EDIT: To clarify, the above code was run after the parse on the array I was pushing into during the parse. Here is a more robust example:

let headerOutputList = [];

const headerTransform = ( header ) => {
    headerOutputList.push(header);
    return header.trim().toLowerCase();
}

and then after the parse:

let headerList = [... new Set(headerOutputList)];

bettysteger commented 8 months ago

Does it work with PapaParse version 5.3.2? If it does, then https://github.com/mholt/PapaParse/issues/1006 is connected.

kiocosta commented 8 months ago

My problem was happening in 5.4.1