lizmat / App-Rak

21st century grep / find / ack / ag / rg on steroids
Artistic License 2.0

multiple CSV files parsing error #34

Closed Zer0-Tolerance closed 1 month ago

Zer0-Tolerance commented 1 year ago

I've just discovered this bug: when you try to parse multiple CSV files, you get the non-unique field error:

> cat /tmp/a.csv
a,b,c,d
1,2,3,4
5,6,7,8
> cat /tmp/b.csv
a,b,c,d
5,6,7,8
1,2,3,4
> rak --csv-per-line '{.<a>}' /tmp/*.csv
INI - the header contains nun-unique fields: d(2), a(2), c(2), b(2) : error 1013 @ record 1, field 9, position 0
INI - the header contains nun-unique fields: d(2), a(2), c(2), b(2) : error 1013 @ record 1, field 9, position 0
A worker in a parallel iteration (hyper or race) initiated here:
  in sub show-results at /Users/.rakubrew/versions/moar-2022.07/share/perl6/site/sources/4A5C6464A66ED80C4A7267EB86D113AB750C4323 (App::Rak) line 955
  in sub rak-results at /Users/.rakubrew/versions/moar-2022.07/share/perl6/site/sources/4A5C6464A66ED80C4A7267EB86D113AB750C4323 (App::Rak) line 814
  in sub action-csv-per-line at /Users/.rakubrew/versions/moar-2022.07/share/perl6/site/sources/4A5C6464A66ED80C4A7267EB86D113AB750C4323 (App::Rak) line 2567
  in sub main at /Users/.rakubrew/versions/moar-2022.07/share/perl6/site/sources/4A5C6464A66ED80C4A7267EB86D113AB750C4323 (App::Rak) line 428
  in block <unit> at /Users/.rakubrew/versions/moar-2022.07/share/perl6/site/resources/D62D8FB6A12F7BE3663F861816A73ECF3CF19D30 line 3
  in sub MAIN at /Users/.rakubrew/versions/moar-2022.07/share/perl6/site/bin/rak line 3
  in block <unit> at /Users/.rakubrew/versions/moar-2022.07/share/perl6/site/bin/rak line 1

Died at:
    INI - the header contains nun-unique fields: d(2), a(2), c(2), b(2)
      in sub show-results at /Users/.rakubrew/versions/moar-2022.07/share/perl6/site/sources/4A5C6464A66ED80C4A7267EB86D113AB750C4323 (App::Rak) line 955
      in sub rak-results at /Users/.rakubrew/versions/moar-2022.07/share/perl6/site/sources/4A5C6464A66ED80C4A7267EB86D113AB750C4323 (App::Rak) line 814
      in sub action-csv-per-line at /Users/.rakubrew/versions/moar-2022.07/share/perl6/site/sources/4A5C6464A66ED80C4A7267EB86D113AB750C4323 (App::Rak) line 2567
      in sub main at /Users/.rakubrew/versions/moar-2022.07/share/perl6/site/sources/4A5C6464A66ED80C4A7267EB86D113AB750C4323 (App::Rak) line 428
      in block <unit> at /Users/.rakubrew/versions/moar-2022.07/share/perl6/site/resources/D62D8FB6A12F7BE3663F861816A73ECF3CF19D30 line 3
      in sub MAIN at /Users/.rakubrew/versions/moar-2022.07/share/perl6/site/bin/rak line 3
      in block <unit> at /Users/.rakubrew/versions/moar-2022.07/share/perl6/site/bin/rak line 1

Zer0-Tolerance commented 1 year ago

Any update or ideas? Parsing similar CSV files seems like a very common use case.

lizmat commented 1 year ago

Sorry, I don't think I'll have time for this before sometime mid next week. Just too many other things on my plate atm.

Zer0-Tolerance commented 10 months ago

re-up

lizmat commented 4 months ago

I cannot reproduce this anymore. Possibly it got fixed. Could you confirm?

Zer0-Tolerance commented 4 months ago

Nope, still the same. Weird that you can't reproduce it. Just create two CSV files. /tmp/a.csv:

a,b,c,d
1,2,3,4
5,6,7,8

/tmp/b.csv:

a,b,c,d
5,6,7,8
1,2,3,4

Then rak --csv-per-line '{.<a>}' /tmp/*.csv should give:

INI - the header contains nun-unique fields: c(2), a(2), b(2), d(2) : error 1013 @ record 1, field 9, position 0
INI - the header contains nun-unique fields: c(2), d(2), b(2), a(2) : error 1013 @ record 1, field 9, position 0
A worker in a parallel iteration (hyper or race) initiated here:
  in sub show-results at /Users/.rakubrew/versions/moar-2024.04/share/perl6/site/sources/4133A82CFF5D408502A9EC45E9908E2974D860C8 (App::Rak) line 994
  in sub rak-results at /Users//.rakubrew/versions/moar-2024.04/share/perl6/site/sources/4133A82CFF5D408502A9EC45E9908E2974D860C8 (App::Rak) line 853
  in sub action-csv-per-line at /Users/.rakubrew/versions/moar-2024.04/share/perl6/site/sources/4133A82CFF5D408502A9EC45E9908E2974D860C8 (App::Rak) line 2655
  in sub main at /Users/.rakubrew/versions/moar-2024.04/share/perl6/site/sources/4133A82CFF5D408502A9EC45E9908E2974D860C8 (App::Rak) line 453
  in block <unit> at /Users/.rakubrew/versions/moar-2024.04/share/perl6/site/resources/BBA0D53D17B2D51BD8DF43D604568E1DFE767438 line 3
  in sub MAIN at /Users/.rakubrew/versions/moar-2024.04/share/perl6/site/bin/rak line 3
  in block <unit> at /Users/.rakubrew/versions/moar-2024.04/share/perl6/site/bin/rak line 1

Died at:
    INI - the header contains nun-unique fields: c(2), a(2), b(2), d(2)
      in method fail at /Users/.rakubrew/versions/moar-2024.04/share/perl6/site/sources/529C70AFB4E70F79FF4EFC56EBB395721389B579 (Text::CSV) line 529
      in method header at /Users/.rakubrew/versions/moar-2024.04/share/perl6/site/sources/529C70AFB4E70F79FF4EFC56EBB395721389B579 (Text::CSV) line 648
      in method CSV at /Users/.rakubrew/versions/moar-2024.04/share/perl6/site/sources/529C70AFB4E70F79FF4EFC56EBB395721389B579 (Text::CSV) line 1802
      in method csv at /Users/.rakubrew/versions/moar-2024.04/share/perl6/site/sources/529C70AFB4E70F79FF4EFC56EBB395721389B579 (Text::CSV) line 2003
      in block  at /Users/.rakubrew/versions/moar-2024.04/share/perl6/site/sources/4133A82CFF5D408502A9EC45E9908E2974D860C8 (App::Rak) line 2651
      in block  at /Users/.rakubrew/versions/moar-2024.04/share/perl6/site/sources/06929F4087F6BB21C80DE6D1339ADD44DD0EC4CA (rak) line 1011
      in block  at /Users/.rakubrew/versions/moar-2024.04/share/perl6/site/sources/06929F4087F6BB21C80DE6D1339ADD44DD0EC4CA (rak) line 1294

lizmat commented 4 months ago

% cat a.csv 
a,b,c,d
1,2,3,4
5,6,7,8

% cat b.csv 
a,b,c,d
5,6,7,8
1,2,3,4

% rak --csv-per-line '{.<a>}' *.csv
a.csv
1:1
2:5

b.csv
1:5
2:1

So I really can't reproduce it!

What does zef info Text::CSV say?

Zer0-Tolerance commented 4 months ago

- Info for: Text::CSV
- Identity: Text::CSV:ver<0.022>:auth<zef:Tux>
- Recommended By: Zef::Repository::Ecosystems<fez>
- Installed: Yes
Description:     Handle CSV data. API based on Perl's Text::CSV_XS
License:     Artistic-2.0
Source-url:  git://github.com/Tux/CSV.git
Provides: 2 modules
Support:
#   irc:    irc://irc.perl.org/#csv
Depends: 4 items

Zer0-Tolerance commented 4 months ago

Just re-did the test with 0.0.26: it works on the first run, but if you repeat the same command you get the error. Super weird!

Zer0-Tolerance commented 4 months ago

OK, I get it now: the issue seems to come from a race condition. When you run it with --degree=1 it always works, but with multiple worker threads there is a high chance of getting this error, because each thread tries to declare the header in parallel, hence the non-unique header error.
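
To make the suspected failure mode concrete, here is a minimal Python sketch (not Raku, and not how Text::CSV is actually implemented). The class `SharedHeaderParser` and its methods are hypothetical, invented only for illustration. It assumes that header fields are collected in state shared between workers, and it replays one unlucky schedule sequentially so the outcome is deterministic:

```python
from collections import Counter


class SharedHeaderParser:
    """Hypothetical parser that keeps header state between calls."""

    def __init__(self):
        self.pending_fields = []  # state visible to every caller

    def feed_header(self, line):
        # Step 1: a worker stores the fields of its header row.
        self.pending_fields.extend(line.strip().split(","))

    def finish_header(self):
        # Step 2: the worker validates whatever is stored at that moment.
        counts = Counter(self.pending_fields)
        dupes = {name: n for name, n in counts.items() if n > 1}
        if dupes:
            raise ValueError(f"non-unique header fields: {dupes}")
        fields, self.pending_fields = self.pending_fields, []
        return fields


def interleaved_run():
    """Both workers feed their header before either one validates."""
    parser = SharedHeaderParser()
    parser.feed_header("a,b,c,d")  # worker handling a.csv
    parser.feed_header("a,b,c,d")  # worker handling b.csv
    return parser.finish_header()  # sees 8 fields, each name twice


def serial_run():
    """One file at a time, comparable to running with --degree=1."""
    parser = SharedHeaderParser()
    headers = []
    for line in ("a,b,c,d", "a,b,c,d"):
        parser.feed_header(line)
        headers.append(parser.finish_header())
    return headers
```

In this sketch `serial_run()` succeeds, while `interleaved_run()` raises with every field counted twice, which resembles the `a(2), b(2), c(2), d(2)` pattern in the error above. Whether the real cause looks like this is an assumption.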

Zer0-Tolerance commented 4 months ago

Any thoughts on this one?

lizmat commented 2 months ago

Could you check whether the 0.3.2 release fixes this? It gives each CSV file its own fresh Text::CSV object, which will hopefully fix the race condition.
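
The general idea of one parser object per file can be sketched as follows. This is Python with the standard `csv` module, used purely as an analogy; it is not the App::Rak code, and the helper names `column_a` and `demo` are made up for the example. Each worker builds its own reader, so no header state is shared between threads:

```python
import csv
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def column_a(path):
    # A fresh DictReader per file: header state is never shared across workers.
    with open(path, newline="") as handle:
        return [row["a"] for row in csv.DictReader(handle)]


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        a = Path(tmp, "a.csv")
        b = Path(tmp, "b.csv")
        # Same contents as the two files from the report.
        a.write_text("a,b,c,d\n1,2,3,4\n5,6,7,8\n")
        b.write_text("a,b,c,d\n5,6,7,8\n1,2,3,4\n")
        with ThreadPoolExecutor(max_workers=2) as pool:
            # pool.map() returns results in input order: a.csv first, then b.csv.
            return list(pool.map(column_a, [a, b]))
```

Because nothing is shared, the result does not depend on how the two workers are scheduled.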

lizmat commented 1 month ago

Closing this now. Please re-open if this didn't fix it.