jqnatividad / qsv

CSVs sliced, diced & analyzed.
https://qsv.dathere.com
The Unlicense
2.43k stars 70 forks source link

qsv excel only works with Little Endian? #609

Closed ondohotola closed 1 year ago

ondohotola commented 1 year ago

I have a number of spreadsheets written by the Perl module Spreadsheet::WriteExcel.

When I extract with qsv excel I get something like ^@ for every character and further processing such as search becomes impossible.

When I look at the spreadsheets with file I get something like

a.xls: CDFV2 Microsoft Excel

whereas the "normal" files (which extract properly) look like

b.xls: Composite Document File V2 Document, Little Endian...

Desktop (please complete the following information):

ondohotola commented 1 year ago

In the interim I can pipe through

tr -d '\000'

jqnatividad commented 1 year ago

@ondohotola thanks for the report.

Can you attach some sample Excel files generated by Spreadsheet::WriteExcel to help me reproduce the error?

ondohotola commented 1 year ago

Joel,

I thank you for reverting.

I have simplified my Perl script (a little), which generates testforqsv.xls which produces the error. Once I convert to ODS it does not, and when I convert that back to XLS it also does not.

for i in *xls *ods
do
    echo ""
    echo "$i:"
    qsv excel $i|qsv search -s 2 '2:1'
done

shows

testforqsv.xls:
3 9-column rows exported from "Sheet1"
Account Number,Account Name,Transaction Date,Transaction

ID,Amount,Reference,Description,Notes,Memo

testforqsvnew.xls:
3 9-column rows exported from "Sheet1"
Account Number,Account Name,Transaction Date,Transaction

ID,Amount,Reference,Description,Notes,Memo Test-2:0,Test-2:1,Test-2:2,Test-2:3,Test-2:4,Test-2:5,Test-2:6,Test-2:7,Test-2:8

testforqsv.ods:
3 9-column rows exported from "Sheet1"
Account Number,Account Name,Transaction Date,Transaction

ID,Amount,Reference,Description,Notes,Memo Test-2:0,Test-2:1,Test-2:2,Test-2:3,Test-2:4,Test-2:5,Test-2:6,Test-2:7,Test-2:8

qsv excel testforqsv.xls | tr -d '\000' | qsv search -s 2 '2:1'

shows

3 9-column rows exported from "Sheet1"
Account Number,Account Name,Transaction Date,Transaction

ID,Amount,Reference,Description,Notes,Memo Test-2:0,Test-2:1,Test-2:2,Test-2:3,Test-2:4,Test-2:5,Test-2:6,Test-2:7,Test-2:8

And finally

file *ods *xls

shows

testforqsv.ods:    OpenDocument Spreadsheet
testforqsv.xls:    CDFV2 Microsoft Excel
testforqsvnew.xls: Composite Document File V2 Document, Little Endian,
Os: Windows, Version 1.0, Code page: -535, Revision Number: 0

greetings, el

On 24/11/2022 14:43, Joel Natividad wrote:

@ondohotola https://github.com/ondohotola thanks for the report.

Can you attach some sample Excel files generated by |Spreadsheet::WriteExcel| to help me reproduce the error?[...]

-- Dr. Eberhard W. Lisse \ / Obstetrician & Gynaecologist @.** / | Telephone: +264 81 124 6733 (cell) PO Box 8421 Bachbrecht \ / If this email is signed with GPG/PGP 10007, Namibia ;____/ Sect 20 of Act No. 4 of 2019 may apply

jqnatividad commented 1 year ago

Hi @ondohotola , Thanks for the detailed follow-up... can you also provide some sample files for the scenario above?

I'm particularly interested in the file generated by Spreadsheet::WriteExcel.

ondohotola commented 1 year ago

I did attach them but don't know what github did to them.

EMail me off line then I'll send them again by reply.

Or let me know if I need to figure out how to upload into github via the web interface. I am not averse to RTFM :-)-O

el

On 28/11/2022 15:56, Joel Natividad wrote:

Hi @ondohotola https://github.com/ondohotola , Thanks for the detailed follow-up... can you also provide some sample files for the scenario above?

I'm particularly interested in the file generated by |Spreadsheet::WriteExcel|. [...]

jqnatividad commented 1 year ago

@ondohotola , You should just be able to drag them on top of the text field right here on GitHub issues - if you look at the bottom of the comment box it says - "Attach files by dragging & dropping, selecting or pasting them."

https://docs.github.com/en/get-started/writing-on-github/working-with-advanced-formatting/attaching-files

And yes, its OK to sometimes Read the Fine Manual... 😉

ondohotola commented 1 year ago

testforqsv.ods testforqsv.xls testforqsvnew.xls

#!/usr/bin/env perl
use Spreadsheet::WriteExcel;    # To write the XLS
$workbook  = Spreadsheet::WriteExcel->new("testforqsv.xls");
$worksheet = $workbook->add_worksheet();
#
# set it up
#
$bc = $workbook->add_format();
$bc->set_bold();
$bc->set_color('black');
$bc->set_align('center');

$left = $workbook->add_format();
$left->set_align('left');

$right = $workbook->add_format();
$right->set_align('left');

$dollar = $workbook->add_format();
$dollar->set_num_format('0.00');
$dollar->set_align('right');

$format[0] = $right;
$format[1] = $left;
$format[2] = $left;
$format[3] = $right;
$format[4] = $dollar;
$format[5] = $left;
$format[6] = $left;
$format[7] = $left;
$format[8] = $left;

$worksheet->set_column( 0, 0, 14 );
$worksheet->set_column( 1, 1, 25 );
$worksheet->set_column( 2, 3, 14 );
$worksheet->set_column( 4, 4, 9 );
$worksheet->set_column( 5, 5, 20 );
$worksheet->set_column( 6, 8, 30 );

$worksheet->write ( 0, 0, "Account Number",   $bc );
$worksheet->write ( 0, 1, "Account Name",     $bc );
$worksheet->write ( 0, 2, "Transaction Date", $bc );
$worksheet->write ( 0, 3, "Transaction ID",   $bc );
$worksheet->write ( 0, 4, "Amount",           $bc );
$worksheet->write ( 0, 5, "Reference",        $bc );
$worksheet->write ( 0, 6, "Description",      $bc );
$worksheet->write ( 0, 7, "Notes",            $bc );
$worksheet->write ( 0, 8, "Memo",             $bc );

for ( $i = 1; $i <= 3; $i++ ) {
    for ( $j = 0; $j <= 8; $j++ ) {
        $worksheet->write( $i, $j, "Test-$i:$j", $format[$j] );
    }
}
$workbook->close () or die "Error closing file: $!";

Could be shorter, maybe, but that does produce the uploaded file, which causes the issue.

Thanks, el

jqnatividad commented 1 year ago

Hi @ondohotola , Have you tried using Excel::Writer::XLSX instead which was written by the same author of Spreadsheet::WriteExcel?

The author himself recommends Excel::Writer::XLSX as a drop-in replacement for WriteExcel which is in maintenance only mode.

Also, qsv uses the calamine crate to read Excel files, and it has more robust support of the XLSX file format in general.

ondohotola commented 1 year ago

Joel,

That script is untouched for many years, as it reliably pulls the General Ledger from my book keeping (SQL-Ledger)'s PostreSQL backend into the Excel sheet my auditor wants :-)-O

I shall peruse and report back :-)-O

Thank you,

el

-- Sent from my iPhone On 3. Dec 2022 at 04:14 +0200, Joel Natividad @.***>, wrote:

Hi @ondohotola , Have you tried using Excel::Writer::XLSX instead which was written by the same author of Spreadsheet::WriteExcel? The author himself recommends Excel::Writer::XLSX as a drop-in replacement to WriteExcel which is in maintenance only mode. Also, qsv uses the calamine crate to read Excel files, and it has more robust support of the XLSX file format in general. — Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you were mentioned.Message ID: @.***>

ondohotola commented 1 year ago

Joel,

I looked the script was untouched since 2008 :-)-O

But I have made the replacements in the script and with XLSX it does not show the behavior (neither with ODT, by the way).

I have modified my pipeline accordingly to look for the file format (just for the fun of it, and added ODT to the test, even my auditor doesn't use LibreOffice :-)-O).

Since there is an easy workaround with the pipe (|tr -d '\000') perhaps one should just note it, not spend much effort on fixing it and close the issue accordingly.

Thank you so very much for looking at this and for the tool itself, QSV (in combination with CSVQ and/or CSVIEW) has improved my workflow significantly.

greetings, el

jqnatividad commented 1 year ago

Hi @ondohotola , I'm glad you find qsv useful!

As for your workaround, perhaps you can add an entry into the Cookbook? 😉

As for CSVIEW, have you tried csvlens? It's written in Rust and uses the same csv library qsv uses...

Best, Joel

ondohotola commented 1 year ago

Thank you.

I did not know, but it looks helpful.

I’ll see if I can write something up.

el

-- Sent from my iPhone On 6. Dec 2022 at 06:45 +0200, Joel Natividad @.***>, wrote:

Hi @ondohotola , I'm glad you find qsv useful! As for your workaround, perhaps you can add an entry into the Cookbook? 😉 As for CSVIEW, have you tried csvlens?m It's written in Rust and uses the same csv library qsv uses... Best, Joel — Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you were mentioned.Message ID: @.***>

ondohotola commented 1 year ago

Done

ondohotola commented 1 year ago

There was an issue #20 preventing piping to csvlens which has now been fixed.

[...]

As for CSVIEW, have you tried csvlens? It's written in Rust and uses the same csv library qsv uses... [...]