impress-org / givewp

GiveWP - The #1 Donation Plugin for WordPress. Easily accept donations and fundraise using your WordPress website.
https://givewp.com/
GNU General Public License v3.0
340 stars 191 forks source link

Chinese/Japanese/Korean characters broken when generating/exporting CSV #5761

Closed jgreys closed 1 year ago

jgreys commented 3 years ago

Details

When I export the data in CSV format ( Donations > Tools > Export ), if the data contains Chinese, Japanese, or Korean characters, they are broken.

Expected Behavior

When exporting data, the data must be readable.

Steps to Reproduce

  1. Go to Donations > Tools > Export
  2. Click on Generate CSV
  3. Open the csv file generated
  4. Characters broken
github-actions[bot] commented 1 year ago

This issue is stale because it has been open 45 days with no activity. Remove stale label or comment or this will be closed in 14 additional days.

kjohnson commented 1 year ago

Cross-posting from #5762:

TL;DR

We should add a filter between the headers and CSV output to add support for a Byte Order Mark (BOM) in exports.

public function export()
{
    $this->headers();

+    if( apply_filters( 'givewp_export_include_byte_order_mark', false ) ) {
+    echo "\xEF\xBB\xBF"; // Byte Order Mark 
+    }

    $this->csv_cols_out();
    $this->csv_rows_out();

    give_die();
}

Additional Details

[W]hen working on Ninja Forms I remember introducing a filter to configure use of the BOM in CSV export. It was initially hard coded, but a filter was added to remove it as it can cause issues in some places (while it resolves issues in other places).

Additionally, I'll note that the below does not specify the utf-8 character set in the Content-Type header.

See 1b8c76.patch

@@ -99,14 +99,20 @@ function ninja_forms_export_subs_to_csv( $sub_ids = '', $return = false ){
   header("Content-type: application/csv");
-  echo "\xEF\xBB\xBF"; // Byte Order Mark
+  echo apply_filters('ninja_forms_csv_bom',"\xEF\xBB\xBF") ; // Byte Order Mark
}

Citing the Unicode standard, I think we should maintain a default of not including a BOM, but perhaps allow one to be added for specific use cases. See Unicode v13, chapter 2, section 6 - emphasis added:

"Use of a BOM is neither required nor recommended for UTF-8, but may be encountered in contexts where UTF-8 data is converted from other encoding forms that use a BOM or where the BOM is used as a UTF-8 signature. "

@jgreys I would request, as a maintainer, that instead of hard-coding the BOM mark we add a hook for prepending an optional BOM to the CSV export.

JoshuaHungDinh commented 1 year ago

The related PR has been closed. Please reference: Cross-posting from https://github.com/impress-org/givewp/pull/5762: