rebekz / parsecsv-for-php

Automatically exported from code.google.com/p/parsecsv-for-php
MIT License

Deal with UTF-8 Unicode (with BOM) text #23

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?
1. Parse the attached file, which is UTF-8 with BOM:

$ file utf8-bom.csv 
utf8-bom.csv: UTF-8 Unicode (with BOM) text, with CRLF line terminators

<?php
require_once 'parsecsv.lib.php'; // path to the parseCSV library
$csv = new parseCSV();
$csv->parse('utf8-bom.csv');

2. The first heading, 'Employee_ID', will have the BOM bytes prepended to it, so it
cannot be accessed by its visible name:

<?php
print_r($csv->data[0]); // shows [Employee_ID] => E001689 (the BOM is invisible)
echo $csv->data[0]['Employee_ID'];  // prints nothing: the real key is "\xEF\xBB\xBFEmployee_ID"

3. If the BOM is stripped from the file first, the problem does not occur.

http://www.dotvoid.com/2010/04/detecting-utf-bom-byte-order-mark/
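To see why step 2 fails, the sketch below reproduces the symptom without parseCSV: it builds a hypothetical two-line CSV string that starts with a UTF-8 BOM and splits it naively (no quoting support, for illustration only). The first header key carries three extra bytes, EF BB BF, so the lookup by the visible name misses.

```php
<?php
// Hypothetical sample data: UTF-8 BOM + header line + one data line (CRLF).
$raw = "\xEF\xBB\xBFEmployee_ID,Name\r\nE001689,Alice\r\n";

// Naive split on CRLF and commas; enough to show where the BOM ends up.
$lines  = explode("\r\n", rtrim($raw, "\r\n"));
$header = explode(',', $lines[0]);
$row    = array_combine($header, explode(',', $lines[1]));

var_dump(strlen($header[0]));                 // int(14): 11 visible chars + 3 BOM bytes
var_dump(bin2hex(substr($header[0], 0, 3)));  // string(6) "efbbbf"
var_dump(isset($row['Employee_ID']));         // bool(false)
echo $row["\xEF\xBB\xBFEmployee_ID"], "\n";   // E001689: only the BOM-prefixed key works
```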

Original issue reported on code.google.com by ryancour...@gmail.com on 17 Jan 2012 at 4:43


GoogleCodeExporter commented 8 years ago
I've worked around the issue by adding the following code to
load_data():

// strip off BOM (only if the data starts with it)
if (strncmp($data, "\xef\xbb\xbf", 3) === 0) {
  $data = substr($data, 3);
}
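A check such as strpos($data, "\xef\xbb\xbf") !== FALSE is true whenever that byte sequence appears anywhere in the data, not only at the start, and the first three bytes would then be removed even though they are not a BOM. A minimal sketch of the anchored check as a standalone function; the name strip_utf8_bom() is hypothetical, not part of parseCSV:

```php
<?php
/**
 * Hypothetical helper: remove a leading UTF-8 byte order mark, if present.
 * Only the first three bytes are compared, so EF BB BF appearing later in
 * the data is treated as ordinary content and left alone.
 */
function strip_utf8_bom(string $data): string
{
    if (strncmp($data, "\xEF\xBB\xBF", 3) === 0) {
        return substr($data, 3);
    }
    return $data;
}

$withBom   = "\xEF\xBB\xBFa,b";
$midString = "a,\xEF\xBB\xBFb";

var_dump(strpos($midString, "\xEF\xBB\xBF"));        // int(2): not false, yet no BOM
var_dump(strip_utf8_bom($withBom) === 'a,b');        // bool(true)
var_dump(strip_utf8_bom($midString) === $midString); // bool(true)
```

Inside load_data() this would amount to $data = strip_utf8_bom($data);.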

Original comment by ryancour...@gmail.com on 17 Jan 2012 at 6:07

GoogleCodeExporter commented 8 years ago
And if you want to deal with UTF-16 as well, change the above to:

// strip off BOM (UTF-8)
if (strncmp($data, "\xef\xbb\xbf", 3) === 0) {
  $data = substr($data, 3);
}
// strip off BOM (UTF-16 LE)
else if (strncmp($data, "\xff\xfe", 2) === 0) {
  $data = substr($data, 2);
}
// strip off BOM (UTF-16 BE)
else if (strncmp($data, "\xfe\xff", 2) === 0) {
  $data = substr($data, 2);
}
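Stripping the two-byte BOM alone does not make UTF-16 data readable by a byte-oriented CSV parser: every ASCII character is still paired with a 00 byte. The sketch below converts UTF-16 input to UTF-8 after removing its BOM. It assumes the mbstring extension is enabled, and the function name to_utf8_without_bom() is hypothetical. UTF-32 is not handled.

```php
<?php
/**
 * Hypothetical helper: detect a BOM and return the data as BOM-less UTF-8.
 * Assumes the mbstring extension is available. UTF-32 BOMs are not handled
 * (the UTF-32 LE BOM FF FE 00 00 would be misread as UTF-16 LE here).
 */
function to_utf8_without_bom(string $data): string
{
    if (strncmp($data, "\xEF\xBB\xBF", 3) === 0) {          // UTF-8 BOM
        return substr($data, 3);
    }
    if (strncmp($data, "\xFF\xFE", 2) === 0) {              // UTF-16 LE BOM
        return mb_convert_encoding(substr($data, 2), 'UTF-8', 'UTF-16LE');
    }
    if (strncmp($data, "\xFE\xFF", 2) === 0) {              // UTF-16 BE BOM
        return mb_convert_encoding(substr($data, 2), 'UTF-8', 'UTF-16BE');
    }
    return $data; // no BOM: assume the data is already UTF-8
}

// "a,b" encoded as UTF-16 LE and BE, each with its BOM.
$le = "\xFF\xFE" . "a\x00,\x00b\x00";
$be = "\xFE\xFF" . "\x00a\x00,\x00b";
var_dump(to_utf8_without_bom($le)); // string(3) "a,b"
var_dump(to_utf8_without_bom($be)); // string(3) "a,b"
```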

Original comment by ryancour...@gmail.com on 20 Sep 2012 at 1:15

GoogleCodeExporter commented 8 years ago
Had the same problem. I modified the _rfile() method to remove the BOM, which
solved the issue for me:

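    /**
     * Read local file
     * @param   string  $file  local filename
     * @return  string|false   data from file, or false on failure
     */
    function _rfile ($file = null) {
        if ( is_readable($file) ) {
            if ( !($fh = fopen($file, 'rb')) ) return false;
            $size = filesize($file);
            // fread() rejects a zero length, so handle empty files separately
            $data = ($size > 0) ? fread($fh, $size) : '';
            fclose($fh);
            if ( $data === false ) return false;

            // remove the UTF-8 BOM (EF BB BF), only at the start of the data
            $bom = pack('H*', 'EFBBBF');
            $data = preg_replace("/^$bom/", '', $data);

            return $data;
        }
        return false;
    }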
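The same idea can be sketched outside the class with file_get_contents(), which reads the whole file in one call and returns an empty string for an empty file, whereas fread() with a length of 0 fails. The function name read_file_without_bom() is hypothetical and not part of parseCSV; it only handles the UTF-8 BOM.

```php
<?php
/**
 * Hypothetical standalone variant of the _rfile() change: read a local file
 * and drop a leading UTF-8 BOM. Returns false if the file cannot be read.
 */
function read_file_without_bom(string $file)
{
    if (!is_readable($file)) {
        return false;
    }
    $data = file_get_contents($file); // '' for an empty file, false on error
    if ($data === false) {
        return false;
    }
    if (strncmp($data, "\xEF\xBB\xBF", 3) === 0) {
        $data = substr($data, 3);     // BOM only counts at offset 0
    }
    return $data;
}

// Usage: write a small BOM-prefixed file to a temp path and read it back.
$path = tempnam(sys_get_temp_dir(), 'csv');
file_put_contents($path, "\xEF\xBB\xBFEmployee_ID,Name");
var_dump(read_file_without_bom($path)); // string(16) "Employee_ID,Name"
unlink($path);
```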

Original comment by gsi...@gmail.com on 10 Apr 2014 at 3:58