parsecsv / parsecsv-for-php

CSV data parser for PHP.
MIT License

Keep linebreak as \n or <br /> #196

Closed. antoine1003 closed this issue 2 months ago

antoine1003 commented 4 years ago

Hi,

I use this tool to read CSV files easily. Some of my files have cells that contain line breaks:

id;name;description
12;Josh;This name
is pretty cool
13;Molly; Owner of the shop.
Arrived in 2020

But the line breaks are removed. It would be nice to have a feature that lets the user replace each line break (except the final one) with a given text.

Thanks, Antoine

gogowitsch commented 3 years ago

For line breaks in CSV files you need to enclose the cell content. Most typically, the " character is used.

I don't think that there is code in this library that removes line breaks. Do you have a source code example? Maybe it is just the visual display in the browser or the lack of enclosing characters.

stevleibelt commented 2 years ago

@antoine1003 is the answer provided by @fonata good enough to fix your issue?

This ticket is pretty old already and it would be nice to close it with a remark that the answer from @fonata was a good one.

antoine1003 commented 2 years ago

I'll try to find it in my old project 😄

marcoris commented 2 years ago

Man, I am searching for a tool that keeps the line breaks in my content, but it does not work the way your library description shows.

PHP code:

$csv = new Csv();
$csv->auto("bla.csv");
$csv->enclose_all = true;

image

Do you see the issue? After [/freitext] there is a line break.

CSV content:

Typ;Nummer;Obergruppe;Titel;Beschreibung;Einheit;Einkaufspreis(1);Verkaufspreis(2);Spezialpreis(3);Standardpreis(1,2,3);Als Freiposition
G;1.11.000;;1/2 gewundene Treppe;;;;;;;
;;;;;;;;;;
A;1.11.001;1.11.000;1/4 gewundene Treppe mit Geländer;"[freitext]Wange in ""Fichte 1-Schichtplatte"", oder mit verwachsenen Äste/Ästen 5 cm [/freitext]
[freitext]Trittstufen in Eiche 4 cm schlichte Ausführung[/freitext]

jimeh commented 2 years ago

Input fields containing any form of line break must be enclosed in double quotes. Otherwise it's impossible to reliably tell whether a line break denotes a new record or is just a line break inside the field currently being parsed.

A bit more detail is available in clause 7 of my csv-spec.org project:

7. Fields containing line breaks (CRLF, LF, or CR), double quotes, or the delimiter character (normally a comma) MUST be enclosed in double-quotes.
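As a rough illustration of that clause when writing CSV, here is a minimal sketch of a quoting helper. The function name quoteFieldIfNeeded is hypothetical and not part of parsecsv-for-php. Doubling embedded double quotes is an assumption based on the common CSV convention (the same one the ""Fichte 1-Schichtplatte"" sample earlier in this thread uses); clause 7 itself only says when to enclose a field.

```php
<?php
// Hypothetical helper (not part of parsecsv-for-php): enclose a field in
// double quotes only when it contains a line break, a double quote, or the
// delimiter character.
function quoteFieldIfNeeded(string $field, string $delimiter = ','): string
{
    // strpbrk() returns false when none of the listed characters occur.
    $needsQuotes = strpbrk($field, "\r\n\"" . $delimiter) !== false;
    if (!$needsQuotes) {
        return $field;
    }
    // Assumption: embedded double quotes are escaped by doubling them.
    return '"' . str_replace('"', '""', $field) . '"';
}

// Usage: build one record with a semicolon delimiter.
$fields = ['12', 'Josh', "This name\nis pretty cool"];
$quoted = array_map(
    fn(string $field): string => quoteFieldIfNeeded($field, ';'),
    $fields
);
echo implode(';', $quoted), PHP_EOL;
```

Only the third field gets enclosed here, because it is the only one that contains a line break.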

To use the original example in this issue:

id;name;description
12;Josh;This name
is pretty cool
13;Molly; Owner of the shop.
Arrived in 2020

It must be changed to this to be parsable as CSV-structured data:

id;name;description
12;Josh;"This name
is pretty cool"
13;Molly;" Owner of the shop.
Arrived in 2020"

Which should then parse to the following structure if represented as JSON:

[
  [
    "id",
    "name",
    "description"
  ],
  [
    "12",
    "Josh",
    "This name\nis pretty cool"
  ],
  [
    "13",
    "Molly",
    " Owner of the shop.\nArrived in 2020"
  ]
]
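To check this independently of the library, the quoted input can be run through PHP's built-in fgetcsv(). This is only a sketch: the helper name parseCsvString is hypothetical, and it deliberately does not use parsecsv-for-php. It assumes PHP 7.4 or later, where an empty escape string is allowed.

```php
<?php
// Hypothetical helper built on PHP's fgetcsv(), not on parsecsv-for-php.
function parseCsvString(string $csv, string $delimiter = ';'): array
{
    // Write the string to an in-memory stream so fgetcsv() can read it.
    $stream = fopen('php://memory', 'r+');
    fwrite($stream, $csv);
    rewind($stream);

    $rows = [];
    // The empty escape string turns off PHP's backslash escaping (PHP 7.4+).
    while (($row = fgetcsv($stream, 0, $delimiter, '"', '')) !== false) {
        $rows[] = $row;
    }
    fclose($stream);

    return $rows;
}

$csv = "id;name;description\n"
     . "12;Josh;\"This name\nis pretty cool\"\n"
     . "13;Molly;\" Owner of the shop.\nArrived in 2020\"\n";

// Print the parsed records as JSON for comparison with the structure above.
echo json_encode(parseCsvString($csv), JSON_PRETTY_PRINT), PHP_EOL;
```

Because the description fields are enclosed in double quotes, the line breaks inside them stay part of the field instead of starting a new record.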

And if using the first record as a headers record, the above structure can be normalized to the following:

[
  {
    "id": "12",
    "name": "Josh",
    "description": "This name\nis pretty cool"
  },
  {
    "id": "13",
    "name": "Molly",
    "description": " Owner of the shop.\nArrived in 2020"
  }
]
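The header normalization, and the replacement text asked for at the top of this issue, can both be done on the parsed records after the fact. A minimal sketch in plain PHP follows. The helper names withHeaders and replaceLineBreaks are hypothetical and not part of parsecsv-for-php, and the sketch assumes every record has the same number of fields as the header record.

```php
<?php
// Hypothetical helper: use the first record as headers for the other records.
// Assumes each record has as many fields as the header record.
function withHeaders(array $rows): array
{
    $headers = array_shift($rows);

    return array_map(
        fn(array $row): array => array_combine($headers, $row),
        $rows
    );
}

// Hypothetical helper: replace CRLF, CR, and LF inside every field with the
// given text, keeping the field names untouched.
function replaceLineBreaks(array $records, string $replacement): array
{
    return array_map(
        fn(array $record): array => array_map(
            fn(string $value): string => str_replace(["\r\n", "\r", "\n"], $replacement, $value),
            $record
        ),
        $records
    );
}

$rows = [
    ['id', 'name', 'description'],
    ['12', 'Josh', "This name\nis pretty cool"],
    ['13', 'Molly', " Owner of the shop.\nArrived in 2020"],
];

$records = withHeaders($rows);
$html = replaceLineBreaks($records, '<br />');

echo $html[0]['description'], PHP_EOL; // This name<br />is pretty cool
```

Keeping the raw "\n" in the parsed data and converting it only at display time means the same records can be shown as HTML or written back to CSV unchanged.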