humlab-sead / quality_control

Descriptions and documentation of data and system quality control for SEAD
1 stars 0 forks source link

Line feeds in site_description cause export problems #4

Open visead opened 4 years ago

visead commented 4 years ago

Problem A number of sites have Line Feed (LF) characters within their description texts. These cause problems if a use exports data as csv and then imports into Excel. The line feeds cause subsequent rows to be overwritten in Excel, and sites are lost from the imported list.

Cause Possibly data copied and pasted from other sources into SEAD, or entered via our macrofossil Excel template

Solution Global search and replace of LF with ?

roger-mahler commented 1 year ago

Problem affects 438 sites:

select *
from tbl_sites
where site_description like '%'||chr(13)||'%'
      or site_description like '%'||chr(10)||'%'

This can be corrected with this update:

update tbl_sites
    set site_description = regexp_replace(site_description, '[\r\n]', ' ', 'g')
where site_description != regexp_replace(site_description, '[\r\n]', ' ', 'g')

But some site descriptions is rather long, and intentional formatting and indentation in the description would be lost. These description would probably be rendered worse when displayed in a GUI.

Alternative solutions would be:

  1. Replace CR & LF using replacement above (regexp_replace(site_description, '[\r\n]', ' ', 'g')) when exporting data to CSV.
  2. Use quoting of text when exporting data. (preferred solution). This is the best and safest way, since it also handles occurrences of delimiters in text. You can then specify quoting character (usually ") when importing to Excel.

Enable quoting in pgAdmin:

image