nextflow-io / nextflow

A DSL for data-driven computational pipelines
http://nextflow.io
Apache License 2.0

splitCsv includes the byte order mark as part of the first column of the first line. #3596

Open sclamons opened 1 year ago

sclamons commented 1 year ago

Bug report

Expected behavior and actual behavior

Expected: When Nextflow reads a CSV file that starts with a byte order mark (BOM) using splitCsv, the BOM should be removed. Excel writes a UTF-8 BOM (the bytes EF BB BF) at the start of files saved in its "CSV UTF-8" format.

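Actual: Nextflow keeps the BOM as an invisible character at the start of the first row. With header: true, the first column name becomes U+FEFF followed by "param_1" instead of plain "param_1", so lookups by that name fail.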
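Because the BOM does not show up when printed, a quick way to confirm it is to print each header name as Unicode code points; a BOM appears as a leading U+FEFF. This is a minimal sketch that assumes the my_csv.csv file from the reproduction steps; codePoints is a hypothetical helper defined in the script, not part of Nextflow.

```groovy
nextflow.enable.dsl = 2

// Hypothetical helper (not part of Nextflow): render each code point of a
// string as U+XXXX so that invisible characters such as a BOM become visible
def codePoints(String s) {
    s.codePoints().toArray().collect { cp -> String.format('U+%04X', cp) }.join(' ')
}

workflow {
    Channel.fromPath('my_csv.csv')
        .splitCsv(header: true, strip: true)
        // Print the code points of every header name in each row
        .view { row -> row.keySet().collect { key -> codePoints(key) } }
}
```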

Steps to reproduce the problem

Save the following CSV text using Excel, in CSV UTF-8 format, as "my_csv.csv":

param_1,param_2
val_1,val_2

Nextflow file:

nextflow.enable.dsl = 2

workflow {
  Channel.fromPath('my_csv.csv') | splitCsv(header:true, strip:true) | view {"row contains parameter 'param_1'?: ${it.containsKey('param_1')}; row contains parameter 'param_2'?: ${it.containsKey('param_2')}"}
}

Expected output: "row contains parameter 'param_1'?: true; row contains parameter 'param_2'?: true"

Actual output: "row contains parameter 'param_1'?: false; row contains parameter 'param_2'?: true"

Program output

N E X T F L O W  ~  version 21.10.6
Launching bug_test_main.nf [serene_sax] - revision: 281274f014
row contains parameter 'param_1'?: false; row contains parameter 'param_2'?: true
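Until splitCsv removes the BOM itself, one possible workaround is to read the file as UTF-8 text, drop a leading U+FEFF character, and pass the resulting string to splitCsv, which also accepts text items. Java's UTF-8 decoder keeps a leading BOM as the character U+FEFF rather than discarding it, which is presumably how it ends up in the first header name. This sketch assumes the same my_csv.csv file; stripBom is a hypothetical helper, not a Nextflow function.

```groovy
nextflow.enable.dsl = 2

// Hypothetical helper (not part of Nextflow): remove a single leading
// U+FEFF character, which is how a UTF-8 BOM appears after decoding
def stripBom(String text) {
    text.startsWith('\uFEFF') ? text.substring(1) : text
}

workflow {
    Channel.fromPath('my_csv.csv')
        // Read the whole file as UTF-8 text and drop the BOM before parsing
        .map { csv -> stripBom(csv.getText('UTF-8')) }
        .splitCsv(header: true, strip: true)
        .view { row -> "row contains parameter 'param_1'?: ${row.containsKey('param_1')}; row contains parameter 'param_2'?: ${row.containsKey('param_2')}" }
}
```

Because the file is loaded into memory in one piece, this approach suits small sample sheets like the one in this report better than very large CSV files.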

Environment

Additional context

nextflow.log

stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.