Closed: jfy133 closed this issue 1 month ago
I had the same issue in taxprofiler. I would like to define that these columns are unique: `fastq_1`, `fastq_2`, `fasta`, and the combination of `sample` and `run_accession`.
- schema_input.json: `schema_input.json`
- samplesheet.csv with duplicates: `samplesheet.csv`
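For context, a sketch of what the uniqueness constraints in the attached `schema_input.json` presumably look like, using the `uniqueEntries` keyword inside a top-level `allOf` (this is a reconstruction for illustration, not the attached file, and the property definitions are trimmed):

```json
{
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "type": "array",
    "items": {
        "type": "object",
        "properties": {
            "sample": { "type": "string" },
            "run_accession": { "type": "string" },
            "fastq_1": { "type": "string" },
            "fastq_2": { "type": "string" },
            "fasta": { "type": "string" }
        }
    },
    "allOf": [
        { "uniqueEntries": ["fastq_1"] },
        { "uniqueEntries": ["fastq_2"] },
        { "uniqueEntries": ["fasta"] },
        { "uniqueEntries": ["sample", "run_accession"] }
    ]
}
```

The "schemas at indexes [0, 1, 2]" in the error below would then refer to the first three `allOf` entries.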
Error message:
```
ERROR ~ Validation of pipeline parameters failed!
 -- Check '.nextflow.log' file for details
The following invalid input values have been detected:
* --input (https://raw.githubusercontent.com/nf-core/test-datasets/taxprofiler/samplesheet.csv): Validation of file failed:
  -> Entry 7: Detected non-unique combination of the following fields: [fastq_1]
  -> Entry 4: Detected non-unique combination of the following fields: [fastq_2]
  -> Entry 3: Detected non-unique combination of the following fields: [fasta]
  -> Value does not match against the schemas at indexes [0, 1, 2]
```
We noticed that the functionality provided by the removed `items:` property does not actually get replaced by the `uniqueItems` and `uniqueEntries` fields:

- `items:` allowed a single 'column' of a TSV to be validated so that every value in it was unique, independent of any other column.
- `uniqueItems:` forces all columns to be unique, not particular ones.
- `uniqueEntries:` requires specific combinations of columns to be unique together (not a single column).

We can almost get around this by placing an `allOf` field at the top level of the schema and listing each column that should be independently validated with `uniqueEntries`, as in the example schema below. However, if any of those columns violate uniqueness, the error message reports all of the columns as non-unique in independent errors (and still says 'combinations').
Example schema:
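A minimal sketch of the `allOf` + `uniqueEntries` pattern described above, assuming just the two columns from the broken csv below (`id` and `fasta_aa`); the property definitions are illustrative, not the original attachment:

```json
{
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "type": "array",
    "items": {
        "type": "object",
        "properties": {
            "id": { "type": "string" },
            "fasta_aa": { "type": "string" }
        }
    },
    "allOf": [
        { "uniqueEntries": ["id"] },
        { "uniqueEntries": ["fasta_aa"] }
    ]
}
```

Each `uniqueEntries` entry in the `allOf` list checks one column on its own, which is the behaviour `items:` used to provide.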
Example broken csv:
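A hypothetical csv matching the description below (the values are made up purely to illustrate the duplicate pattern):

```csv
id,fasta_aa
sample1,proteins_1.fasta
sample1,proteins_2.fasta
sample2,proteins_2.fasta
```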
Where there is one duplicate in `id` and one duplicate in `fasta_aa`.