pachterlab / seqspec

machine-readable file format for genomic library sequence and structure
MIT License

Draft4Validator.iter_errors is expecting a dictionary #6

Closed (detrout closed this issue 1 year ago)

detrout commented 1 year ago

Running seqspec check on a spec file reports a single error that dumps the entire spec:

python3 -m seqspec.main check ./assays/BD-Rhapsody-EB/spec.yaml                        
[error 1] {'name': 'BD-Rhapsody-EB', 'doi': 'https://scomix.bd.com/hc/en-us/articles/6990647359501-Rhapsody-WTA-De
mo-Datasets-with-Enhanced-Cell-Capture-Beads', 'publication_date': '31 August 2022', 'description': 'BD Rhapsody W
TA is a nanowell-based commercial system that uses a split-pool (Enahnced Beads-v2) approach to generate oligos on
 magnetic beads.', 'modalities': ['RNA'], 'lib_struct': 'https://teichlab.github.io/scg_lib_structs/methods_html/B
D_Rhapsody.html', 'assay_spec': [{'region_id': 'RNA', 'region_type': 'RNA', 'name': 'RNA', 'sequence_type': 'joine
d', 'onlist': None, 'sequence': 'AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTXNNNNNNNNNGTGANNNNNNNNN
GACANNNNNNNNNNNNNNNNNXXAGATCGGAAGAGCACACGTCTGAACTCCAGTCACNNNNNNNNATCTCGTATGCCGTCTTCTGCTTG', 'min_len': 169, 'max_l
en': 366, 'regions': [{'region_id': 'illumina_p7', 'region_type': 'illumina_p7', 'name': 'illumina_p7', 'sequence_
type': 'fixed', 'onlist': None, 'sequence': 'AATGATACGGCGACCACCGAGATCTACAC', 'min_len': 29, 'max_len': 29, 'region
s': None}, {'region_id': 'truseq_r1', 'region_type': 'truseq_r1', 'name': 'truseq_r1', 'sequence_type': 'fixed', '
onlist': None, 'sequence': 'TCTTTCCCTACACGACGCTCTTCCGATCT', 'min_len': 29, 'max_len': 29, 'regions': None}, {'regi
on_id': 'vb', 'region_type': 'vb', 'name': 'vb', 'sequence_type': 'onlist', 'onlist': {'filename': 'vb_onlist.txt'
, 'md5': None}, 'sequence': 'X', 'min_len': 0, 'max_len': 3, 'regions': None}, {'region_id': 'cls1', 'region_type'
: 'cls1', 'name': 'cls1', 'sequence_type': 'onlist', 'onlist': {'filename': 'cls1_onlist.txt', 'md5': None}, 'sequ
ence': 'NNNNNNNNN', 'min_len': 9, 'max_len': 9, 'regions': None}, {'region_id': 'linker1', 'region_type': 'linker1
', 'name': 'linker1', 'sequence_type': 'fixed', 'onlist': None, 'sequence': 'GTGA', 'min_len': 4, 'max_len': 4, 'r
egions': None}, {'region_id': 'cls2', 'region_type': 'cls2', 'name': 'cls2', 'sequence_type': 'onlist', 'onlist': 
{'filename': 'cls2_onlist.txt', 'md5': None}, 'sequence': 'NNNNNNNNN', 'min_len': 9, 'max_len': 9, 'regions': None
}, {'region_id': 'linker2', 'region_type': 'linker2', 'name': 'linker2', 'sequence_type': 'fixed', 'onlist': None,
 'sequence': 'GACA', 'min_len': 4, 'max_len': 4, 'regions': None}, {'region_id': 'cls3', 'region_type': 'cls3', 'n
ame': 'cls3', 'sequence_type': 'onlist', 'onlist': {'filename': 'cls3_onlist.txt', 'md5': None}, 'sequence': 'NNNN
NNNNN', 'min_len': 9, 'max_len': 9, 'regions': None}, {'region_id': 'umi', 'region_type': 'umi', 'name': 'umi', 's
equence_type': 'random', 'onlist': None, 'sequence': 'NNNNNNNN', 'min_len': 8, 'max_len': 8, 'regions': None}, {'r
egion_id': 'polyT', 'region_type': 'polyT', 'name': 'polyT', 'sequence_type': 'random', 'onlist': None, 'sequence'
: 'X', 'min_len': 1, 'max_len': 98, 'regions': None}, {'region_id': 'cdna', 'region_type': 'cdna', 'name': 'cdna',
 'sequence_type': 'random', 'onlist': None, 'sequence': 'X', 'min_len': 1, 'max_len': 98, 'regions': None}, {'regi
on_id': 'truseq_r2', 'region_type': 'truseq_r2', 'name': 'truseq_r2', 'sequence_type': 'fixed', 'onlist': None, 's
equence': 'AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC', 'min_len': 34, 'max_len': 34, 'regions': None}, {'region_id': 'sam
ple_index', 'region_type': 'sample_index', 'name': 'sample_index', 'sequence_type': 'onlist', 'onlist': {'filename
': 'sample_index_onlist.txt', 'md5': None}, 'sequence': 'NNNNNNNN', 'min_len': 8, 'max_len': 8, 'regions': None}, 
{'region_id': 'illumina_p7', 'region_type': 'illumina_p7', 'name': 'illumina_p7', 'sequence_type': 'fixed', 'onlis
t': None, 'sequence': 'ATCTCGTATGCCGTCTTCTGCTTG', 'min_len': 24, 'max_len': 24, 'regions': None}]}]} is not of typ
e 'object' in spec[]

After applying this patch, the error messages look quite a bit more plausible:


--- a/seqspec/seqspec_check.py
+++ b/seqspec/seqspec_check.py
@@ -39,9 +39,8 @@ def validate_check_args(parser, args):

 def run_check(schema, spec):
-
     v = Draft4Validator(schema)
-    for idx, error in enumerate(v.iter_errors(spec), 1):
+    for idx, error in enumerate(v.iter_errors(spec.to_dict()), 1):
         print(
             f"[error {idx}] {error.message} in spec[{']['.join(repr(index) for index in error.path)}]"
         )
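
As far as I can tell, the patch helps because jsonschema's default type checker only treats a Python dict as a JSON "object". A custom class instance therefore fails the top-level type check, and nothing deeper gets validated. Here is a minimal sketch that reproduces this. The Assay class and the schema are hypothetical stand-ins, not seqspec's real ones, and I am assuming the real object's repr prints its dictionary form, which would explain the original message.

```python
from jsonschema import Draft4Validator


class Assay:
    """Hypothetical stand-in for the object seqspec builds from spec.yaml."""

    def __init__(self, name, modalities):
        self.name = name
        self.modalities = modalities

    def to_dict(self):
        return {"name": self.name, "modalities": self.modalities}

    def __repr__(self):
        # Assumption: the repr looks like a dict, as in the original error.
        return repr(self.to_dict())


# Toy schema for illustration only; the real seqspec schema is much larger.
schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "modalities": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["name", "modalities"],
}


def collect_errors(schema, instance):
    """Format errors the same way run_check prints them."""
    v = Draft4Validator(schema)
    return [
        f"{e.message} in spec[{']['.join(repr(i) for i in e.path)}]"
        for e in v.iter_errors(instance)
    ]


spec = Assay("BD-Rhapsody-EB", ["RNA"])

# Passing the object itself: it is not a dict, so only the top-level
# "type" check fails and the whole object is dumped in the message.
raw_errors = collect_errors(schema, spec)

# Passing a plain dict: validation descends into the properties.
dict_errors = collect_errors(schema, spec.to_dict())

for line in raw_errors:
    print(line)
print(len(dict_errors), "errors after to_dict()")
```

In this toy case the dict validates cleanly; with the real schema, the dict version is what exposes the per-field errors.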

With the patch applied, seqspec check reports many more errors, listed at the end of this comment.

Some of the attributes could perhaps be optional, though.

As a guess, order might be a good candidate: it could be made optional, have validation code added, or the elements in the list could be reordered to match it. (I bet the Stanford DACC might be able to help with the jsonschema.)

[error 1] 'order' is a required property in spec['assay_spec'][0]['regions'][0]
[error 2] 'order' is a required property in spec['assay_spec'][0]['regions'][1]
[error 3] None is not of type 'string' in spec['assay_spec'][0]['regions'][2]['onlist']['md5']
[error 4] 'order' is a required property in spec['assay_spec'][0]['regions'][2]
[error 5] None is not of type 'string' in spec['assay_spec'][0]['regions'][3]['onlist']['md5']
[error 6] 'order' is a required property in spec['assay_spec'][0]['regions'][3]
[error 7] 'order' is a required property in spec['assay_spec'][0]['regions'][4]
[error 8] None is not of type 'string' in spec['assay_spec'][0]['regions'][5]['onlist']['md5']
[error 9] 'order' is a required property in spec['assay_spec'][0]['regions'][5]
[error 10] 'order' is a required property in spec['assay_spec'][0]['regions'][6]
[error 11] None is not of type 'string' in spec['assay_spec'][0]['regions'][7]['onlist']['md5']
[error 12] 'order' is a required property in spec['assay_spec'][0]['regions'][7]
[error 13] 'order' is a required property in spec['assay_spec'][0]['regions'][8]
[error 14] 'order' is a required property in spec['assay_spec'][0]['regions'][9]
[error 15] 'order' is a required property in spec['assay_spec'][0]['regions'][10]
[error 16] 'order' is a required property in spec['assay_spec'][0]['regions'][11]
[error 17] None is not of type 'string' in spec['assay_spec'][0]['regions'][12]['onlist']['md5']
[error 18] 'order' is a required property in spec['assay_spec'][0]['regions'][12]
[error 19] 'order' is a required property in spec['assay_spec'][0]['regions'][13]
[error 20] 'order' is a required property in spec['assay_spec'][0]
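
Both kinds of error above could be silenced on the schema side. This is a minimal sketch of the "optional" route, assuming a toy region schema (not the real seqspec schema): order is dropped from "required", and md5 is allowed to be null. The region_schema helper is hypothetical and exists only to contrast the two variants.

```python
from jsonschema import Draft4Validator


def region_schema(relaxed):
    """Toy region schema; relaxed=True makes order optional and md5 nullable."""
    md5_type = ["string", "null"] if relaxed else "string"
    required = ["region_id"] if relaxed else ["region_id", "order"]
    return {
        "type": "object",
        "properties": {
            "region_id": {"type": "string"},
            # The type of 'order' is left unconstrained here on purpose.
            "order": {},
            "onlist": {
                "type": ["object", "null"],
                "properties": {
                    "filename": {"type": "string"},
                    "md5": md5_type if isinstance(md5_type, dict) else {"type": md5_type},
                },
                "required": ["filename"],
            },
        },
        "required": required,
    }


def messages(schema, instance):
    """Return the sorted error messages for one instance."""
    return sorted(e.message for e in Draft4Validator(schema).iter_errors(instance))


# One region taken from the spec above: no 'order', and md5 is None.
region = {
    "region_id": "vb",
    "onlist": {"filename": "vb_onlist.txt", "md5": None},
}

strict_msgs = messages(region_schema(relaxed=False), region)
relaxed_msgs = messages(region_schema(relaxed=True), region)

print(strict_msgs)
print(relaxed_msgs)
```

The strict variant reports the same two messages seen in the list above for this region, while the relaxed variant accepts it. Whether relaxing the schema is preferable to filling in order and md5 in the spec files is a design decision for the maintainers.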

sbooeshaghi commented 1 year ago

This has been fixed with https://github.com/IGVF/seqspec/pull/4!