pachterlab / seqspec

machine-readable file format for genomic library sequence and structure
MIT License

seqspec check too strict for corner case #51

Closed hitz closed 2 weeks ago

hitz commented 1 month ago

@sidwekhande has an example seqspec (e.g. https://api.data.igvf.org/configuration-files/IGVFFI0714JZHN/@@download/IGVFFI0714JZHN.yaml.gz, but auth required) where the fastqs have been demuxed and do not have a truseq_read1 and truseq_read2. However, seqspec starts labeling the fastq sequence starting from the truseq region.

So these are submitted as truseq regions with the length set to 0 and the sequence set to null. This triggers:

[error 8] None is not of type 'string' in spec['library_spec'][0]['regions'][0]['sequence']

[error 9] None is not of type 'string' in spec['library_spec'][0]['regions'][2]['sequence']

Proposed fix: seqspec check should account for len=0 and ignore this error.

sbooeshaghi commented 1 month ago

Can you post the seqspec file here so that I can test it? I thought I had fixed this previously..

hitz commented 1 month ago

There are other errors as well: IGVFFI0714JZHN-upgrade3.0.yaml.txt

sbooeshaghi commented 1 month ago

The sequence parameter can be set to '' and the error will go away. This check is already embedded in seqspec.schema.json:

https://github.com/pachterlab/seqspec/blob/289e7e3228a6c35289e82d1d760ca6c84610f16d/seqspec/schema/seqspec.schema.json#L391-L411

hitz commented 2 weeks ago

So you mean change sequence: null to sequence: '', right? We will test this.

hitz commented 2 weeks ago

worked
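
For anyone who needs to apply the same workaround to many files, the change from sequence: null to sequence: '' can be scripted. The following is only a minimal sketch, assuming the spec has already been loaded into a plain Python dict (for example with a YAML loader) and that regions may be nested under further regions keys. The function name blank_null_sequences and the toy spec are hypothetical and not part of seqspec; only the library_spec, regions, and sequence keys come from the error messages above.

```python
from typing import Any


def blank_null_sequences(node: Any) -> int:
    """Replace every `sequence: None` with '' in place; return how many were changed.

    Hypothetical helper, not part of seqspec. It walks dicts and lists
    recursively so nested regions are covered as well.
    """
    changed = 0
    if isinstance(node, dict):
        if "sequence" in node and node["sequence"] is None:
            node["sequence"] = ""
            changed += 1
        for value in node.values():
            changed += blank_null_sequences(value)
    elif isinstance(node, list):
        for item in node:
            changed += blank_null_sequences(item)
    return changed


# Toy spec mirroring the paths in the error messages; all values are illustrative.
spec = {
    "library_spec": [
        {
            "regions": [
                {"region_id": "truseq_read1", "sequence": None},
                {"region_id": "cdna", "sequence": "XXXX"},
                {"region_id": "truseq_read2", "sequence": None},
            ]
        }
    ]
}

n_fixed = blank_null_sequences(spec)
```

After running this, the two truseq entries carry an empty string instead of null, which is the form that passed seqspec check in this thread. The dict would still need to be written back to YAML before re-running the check.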