pachterlab / seqspec

machine-readable file format for genomic library sequence and structure
MIT License

missing support for custom read primer definition #24

Closed bnovak32 closed 1 year ago

bnovak32 commented 1 year ago

There is currently no support for custom read primers. Specifically, at least two assays that I am aware of (BioRad SureCell 3' WTA and ATACseq) use a custom read1 primer rather than the standard Illumina TruSeq or Nextera primers. The only currently supported region types that appear to indicate primers are truseq_read1/truseq_read2 and nextera_read1/nextera_read2. To incorporate seqspec into an automated pipeline that includes adapter trimming (for example), it would need to support designating custom primer types as well (such as the more generic "read1_primer" and "read2_primer" designations currently used in the SureCell seqspec).

sbooeshaghi commented 1 year ago

Do you have a suggestion for what to name such a region? What about "custom_primer"?

bnovak32 commented 1 year ago

That would work for the BioRad SureCell data, since there's only one custom read primer (for read1). However, Illumina instruments allow custom primers for all reads (including index reads), so it's probably a good idea to include custom versions of all primers for future-proofing (at minimum, "custom_read1_primer" and "custom_read2_primer").

sbooeshaghi commented 1 year ago

It may make sense to have a custom_primer region_type and then use the region_id to specify whether the primer is specific to "read1" or "read2", as those are often sequencer- and library-structure-specific. Would that work for your use case?
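To make the proposal concrete, here is a rough Python sketch of how a trimming pipeline might pick out primers under this scheme. It is only an illustration, not seqspec's actual API: the region records are simplified dicts rather than parsed seqspec YAML, the `sequence` key and the helper names are assumptions, the primer sequences are made-up placeholders, and the `custom_read1_primer`/`custom_read2_primer` ids just follow the naming suggested earlier in this thread.

```python
# Hypothetical, simplified region records. Real seqspec files are YAML and
# carry more fields; the "sequence" key and the sequences are placeholders.
regions = [
    {"region_id": "custom_read1_primer", "region_type": "custom_primer",
     "sequence": "ACGTACGTACGT"},
    {"region_id": "cell_barcode", "region_type": "barcode",
     "sequence": "NNNNNNNN"},
    {"region_id": "custom_read2_primer", "region_type": "custom_primer",
     "sequence": "TTGCAATTGCAA"},
]

# Region types treated as read primers: the four named in this issue plus
# the proposed generic custom_primer type.
PRIMER_TYPES = {
    "truseq_read1", "truseq_read2",
    "nextera_read1", "nextera_read2",
    "custom_primer",
}


def primer_sequences(regions):
    """Map region_id -> sequence for every region typed as a read primer."""
    return {
        r["region_id"]: r["sequence"]
        for r in regions
        if r["region_type"] in PRIMER_TYPES
    }


def custom_primers_for(regions, read_name):
    """Return sequences of custom_primer regions whose region_id names the read.

    Matching on a substring of region_id is an assumed convention, not
    something seqspec enforces.
    """
    return [
        r["sequence"]
        for r in regions
        if r["region_type"] == "custom_primer" and read_name in r["region_id"]
    ]


primers = primer_sequences(regions)
read1_primers = custom_primers_for(regions, "read1")
```

With a single `custom_primer` type, the read a primer belongs to is only recoverable from the `region_id`, so a pipeline relying on this would need ids that follow a predictable naming convention.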

sbooeshaghi commented 1 year ago

Ping

bnovak32 commented 1 year ago

Yes, I think that would work fine.

sbooeshaghi commented 1 year ago

OK, I've added it in the most recent commit.

https://github.com/IGVF/seqspec/commit/5f17ffcfb11aef35d43166f16488164f15f13451