pachterlab / seqspec

machine-readable file format for genomic library sequence and structure
MIT License

index string should take into account strand information #40

Closed: sidwekhande closed this issue 2 months ago

sidwekhande commented 6 months ago

This affects chromap's read_format, but it could affect formats for other tools as well. The chromap index string needs to take strand information into account and append ":-" to the end of a component's string when that read's strand is neg.

For example, for this seqspec, the correct index string for aligning the FASTQs with chromap is bc:8:23:-,r1:0:49,r2:0:49. The current implementation returns bc:8:23,r1:0:49,r2:0:49, which is missing the ":-" suffix at the end of the bc component.
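The rule can be illustrated with a minimal Python sketch. It is not seqspec's implementation. The `Component` type and the `format_component` and `chromap_read_format` helpers are hypothetical. Coordinates are passed through unchanged and are assumed to follow chromap's 0-based, inclusive convention. The strand values for r1 and r2 are set to "pos" only so the output reproduces the string quoted above; they are not read from the actual seqspec file.

```python
from typing import NamedTuple, Sequence


class Component(NamedTuple):
    """One entry of a chromap read-format string (hypothetical helper type)."""
    name: str    # e.g. "bc", "r1", "r2"
    start: int   # start position within the read (assumed 0-based)
    end: int     # end position within the read (assumed inclusive)
    strand: str  # "pos" or "neg", mirroring seqspec's strand values


def format_component(c: Component) -> str:
    """Render name:start:end, appending ':-' when the strand is neg."""
    s = f"{c.name}:{c.start}:{c.end}"
    if c.strand == "neg":
        s += ":-"
    elif c.strand != "pos":
        raise ValueError(f"unknown strand {c.strand!r}")
    return s


def chromap_read_format(components: Sequence[Component]) -> str:
    """Join the per-read components into one comma-separated string."""
    return ",".join(format_component(c) for c in components)


if __name__ == "__main__":
    comps = [
        Component("bc", 8, 23, "neg"),  # barcode read on the negative strand
        Component("r1", 0, 49, "pos"),  # "pos" assumed for illustration
        Component("r2", 0, 49, "pos"),  # "pos" assumed for illustration
    ]
    print(chromap_read_format(comps))  # bc:8:23:-,r1:0:49,r2:0:49
```

Dropping the `if c.strand == "neg"` branch yields bc:8:23,r1:0:49,r2:0:49, which is the incorrect output described in this issue.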

sbooeshaghi commented 5 months ago

Can you share the seqspec command you are using to generate the chromap string?

sidwekhande commented 5 months ago

Yes, the command is: seqspec index -m atac -r ATAC_1_S358_L007_R1_001.fastq.gz,ATAC_1_S358_L007_R3_001.fastq.gz,ATAC_1_S358_L007_R2_001.fastq.gz -t chromap Team_1_CharacterizationMcGinnis_Dataset1_10X_Lane_1_seqspec.yaml

sbooeshaghi commented 2 months ago

This has been fixed in https://github.com/pachterlab/seqspec/commit/e1ae2eb7e3fe18a3e1b73c4f97dc803d68e76a69. Please install the devel version, test it, and reopen the issue if it is not fixed.