repseqio / library-imgt

IMGT segment library converted to RepSeq.IO JSON format

L-Part? #6

Closed bbimber closed 2 years ago

bbimber commented 6 years ago

Hello,

If I understand these rules correctly, the V genes do not return L-PART, since the 7.14 query used by at least human and macaque relies on the padded FASTA, which doesn't include L1/L2. This type of URL will return the coordinates for L-REGION:

http://www.imgt.org/genedb/GENElect?query=8.1+TRAV&species=Macaca+mulatta&IMGTlabel=L-PART1

Is there a way to pull from multiple URL sources to generate a library?
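
To make the idea of several URL sources concrete, here is a minimal Python sketch that builds one GENElect URL per label. Only the base URL, the `query`, `species` and `IMGTlabel` parameters, and the `L-PART1` value come from the URL above; the function name `genelect_url` and the `L-PART2` label are assumptions for illustration, and nothing here is part of repseqio.

```python
from urllib.parse import urlencode

# Base URL taken from the example above.
GENELECT_BASE = "http://www.imgt.org/genedb/GENElect"


def genelect_url(query, species, label=None):
    """Build a GENElect URL; `label` is optional (hypothetical helper)."""
    params = {"query": query, "species": species}
    if label is not None:
        params["IMGTlabel"] = label
    # urlencode encodes spaces as '+', matching the URL style used above.
    return GENELECT_BASE + "?" + urlencode(params)


# One URL per leader label; "L-PART2" is assumed by analogy with "L-PART1".
urls = [
    genelect_url("8.1 TRAV", "Macaca mulatta", label)
    for label in ("L-PART1", "L-PART2")
]
for url in urls:
    print(url)
```

The responses from such URLs would still have to be fetched and merged with the padded FASTA records, which this sketch does not attempt.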

dbolotin commented 6 years ago

I love IMGT 🤦‍♂️... I'll look into it.

bbimber commented 6 years ago

Thanks. Two other questions/thoughts: I wrote a quick parser to convert the IMGT flatfiles to repseqio libraries. The flatfiles are a richer source than the padded FASTAs. When doing this, I based the sequence on the RefSeq source rather than a local FASTA. This is a little more robust than the local FASTA, since you also have the flanking genomic data. I don't know how many species this applies to, but assuming IMGT is moderately consistent, I think the parser could be relatively general purpose. I did need to convert a little from IMGT's anchor point definitions to MiXCR's; however, that wasn't too difficult.
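
The conversion step might look roughly like the sketch below. It is not the parser mentioned above: it assumes EMBL-style `FT` feature lines with plain `start..end` ranges, converts the 1-based inclusive coordinates to 0-based half-open ones, and derives a few anchor positions. The sample record is made up, and the anchor names (`L1Begin`, `L1End`, `L2Begin`, `FR1Begin`) and their mapping to IMGT labels are assumptions that should be checked against the RepSeq.IO library format.

```python
import re

# Matches simple feature lines such as "FT   L-PART1    101..146".
# Qualifier lines ("FT        /allele=...") and complement()/join()
# locations are deliberately not handled in this sketch.
FEATURE_LINE = re.compile(r"^FT\s{3}(\S+)\s+<?(\d+)\.\.>?(\d+)\s*$")


def parse_features(flatfile_text):
    """Return {label: (start, end)} as 0-based, half-open coordinates."""
    features = {}
    for line in flatfile_text.splitlines():
        m = FEATURE_LINE.match(line)
        if m:
            label, start, end = m.group(1), int(m.group(2)), int(m.group(3))
            # 1-based inclusive -> 0-based half-open.
            features[label] = (start - 1, end)
    return features


# Hypothetical mapping: anchor name -> (IMGT label, 0 for start / 1 for end).
ANCHORS = {
    "L1Begin": ("L-PART1", 0),
    "L1End": ("L-PART1", 1),
    "L2Begin": ("L-PART2", 0),
    "FR1Begin": ("V-REGION", 0),
}


def to_anchor_points(features):
    """Pick anchor positions for every label present in `features`."""
    return {
        anchor: features[label][side]
        for anchor, (label, side) in ANCHORS.items()
        if label in features
    }


# Made-up record fragment, for illustration only.
SAMPLE = """\
FT   L-PART1              101..146
FT   V-INTRON             147..250
FT   L-PART2              251..261
FT   V-REGION             262..550
FT                        /allele="hypothetical"
"""

anchor_points = to_anchor_points(parse_features(SAMPLE))
print(anchor_points)
```

Because the positions refer to the full genomic record rather than a padded FASTA, the flanking sequence stays available for anchors that fall outside the V-REGION.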