pmelsted / pizzly

Fast fusion detection using kallisto
BSD 2-Clause "Simplified" License

pizzly cache format #21

roryk closed this issue 6 years ago

roryk commented 6 years ago

What is the meaning of the columns in the pizzly cache?

For genes:

GENE    ENSG00000277475.1   AC213203.1  KI270713.1  PROTEIN -   31697   32528   ENST00000612315.1
GENE gene-name name1? name2? other/pseudo/protein strand start? stop? transcript-list
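A minimal sketch of how the GENE line above could be read, following the guessed column layout. The field names are hypothetical and not taken from pizzly. Treating the fourth field as a contig name is an assumption (`KI270713.1` looks like a GRCh38 scaffold name), and the separator used when a gene has several transcripts is also assumed, since the example lists only one.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class GeneRecord:
    gene_id: str
    name: str
    contig: str          # assumption: fourth field is a chromosome/contig name
    biotype: str         # OTHER / PSEUDO / PROTEIN in the examples
    strand: str
    start: int
    stop: int
    transcripts: List[str]


def parse_gene_line(line: str) -> GeneRecord:
    """Parse one whitespace-separated GENE line from a pizzly cache file."""
    fields = line.split()
    if len(fields) != 9 or fields[0] != "GENE":
        raise ValueError(f"not a 9-field GENE line: {line!r}")
    return GeneRecord(
        gene_id=fields[1],
        name=fields[2],
        contig=fields[3],
        biotype=fields[4],
        strand=fields[5],
        start=int(fields[6]),
        stop=int(fields[7]),
        # assumption: multiple transcript IDs are separated by ',' or ';'
        transcripts=[t for t in re.split(r"[;,]", fields[8]) if t],
    )
```

This assumes no field is empty or contains spaces; a GTF with missing gene names would need a stricter tab-based split.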

For transcripts:

TRANSCRIPT  ENST00000529488.5   ENSG00000254951.7   11  OTHER   -   7754392 7882982 7882859,7882982;7881955,7882073;7881088,7881182;7879594,7879713;7879006,7879083;7754392,7754521
TRANSCRIPT transcript, gene it is in, ?, type(OTHER/PSEUDO/PROTEIN), strand, start, stop, exon locations (0 based)
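The TRANSCRIPT line can be sketched the same way. The function and key names here are hypothetical, not pizzly's own. The fourth field is kept under the name `contig` as an assumption (the value `11` would be consistent with a chromosome name), and the exon list is read as `start,stop` pairs separated by `;`, as in the example line.

```python
from typing import Dict, List, Tuple


def parse_transcript_line(line: str) -> Dict[str, object]:
    """Parse one whitespace-separated TRANSCRIPT line from a pizzly cache file."""
    fields = line.split()
    if len(fields) != 9 or fields[0] != "TRANSCRIPT":
        raise ValueError(f"not a 9-field TRANSCRIPT line: {line!r}")

    # Exons look like "start,stop;start,stop;..." in the example line.
    exons: List[Tuple[int, int]] = []
    for pair in fields[8].split(";"):
        if not pair:
            continue
        start, stop = pair.split(",")
        exons.append((int(start), int(stop)))

    return {
        "transcript_id": fields[1],
        "gene_id": fields[2],
        "contig": fields[3],   # assumption: chromosome/contig name
        "biotype": fields[4],  # OTHER / PSEUDO / PROTEIN
        "strand": fields[5],
        "start": int(fields[6]),
        "stop": int(fields[7]),
        "exons": exons,
    }
```

In the example, the exons of this minus-strand transcript are listed from the highest coordinate down, so the first pair ends at the transcript stop and the last pair begins at the transcript start.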

What is the 4th column (the 11)?

I'm trying to make the index creation a little less strict to support other GTF types.

roryk commented 6 years ago

Answered my own question by looking at the code.