gpertea / gffcompare

classify, merge, tracking and annotation of GFF files by comparing to a reference annotation GFF
MIT License

Number of samples reported exceeds the maximum #64

Closed. Akazhiel closed this issue 2 years ago

Akazhiel commented 3 years ago

I've come across what may be a bug. When running gffcompare to merge multiple GTF files against a reference file, the number of samples reported sometimes greatly exceeds the number of samples I'm working with, which is 60. When I look at the tracking file, however, the sample count is within the expected range, as it should be. There may be a problem in how gffcompare counts the q occurrences and reports them.

gpertea commented 3 years ago

Can you clarify where exactly (in which gffcompare output?) you see that "number of samples reported" that greatly exceeds the number of samples you have?

Akazhiel commented 3 years ago

It’s the gffcmp.combined.gtf. It has a field called “num_samples”, which is reporting more samples than I am working with.
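A minimal sketch of how the affected records could be listed, assuming the `num_samples` attribute sits on `transcript` feature lines and uses the usual GTF `key "value";` attribute syntax. The function name `oversampled_transcripts` and the sample lines in the usage note are made up for illustration; they are not part of gffcompare.

```python
import re

# Matches GTF attributes of the form: key "value";
ATTR_RE = re.compile(r'(\S+) "([^"]*)";')


def oversampled_transcripts(gtf_lines, max_samples):
    """Return (transcript_id, num_samples) pairs where num_samples > max_samples.

    gtf_lines is any iterable of GTF text lines, e.g. an open file handle.
    Assumption: num_samples is stored on 'transcript' feature lines.
    """
    hits = []
    for line in gtf_lines:
        # Skip blank lines and comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # A GTF record has 9 tab-separated columns; column 3 is the feature type.
        if len(fields) < 9 or fields[2] != "transcript":
            continue
        attrs = dict(ATTR_RE.findall(fields[8]))
        if "num_samples" not in attrs:
            continue
        n = int(attrs["num_samples"])
        if n > max_samples:
            hits.append((attrs.get("transcript_id", "?"), n))
    return hits
```

With 60 input samples, it could be called as `oversampled_transcripts(open("gffcmp.combined.gtf"), 60)`; any pair it returns is a transcript whose reported count exceeds the number of input files.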

gpertea commented 2 years ago

This only happens when duplicate/redundant (matching) transfrags exist in a sample GTF.

Unfortunately even StringTie seems to produce such structurally redundant transfrags sometimes -- those can happen for isoforms with the same intron chain (or single-exon transfrags) but with different TSS/TES as suggested by read coverage.
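To check whether a given sample GTF contains such transfrags, one rough approach is to group its multi-exon transcripts by intron chain. The sketch below is only an illustration under stated assumptions: `shared_intron_chains` is a hypothetical helper, it does not reproduce gffcompare's own matching logic, and it ignores single-exon transfrags because the thread does not say how those are matched.

```python
import re
from collections import defaultdict

# Matches GTF attributes of the form: key "value";
ATTR_RE = re.compile(r'(\S+) "([^"]*)";')


def shared_intron_chains(gtf_lines):
    """Return lists of transcript IDs that share seqname, strand and intron chain.

    Only multi-exon transcripts are considered (assumption: single-exon
    transfrags would need a separate, overlap-based rule).
    """
    exons = defaultdict(list)  # transcript_id -> [(start, end), ...]
    location = {}              # transcript_id -> (seqname, strand)
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        attrs = dict(ATTR_RE.findall(fields[8]))
        tid = attrs.get("transcript_id")
        if tid is None:
            continue
        exons[tid].append((int(fields[3]), int(fields[4])))
        location[tid] = (fields[0], fields[6])

    groups = defaultdict(list)
    for tid, coords in exons.items():
        coords.sort()
        if len(coords) < 2:
            continue  # single-exon transcript: no introns to compare
        # An intron spans from one exon's end + 1 to the next exon's start - 1.
        introns = tuple(
            (coords[i][1] + 1, coords[i + 1][0] - 1)
            for i in range(len(coords) - 1)
        )
        groups[location[tid] + (introns,)].append(tid)

    # Keep only intron chains carried by more than one transcript.
    return [sorted(tids) for tids in groups.values() if len(tids) > 1]
```

Transcripts that differ only in their outer exon boundaries (different TSS/TES) end up in the same group, which is the kind of structural redundancy described above.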