huangyh09 / brie

BRIE: Bayesian Regression for Isoform Estimate in Single Cells
https://brie.readthedocs.io
Apache License 2.0

BRIE does not support annotations other than GENCODE #2

Closed jenni-westoby closed 7 years ago

jenni-westoby commented 7 years ago

The following error occurs when executing brie-event-filter using the Ensembl GTF file from ftp://ftp.ensembl.org/pub/release-82/gtf/mus_musculus/Mus_musculus.GRCm38.82.chr.gtf.gz:

$ brie-event-filter -a AS_events/SE.gff3 --anno_ref=Mus_musculus.GRCm38.82.chr.gtf --reference=GRCm38.p5.genome.fa
[fai_load] build FASTA index.
9908 Skipped Exon events are input for quality check.
0 Skipped Exon events pass the qulity control.
Traceback (most recent call last):
  File "venv/bin/brie-event-filter", line 9, in <module>
    load_entry_point('brie==0.1.2', 'console_scripts', 'brie-event-filter')()
  File "venv/local/lib/python2.7/site-packages/brie/events/event_filter.py", line 361, in main
    g_idx, g_chr, g_start, g_stop = get_gene_idx(anno_out)
  File "venv/local/lib/python2.7/site-packages/brie/events/event_filter.py", line 39, in get_gene_idx
    g_idx.append([now_g, last_g])
UnboundLocalError: local variable 'last_g' referenced before assignment

The following command was used to generate SE.gff3:

$ brie-event -a Mus_musculus.GRCm38.82.chr.gtf -o  AS_events
Making GFF alternative events annotation...
  - Input annotation files: Mus_musculus.GRCm38.82.chr.gtf
  - Output dir: AS_events
('Reading table', 'Mus_musculus.GRCm38.82.chr.gtf')
Generating skipped exons (SE)
Generating retained introns (RI)
Generating mutually exclusive exons (MXE)
Generating alternative 3' splice sites (A3SS)
Generating alternative 5' splice sites (A5SS)
Took 0.69 minutes to make the annotation.

This error does not occur if the GENCODE GTF file from ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_mouse/release_M12/gencode.vM12.annotation.gtf.gz is used to generate SE.gff3 and SE.gold.gff3 instead.

huangyh09 commented 7 years ago

Thanks for using BRIE. This issue happens because the GTF file from Ensembl and the FASTA file from GENCODE use different chromosome ID formats: the GTF has "1" while the FASTA has "chr1". I have now updated brie-event-filter to handle mismatched chromosome IDs. The output chromosome IDs will follow the FASTA format, which is probably the same as in your SAM/BAM file.
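For readers who want to check their own files for this kind of mismatch, here is a minimal sketch of the idea in Python. It is not the code used in brie-event-filter; the function names (`fasta_chrom_ids`, `to_fasta_style`, `convert_gtf_line`) are hypothetical, and the mapping rules (adding or stripping a "chr" prefix, and treating Ensembl "MT" and GENCODE "chrM" as the same sequence) are assumptions about the two naming styles rather than anything taken from BRIE.

```python
def fasta_chrom_ids(fasta_lines):
    """Collect sequence IDs from FASTA header lines.

    The ID is the text after '>' up to the first whitespace.
    """
    ids = set()
    for line in fasta_lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                ids.add(fields[0])
    return ids


def to_fasta_style(chrom, fasta_ids):
    """Return `chrom` spelled the way the FASTA spells it, or None if no match."""
    candidates = [chrom]
    # Assumption: the two styles differ only by a "chr" prefix ...
    if chrom.startswith("chr"):
        candidates.append(chrom[3:])
    else:
        candidates.append("chr" + chrom)
    # ... except the mitochondrial genome ("MT" in Ensembl, "chrM" in GENCODE).
    if chrom == "MT":
        candidates.append("chrM")
    elif chrom == "chrM":
        candidates.append("MT")
    for candidate in candidates:
        if candidate in fasta_ids:
            return candidate
    return None


def convert_gtf_line(line, fasta_ids):
    """Rewrite the first (seqname) column of a GTF line to the FASTA style.

    Comment and blank lines are returned unchanged; lines whose chromosome
    has no counterpart in the FASTA return None so the caller can skip them.
    """
    if line.startswith("#") or not line.strip():
        return line
    fields = line.split("\t")
    new_chrom = to_fasta_style(fields[0], fasta_ids)
    if new_chrom is None:
        return None
    fields[0] = new_chrom
    return "\t".join(fields)


# Small in-memory example instead of real files.
example_fasta = [">chr1 1", "ACGT", ">chrM MT", "ACGT"]
example_gtf_line = '1\tensembl\texon\t100\t200\t.\t+\t.\tgene_id "g1";'
print(convert_gtf_line(example_gtf_line, fasta_chrom_ids(example_fasta)))
```

With real data, the same functions can be fed the lines of the FASTA and GTF files; any GTF line for which `convert_gtf_line` returns None refers to a sequence that the FASTA does not contain under either spelling.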

The new brie-event-filter is available in this GitHub repo now, but it is not yet included in a PyPI release.

Yuanhua

jenni-westoby commented 7 years ago

Thank you, the new brie-event-filter supports the Ensembl GTF file.