ratschlab / spladder

Tool for the detection and quantification of alternative splicing events from RNA-Seq data.
Other
103 stars 33 forks source link

Testing SplAdder with *fake* exon skip #89

Closed jfass closed 2 years ago

jfass commented 5 years ago

Description

I've had trouble finding "ground truth" test cases of clear splice alterations that I can test quickly, so I decided to spoof an exon skip by adding a fake exon in the middle of an existing intron.

What I Did

I created a fake "exon 2" in the first intron of one of the NCAM2 transcripts from chr21 (the second of these three GTF lines) and renumbered subsequent exons:

chr21   HAVANA  CDS     22370882        22370936        .       +       0       gene_id "ENSG00000154654.15_4"; transcript_id "ENST00000400546.6_3"; gene_type "protein_coding"; gene_name "NCAM2"; transcript_type "protein_coding"; transcript_name "NCAM2-202"; exon_number 1; exon_id "ENSE00001543472.2"; level 2; protein_id "ENSP00000383392.1"; transcript_support_level 1; tag "basic"; tag "MANE_Select"; tag "appris_principal_1"; tag "CCDS"; ccdsid "CCDS42910.1"; havana_gene "OTTHUMG00000078121.4_4"; havana_transcript "OTTHUMT00000170915.2_3"; remap_original_location "chr21:+:20998564-20998618"; remap_status "full_contig";
chr21   HAVANA  CDS     22615000        22615099        .       +       0       gene_id "ENSG00000154654.15_4"; transcript_id "ENST00000400546.6_3"; gene_type "protein_coding"; gene_name "NCAM2"; transcript_type "protein_coding"; transcript_name "NCAM2-202"; exon_number 2; exon_id "ENSE90003496361.1"; level 2; protein_id "ENSP00000383392.1"; transcript_support_level 1; tag "basic"; tag "MANE_Select"; tag "appris_principal_1"; tag "CCDS"; ccdsid "CCDS42910.1"; havana_gene "OTTHUMG00000078121.4_4"; havana_transcript "OTTHUMT00000170915.2_3"; remap_original_location "chr21:+:21280578-21280652"; remap_status "full_contig";
chr21   HAVANA  CDS     22652898        22652972        .       +       2       gene_id "ENSG00000154654.15_4"; transcript_id "ENST00000400546.6_3"; gene_type "protein_coding"; gene_name "NCAM2"; transcript_type "protein_coding"; transcript_name "NCAM2-202"; exon_number 3; exon_id "ENSE00003496361.1"; level 2; protein_id "ENSP00000383392.1"; transcript_support_level 1; tag "basic"; tag "MANE_Select"; tag "appris_principal_1"; tag "CCDS"; ccdsid "CCDS42910.1"; havana_gene "OTTHUMG00000078121.4_4"; havana_transcript "OTTHUMT00000170915.2_3"; remap_original_location "chr21:+:21280578-21280652"; remap_status "full_contig";

Then I ran:

spladder build --parallel 16 --bams test.bam -c 0 \
  --annotation gencode.v30lift37.basic.annotation.chr21modNCAM2.gtf --outdir test.spladder --set-mm-tag nM --readlen 150 \
  1> test.spladder.out 2> test.spladder.err &

... which completed in some number of minutes (since the annotation only details genes on chr21). SplAdder reported a handful of exon skip events on chr21, but not in the NCAM2 gene, even though ~42 reads are spliced across the real 1st exon, skipping the fake exon from above.

Can you tell me if I'm doing anything problematic, here? Shouldn't this generate an exon skip event? What am I missing? I can't provide data for this case, but I could replicate this on a publicly available dataset if you want to have a test case.

Thanks, ~Joe

jfass commented 5 years ago

The only thing I'm seeing is the following warnings in stderr:

/home/ubuntu/Spladder/spladder-venv/lib/python3.5/site-packages/numpy/core/fromnumeric.py:2957: RuntimeWarning: Mean of empty slice.
  out=out, **kwargs)
/home/ubuntu/Spladder/spladder-venv/lib/python3.5/site-packages/numpy/core/_methods.py:80: RuntimeWarning: invalid value encountered in double_scalars
  ret = ret.dtype.type(ret / rcount)

... but, there are plenty of other events being generated. Just not in the gene I messed with.

akahles commented 5 years ago

Hi Joe,

Thanks for reporting. Would you mind sharing the test.bam with me?

Best,

Andre

jfass commented 5 years ago

Hi Andre,

Here's a link to the data (two bams from samples in GEO, referenced in the SpliceDetector paper, so they're public, and the GTF modified by me to have an extra exon -- search for "Joe" in the gtf file). Let me know if you have any problems grabbing it; I think I set up the S3 bucket for full public access, but haven't tested that.

Many thanks for taking a look, ~Joe

Edit: heh ... forgot to make it public; also fixed the link

akahles commented 5 years ago

Hi Joe, I finally was able to look at the data you sent. It is my understanding that you only altered the input gtf file. So the bam file(s) do not contain any evidence for the exon you added? In this case, SplAdder will not report an event, as by default only events are reported that have confirmation in at least one of the given RNA-Seq samples. If you would like to see the coordinates of all events that could be extracted from the graph, you can force SplAdder to output those as well (using --output-gff3). This file indeed contains your fake event:

chr21   exon_skip       gene    26976108        26978967        .       -       .       ID=exon_skip.329;GeneName="ENSG00000154719.13_5"
chr21   exon_skip       mRNA    26976108        26978967        .       -       .       ID=exon_skip.329_iso1;Parent=exon_skip.329;GeneName="ENSG00000154719.13_5"
chr21   exon_skip       exon    26976108        26976247        .       -       .       Parent=exon_skip.329_iso1
chr21   exon_skip       exon    26978761        26978967        .       -       .       Parent=exon_skip.329_iso1
chr21   exon_skip       mRNA    26976108        26978967        .       -       .       ID=exon_skip.329_iso2;Parent=exon_skip.329;GeneName="ENSG00000154719.13_5"
chr21   exon_skip       exon    26976108        26976247        .       -       .       Parent=exon_skip.329_iso2
chr21   exon_skip       exon    26977250        26977447        .       -       .       Parent=exon_skip.329_iso2
chr21   exon_skip       exon    26978761        26978967        .       -       .       Parent=exon_skip.329_iso2

If you need it, I can add an option, to also go on and quantify all events (and not only the ones confirmed in at least on input sample).

Cheers, Andre

jfass commented 5 years ago

Hi Andre,

In that test case, since I added a fake exon, every read that splices across the two real flanking exons should be evidence for the exon skip, no?

~Joe

On Fri, Aug 9, 2019, 4:33 AM Andre Kahles notifications@github.com wrote:

Hi Joe, I finally was able to look at the data you sent. It is my understanding that you only altered the input gtf file. So the bam file(s) do not contain any evidence for the exon you added? In this case, SplAdder will not report an event, as by default only events are reported that have confirmation in at least one of the given RNA-Seq samples. If you would like to see the coordinates of all events that could be extracted from the graph, you can force SplAdder to output those as well (using --output-gff3). This file indeed contains your fake event:

chr21 exon_skip gene 26976108 26978967 . - . ID=exon_skip.329;GeneName="ENSG00000154719.13_5" chr21 exon_skip mRNA 26976108 26978967 . - . ID=exon_skip.329_iso1;Parent=exon_skip.329;GeneName="ENSG00000154719.13_5" chr21 exon_skip exon 26976108 26976247 . - . Parent=exon_skip.329_iso1 chr21 exon_skip exon 26978761 26978967 . - . Parent=exon_skip.329_iso1 chr21 exon_skip mRNA 26976108 26978967 . - . ID=exon_skip.329_iso2;Parent=exon_skip.329;GeneName="ENSG00000154719.13_5" chr21 exon_skip exon 26976108 26976247 . - . Parent=exon_skip.329_iso2 chr21 exon_skip exon 26977250 26977447 . - . Parent=exon_skip.329_iso2 chr21 exon_skip exon 26978761 26978967 . - . Parent=exon_skip.329_iso2

If you need it, I can add an option, to also go on and quantify all events (and not only the ones confirmed in at least on input sample).

Cheers, Andre

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/ratschlab/spladder/issues/89?email_source=notifications&email_token=AAEILKFGTY3FVBSRHULVLR3QDVI2NA5CNFSM4H2US372YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD36NHVI#issuecomment-519885781, or mute the thread https://github.com/notifications/unsubscribe-auth/AAEILKCNPQI33WJYYJTHHN3QDVI2NANCNFSM4H2US37Q .

akahles commented 5 years ago

Sure, but an event will only be reported if both paths in the event happen in at least one of the input bam files. There is no evidence for the inclusion of the exon in the bam files and hence no event is reported.

Cheers, Andre

jfass commented 5 years ago

Ah, so I assumed that SplAdder was building from- and then comparing to the canonical (GTF/GFF) annotation. But I can extract those events from the graph with that --output-gff3 option? What would be the new option, then? Does --output-gff3 output the entire graph, and so I would need to parse out the non-canonical lines (which is easy if they all have the appropriate event name in column 2)? So, if you added a new option, it would omit all of the graph paths that exist in the input GTF/GFF?

So this solves this artificial test case. Will you have time to look at the more realistic case I emailed you about -- the EGFRvIII one?

Thanks Andre; I appreciate the time you've put into this, ~Joe

On Fri, Aug 9, 2019 at 7:42 AM Andre Kahles notifications@github.com wrote:

Sure, but an event will only be reported if both paths in the event happen in at least one of the input bam files. There is no evidence for the inclusion of the exon in the bam files and hence no event is reported.

Cheers, Andre

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/ratschlab/spladder/issues/89?email_source=notifications&email_token=AAEILKCCMANKOKBVKO6VXBLQDV635A5CNFSM4H2US372YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD36327Q#issuecomment-519945598, or mute the thread https://github.com/notifications/unsubscribe-auth/AAEILKE2K2QU6P2BJOHNZCLQDV635ANCNFSM4H2US37Q .

akahles commented 5 years ago

Yes, the option --output-gff3 will report all events that are found in the graph. However, the events will not get quantified (as in our use-cases so fare we never needed events that had only evidence for one of the 2 possible paths in all samples). I could think of a use case though, where you would count annotated paths as evidence and then quantify the events nevertheless. So what I was suggesting to also allow for quantification of events "unconfirmed" by the read data.

Regarding your other question, I will reply via e-mail.

Best, Andre

jfass commented 5 years ago

Gotcha -- yes! that seems like it would be useful to us. ~Joe

On Fri, Aug 9, 2019 at 9:32 AM Andre Kahles notifications@github.com wrote:

Yes, the option --output-gff3 will report all events that are found in the graph. However, the events will not get quantified (as in our use-cases so fare we never needed events that had only evidence for one of the 2 possible paths in all samples). I could think of a use case though, where you would count annotated paths as evidence and then quantify the events nevertheless. So what I was suggesting to also allow for quantification of events "unconfirmed" by the read data.

Regarding your other question, I will reply via e-mail.

Best, Andre

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/ratschlab/spladder/issues/89?email_source=notifications&email_token=AAEILKHMS3PL6XJMGGMLQXLQDWLZTA5CNFSM4H2US372YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD37FAFQ#issuecomment-519983126, or mute the thread https://github.com/notifications/unsubscribe-auth/AAEILKHUHQAVIFRSCWWXMMTQDWLZTANCNFSM4H2US37Q .

mhjiang97 commented 4 years ago

Gotcha -- yes! that seems like it would be useful to us. ~Joe

Hi Joe,

I am confused about how to figure out if an alternative splicing in the spladder output files is an annotated one in user-provided gtf. Do you have any idea about this question? If I cannot directly know which alternative splicing event is already annotated from the spladder output files, what should I do to solve it? Thank you very much! I'm looking forward to your reply~

Best, Minghao

akahles commented 2 years ago

Closing this, as the original issue has been resolved. Also, the labelling whether an event is annotated has now been added to the output in the latest release. Please open a new issue, if anything comes up using the latest release. Best, Andre