qqwang-berkeley / JUM

A tool for annotation-free differential analysis of tissue-specific pre-mRNA alternative splicing patterns
MIT License

Uninitialized value errors when running JUM_C.sh #30

Open lindayqlin opened 4 years ago

lindayqlin commented 4 years ago

Hello,

I am encountering a lot of uninitialized value errors when running JUM_C.sh and am not sure what is wrong. These include:

  • in print at /DATA5/JUM/JUM_2.0.2/identify_gene_name_for_JUM_output_3.pl line 45, line 1278 on
  • Lots of things in /media/nkaplin1/DATA5/JUM/JUM_2.0.2/final_process_MXE_output.pl lines 191-262, line 1
  • $junctionID[10] in pattern match (m//) at /DATA5/JUM/JUM_2.0.2/identify_gene_name_for_JUM_output_1.pl line 50, line 428/435
  • in string eq at /DATA5/JUM/JUM_2.0.2/identify_gene_name_for_JUM_output_1.pl line 53, line 428/435
  • in print at /DATA5/JUM/JUM_2.0.2/identify_gene_name_for_JUM_output_2.pl line 60
  • in print at /DATA5/JUM/JUM_2.0.2/identify_gene_name_for_JUM_output_3.pl line 45, line 2 on

I am trying to use JUM to identify IR events. More than half of the identified IR events have gene="NONE" or even nothing (empty string?) in the final IR output. This behavior is not exactly expected because IR events should occur within genes, and some of these locations do map onto genes when inspected. I am guessing that these issues might be related to each other?

Another potential concern is that JUM is identifying fewer IR events than perhaps expected when looking at the data. However, I'm aware that it may have to do with the parameters I used (following the examples/suggestions). Do you know if this could be due to the uninitialized value errors I encountered as well, or is it only due to the parameters chosen?

I tried to follow all the directions in the manual for v2.0.2 and ran the scripts as in the examples provided. I did not encounter any errors until this step, and I believe my refFlat file matches the format specifications. Would you be able to help me troubleshoot this please?

Thank you very much in advance!

qqwang-berkeley commented 4 years ago

Hi Linda,

Would you send me the first 50 lines of your refFlat annotation file?

Qingqing

lindayqlin commented 4 years ago

Hi Qingqing,

Thank you for the reply! The first 50 lines are attached:

refFlat.txt https://github.com/qqwang-berkeley/JUM/files/4612191/refFlat.txt

qqwang-berkeley commented 4 years ago

If you don't mind, could you send me the input files and the command you used to run JUM_C.sh? I suspect it is some format issue in the input files and would like to run it on my end to debug. The file size should not be that large.

lindayqlin commented 4 years ago

Hi Qingqing,

I see. Sure -- are these the input files that you need? input_files.zip https://github.com/qqwang-berkeley/JUM/files/4625796/input_files.zip

I ran JUM_C.sh from inside JUM_diff/FINAL_JUM_OUTPUT_pvalue_0.05/ using the command:

nohup bash /DATA5/JUM/JUM_2.0.2/JUM_C.sh --Folder /DATA5/JUM/JUM_2.0.2 --Test pvalue --Cutoff 0.05 --TotalCondition1FileNum 3 --TotalCondition2FileNum 3 --REF /DATA5/linda_JUM/Araport11_genePred_refFlat_formatted.txt > JUM_C.out &

Thank you for your help!

qqwang-berkeley commented 4 years ago

Sorry for the late reply - was swamped the last two weeks.

You were seeing these errors because the chromosome names in your earlier JUM output have no "chr" prefix (check the "sub_junction_chr" column in any of the output files), whereas all the chromosome names in your reference file refFlat.txt do have it. The script therefore cannot find a match between the chromosome names. Changing the chromosome names in your refFlat.txt file so that they match the JUM output should fix it.
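
One way to make the names agree is sketched below in Python. This is an illustration, not part of JUM: the function name is hypothetical, and it assumes the standard UCSC refFlat layout, in which the chromosome is the third tab-separated column. Check the exact prefix (including its capitalization) against the "sub_junction_chr" values before running anything like this on a real file.

```python
import io


def strip_chrom_prefix(lines, prefix="chr", chrom_col=2):
    """Yield refFlat lines with `prefix` removed from the chromosome column.

    Assumes tab-separated fields with the chromosome at index `chrom_col`
    (index 2 in the standard UCSC refFlat layout). Lines whose chromosome
    does not start with `prefix` are passed through unchanged.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) > chrom_col and fields[chrom_col].startswith(prefix):
            fields[chrom_col] = fields[chrom_col][len(prefix):]
        yield "\t".join(fields) + "\n"


# Made-up single-record example, not taken from the attached refFlat.txt.
example = io.StringIO(
    "GENE1\tTX1\tchr1\t+\t100\t900\t150\t850\t2\t100,500,\t300,900,\n"
)
fixed = list(strip_chrom_prefix(example))
print(fixed[0].split("\t")[2])
```

For a real annotation, the same generator can be fed an open input file and its output written with `writelines` to a new file, which is then passed to `--REF`. Writing to a new file rather than editing in place keeps the original annotation available for comparison.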


lindayqlin commented 4 years ago

No worries. Thank you for all your help!

That worked well for me. I didn't realize that there was this discrepancy -- thanks for pointing it out.

I still get some (although way fewer) "NONE" results in the IR output. I inspected these visually and it appears that approximately half are annotated as exons/UTRs rather than introns, while a few do correspond to introns but didn't get identified as such. Do you know why this might be? (The rest are not annotated so "NONE" seems fitting for those.)

One other question for you: what are the criteria for calling "INF" for delta PSI?

Thank you in advance!

qqwang-berkeley commented 4 years ago

Could you give me an example of the "NONE" IR events that you described as "I inspected these visually and it appears that approximately half are annotated as exons/UTRs rather than introns, while a few do correspond to introns but didn't get identified as such"? An IGV browser screenshot or something similar would be good.

"INF" can mean that in the beginning there is no intron-retained isoform (so the basal level is zero), but after the biological change there is some expression of the intron-retained isoform. As a result, deltaPSI becomes (IR_later - IR_before) divided by zero and ends up as INF (infinity). So these IR events could in fact be very interesting.


lindayqlin commented 4 years ago

Apologies for the delayed response. Here are some IGV screenshots. They are named as _comment_chrpos. (I checked the expanded gene model track and there are no alternative isoforms with introns at these sites.) None.zip

I'm still a bit unclear on the "INF" results. You write in the manual that IR "deltaPSI = intron_inclusion_isoform / (intron_inclusion_isoform+intron_exclusion_isoform) (under control) - intron_inclusion_isoform / (intron_inclusion_isoform+intron_exclusion_isoform) (under treatment)." Are you saying that (intron_inclusion_isoform+intron_exclusion_isoform) would be zero under a condition? (No gene expression whatsoever? If so, would it be possible to just make the fraction zero?) I'm not sure if I'm understanding this correctly.
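
To make the question concrete, the formula quoted from the manual can be written out as a small Python sketch. The function names and read counts are hypothetical, and returning None for an undefined PSI is a choice made here for illustration; nothing in this thread confirms how JUM itself handles a zero denominator.

```python
def psi(inclusion, exclusion):
    """Fraction of the two isoforms that retains the intron.

    Returns None when both counts are zero, because 0/0 is undefined
    (it is not infinity).
    """
    total = inclusion + exclusion
    if total == 0:
        return None
    return inclusion / total


def delta_psi(control, treatment):
    """deltaPSI = PSI(control) - PSI(treatment), as in the quoted formula.

    `control` and `treatment` are (inclusion, exclusion) count pairs.
    Returns None if PSI is undefined in either condition.
    """
    psi_control = psi(*control)
    psi_treatment = psi(*treatment)
    if psi_control is None or psi_treatment is None:
        return None
    return psi_control - psi_treatment


# No retained-intron isoform under control, some under treatment.
print(delta_psi((0, 40), (10, 30)))  # -0.25
# No isoform of either kind under control: PSI is undefined there.
print(delta_psi((0, 0), (10, 30)))   # None
```

Under this formula each PSI lies between 0 and 1, so the difference stays between -1 and 1 whenever both conditions have a nonzero denominator. A zero inclusion count alone gives a PSI of 0, not an infinite value; only a condition in which both isoform counts are zero leaves the fraction undefined.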

Thank you in advance!