qqwang-berkeley / JUM

A tool for annotation-free differential analysis of tissue-specific pre-mRNA alternative splicing patterns
MIT License

Issue with JUM_3.sh script #10

Closed MDBrokaw closed 6 years ago

MDBrokaw commented 6 years ago

I’ve encountered an issue with the JUM_3.sh script.

It runs OK until it reaches the “long_intron_retention” processing steps. At that point it generates enormous temp files (e.g. IndexA_temp_long_intron_retention_junction_coordinate_with_read_num_pvalue_0.05) of 20-200 GB.

At this point it also reports millions of identical errors: Use of uninitialized value in print at count_intron_read_long_intron_retention_step2.pl line 44, <IN2> line 1

These millions of errors are preceded by this single error:

ERROR: illegal character ‘+’ found in integer conversion of string “+”. Exiting…
Argument “+” isn’t numeric in subtraction (-) at count_intron_read_long_intron_retention_step2.pl line 40, <IN2> line 1.

Any ideas? All the other classes of Alternative splicing (e.g. 5’SS, etc.) seem to have been computed just fine. Thanks!

qqwang-berkeley commented 6 years ago

Hi,

Would you send me a few lines of the AS_differential.txt file you got from running the R_script_JUM.R script? Thanks! I will help you debug.

Qingqing

MDBrokaw commented 6 years ago

Excellent, thanks for your help.

First few lines of our AS_differential.txt:

groupID featureID exonBaseMean dispersion stat pvalue padj X13m3_14m2 N2 log2fold_N2_13m3_14m2 genomicData.seqnames genomicData.start genomicData.end genomicData.width genomicData.strand countData.IndexA countData.IndexB countData.IndexC countData.IndexD transcripts
5_Junction_70652_Junction_70653:E001 5_Junction_70652_Junction_70653 E001 220.071429128683 0.0012714572240681 5.62693062493901 0.0176865742458979 0.055308582157126 2.30659362437753 2.38247202833251 0.253209445884905 IV 4389750 4389804 55 - 171 171 284 284 exonic_part_number 001 gene_id 5_Junction_70652_Junction_70653
5_Junction_70652_Junction_70653:E002 5_Junction_70652_Junction_70653 E002 4510.78340115722 0.000776507110775998 6.47168092666713 0.0109606814530061 0.0374330880787359 3.65835086903501 3.6452533692903 -0.0435186550123472 IV 4389750 4389889 140 - 3144 3144 6414 6414 exonic_part_number 002 gene_id 5_Junction_70652_Junction_70653
5_Junction_69673_Junction_69674:E001 5_Junction_69673_Junction_69674 E001 77.3547483148137 0.00737832058902636 6.22512000683747 0.0125950371902186 0.0420323408605057 1.86361802390202 1.92451620812156 0.204916821318095 IV 363879 363974 96 - 60 60 100 100 exonic_part_number 001 gene_id 5_Junction_69673_Junction_69674
5_Junction_69673_Junction_69674:E002 5_Junction_69673_Junction_69674 E002 9.5724903310595 0.00367549702846617 6.56934219635649 0.0103750290524118 0.0356617696026629 1.14516301760211 0.815348486625332 -1.22800023701497 IV 363879 364207 329 - 4 4 18 18 exonic_part_number 002 gene_id 5_Junction_69673_Junction_69674

qqwang-berkeley commented 6 years ago

I see. I think it is probably caused by the chromosome naming system. I have tested JUM on organisms that generally use "chr1", "chr2", etc. In your case the chromosomes seem to be named "IV", "V", etc.

Would you send me a few lines of the following files in your JUM_diff folder:
1) UNION_junc_coor_with_junction_ID_more_than*
2) more_than_X_profiled_total_AS_event_junction_first_processing_for_JUM_reference_building.txt

At the same time I will go through every perl script called by JUM_3.sh to check whether the intron retention processing is restricted to the "chrX" naming system. I will keep you updated.
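To see at a glance which chromosome naming convention a file uses, one can tally its first column. Below is a minimal Python sketch, not part of JUM: summarize_chrom_names is a hypothetical helper, and it assumes whitespace-delimited records with the chromosome name in the first field, as in the UNION_junc_coor and coordinate files quoted in this thread.

```python
from collections import Counter
from typing import Iterable


def summarize_chrom_names(lines: Iterable[str]) -> dict:
    """Count chromosome names in column 1 and split them by 'chr' prefix.

    Hypothetical helper (not part of JUM): assumes whitespace-delimited
    records whose first field is the chromosome/seqname.
    """
    counts = Counter()
    for line in lines:
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue  # skip blank and comment lines
        counts[fields[0]] += 1
    with_chr = sorted(name for name in counts if name.startswith("chr"))
    without_chr = sorted(name for name in counts if not name.startswith("chr"))
    return {"counts": dict(counts), "with_chr": with_chr, "without_chr": without_chr}


if __name__ == "__main__":
    sample = [
        "I 10432829 10432855 0 Junction_685",
        "IV 4389750 4389804 - Junction_70652",
    ]
    print(summarize_chrom_names(sample)["without_chr"])  # ['I', 'IV']
```

In practice one would pass an open file handle instead of the sample list; a non-empty without_chr list means the file does not follow the "chr1, chr2" convention JUM was tested on.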


MDBrokaw commented 6 years ago

Ahh, interesting. Thanks for the help. Here are the first few lines of UNION_junc_coor_with_junction_ID_more_than*

I 10432829 10432855 0 Junction_685
I 13653263 13653368 0 Junction_4534
I 14178098 14178142 0 Junction_5097
I 4117400 4117448 0 Junction_9531
I 4186712 4186768 0 Junction_9674
I 4556015 4556061 0 Junction_10152
I 536306 536348 0 Junction_11629
I 5429623 5429825 0 Junction_11762
I 6029749 6029947 0 Junction_12959
I 6100718 6100763 0 Junction_13098
I 10014582 10014730 + Junction_2
I 10015142 10015194 + Junction_3

And here is morethan......

5_Junction_70652_Junction_70653:001 IV - 4389750 4389804
5_Junction_70652_Junction_70653:002 IV - 4389750 4389889
5_Junction_69673_Junction_69674:001 IV - 363879 363974
5_Junction_69673_Junction_69674:002 IV - 363879 364207
5_Junction_67594_Junction_67595:001 IV - 17446610 17447286
5_Junction_67594_Junction_67595:002 IV - 17446610 17449561
5_Junction_71869_Junction_71870:001 IV - 5351583 5351797
5_Junction_71869_Junction_71870:002 IV - 5351583 5351805

Thanks again.

qqwang-berkeley commented 6 years ago

Hi there,

I am terribly sorry for my really late response... I hope you are still there.

It looks like the input files for the perl script count_intron_read_long_intron_retention_step2.pl have a formatting issue: where there are supposed to be numbers (the genomic coordinates of the junctions), there are strand symbols ("+" or "-") instead. It may have something to do with the specific organism you work with and the format in which its genome is arranged.
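The symptom described here, a strand symbol sitting where an integer coordinate should be, can be screened for before running JUM_3.sh. This is a minimal Python sketch, not part of JUM: find_bad_coordinate_lines is a hypothetical helper whose 0-based column defaults assume the chromosome / start / end / strand / junction ID layout of temp_long_intron_retention_junction_coordinate_total.txt as quoted in this thread. Whether values other than "+" and "-" in the strand column (such as the "0" in the UNION_junc_coor excerpt above) are legitimate is not settled in this thread, so allowed_strands is left as a parameter.

```python
from typing import FrozenSet, Iterable, List, Tuple


def find_bad_coordinate_lines(
    lines: Iterable[str],
    start_col: int = 1,
    end_col: int = 2,
    strand_col: int = 3,
    allowed_strands: FrozenSet[str] = frozenset({"+", "-"}),
) -> List[Tuple[int, str]]:
    """Return (line_number, reason) pairs for suspicious records.

    Hypothetical pre-check, not part of JUM. Column indices are 0-based; the
    defaults assume the chromosome/start/end/strand/ID layout.
    """
    needed = max(start_col, end_col, strand_col) + 1
    problems: List[Tuple[int, str]] = []
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        if len(fields) < needed:
            problems.append((lineno, f"only {len(fields)} fields"))
            continue
        start, end = fields[start_col], fields[end_col]
        if not (start.isdigit() and end.isdigit()):
            # e.g. a '+' or '-' sitting where a coordinate should be
            problems.append((lineno, f"non-integer coordinate: {start!r} {end!r}"))
        elif int(start) > int(end):
            problems.append((lineno, "start > end"))
        if fields[strand_col] not in allowed_strands:
            problems.append((lineno, f"unexpected strand {fields[strand_col]!r}"))
    return problems
```

An empty result for a file means every record has integer start/end values in the expected columns; any "non-integer coordinate" hit points at the kind of line that would trigger an "isn't numeric in subtraction" warning in Perl.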

Judging from the large file sizes you reported, I think the problem is in the total long intron calculation. Could you send me a few lines of the following files in your JUM_diff folder:

1) a file ending with "coverage_temp_long_intron_overlap_total.txt"
2) a file ending with "temp_long_intron_retention_junction_coordinate_with_read_num_total.txt"
3) a file called "temp_long_intron_retention_junction_coordinate_total.txt"

I also realized that this perl script is not very memory-efficient. I will fix it in the next big update, which will come in a week or two.

Thanks, and I promise that from now on I will be much more responsive to issues reported by users.

Thank you so much for running JUM and providing feedback!

Qingqing


MDBrokaw commented 6 years ago

Greetings, still here!

Below are a few lines from the three files you suggested. Thanks for your assistance!

IndexC_coverage_temp_long_intron_overlap_total.txt:
X 60037 60038 32
X 60038 60040 30
X 60040 60041 29

IndexC_temp_long_intron_retention_junction_coordinate_with_read_num_total.txt:
I 100625 100626 - Junction_72 3
I 100626 100627 - Junction_72 3
I 100627 100628 - Junction_72 3

temp_long_intron_retention_junction_coordinate_total.txt:
I 100625 101503 - Junction_72
I 10188896 10189261 - Junction_282
I 10189166 10189990 - Junction_283

qqwang-berkeley commented 6 years ago

OK. The format of these files looks good. I am wondering whether some weird lines in these input files are causing the trouble. Do you mind sharing the following files with me (complete files this time):

Set 1:
1) Any file ending with coverage_temp_long_intron_overlap_"$pvalue_padj"_"$cutoff".txt
2) A file called temp_long_intron_retention_junction_coordinate_"$pvalue_padj"_"$cutoff".txt

Set 2:
1) Any file ending with coverage_temp_long_intron_overlap_total.txt
2) A file called temp_long_intron_retention_junction_coordinate_total.txt

These are the only two sets of files that the script count_intron_read_long_intron_retention_step2.pl reads. I am going to test them on my end and see if they produce similar errors. I am also going to check whether there are any weird lines in these input files.
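A quick first pass when hunting for weird lines is to flag lines whose field count differs from the rest of the file; in a split-based parser, a short line is one way to end up with undefined array elements. This is a minimal Python sketch, not part of JUM: find_odd_field_counts is a hypothetical helper that treats the most common field count as "normal" and reports everything else, including blank lines.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple


def find_odd_field_counts(
    lines: Iterable[str], sep: Optional[str] = None
) -> Tuple[int, List[Tuple[int, int]]]:
    """Return (typical_field_count, [(line_number, field_count), ...]).

    Hypothetical helper: blank lines count as 0 fields; sep=None splits on
    any whitespace, otherwise pass a specific delimiter such as a tab.
    """
    counts = [
        len(line.rstrip("\r\n").split(sep)) if line.strip() else 0
        for line in lines
    ]
    nonblank = [c for c in counts if c > 0]
    if not nonblank:
        return 0, []
    # The most frequent field count is assumed to be the file's real layout.
    typical = Counter(nonblank).most_common(1)[0][0]
    odd = [(i, c) for i, c in enumerate(counts, start=1) if c != typical]
    return typical, odd
```

For example, running it over an open coverage_temp_long_intron_overlap_total.txt should report a typical count of 4 (chromosome, start, end, depth, as in the excerpt above) and list any line that deviates.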

Feel free to share them with me via either Dropbox or Google Drive. Let me know if you prefer another way to send these files.

Qingqing


MDBrokaw commented 6 years ago

OK, I have attached one of the files you mentioned. The other three files you requested are created but left empty.

(Disclosure: since my initial message I deleted all files/scripts, re-installed the newest version of JUM [1.3.11] and started over. I am now getting errors at the same step, but of a slightly different nature. A giant error file is created, and the new error message is as follows…)

Use of uninitialized value $array[3] in hash element at ../../JUM_1.3.11/profiling_splicing_patterns_from_AS_events_1.pl line 25, line 1.
Use of uninitialized value $array[1] in hash element at ../../JUM_1.3.11/profiling_splicing_patterns_from_AS_events_1.pl line 25, line 1.
Use of uninitialized value $array[2] in hash element at ../../JUM_1.3.11/profiling_splicing_patterns_from_AS_events_1.pl line 25, line 1.
Use of uninitialized value $array[3] in hash element at ../../JUM_1.3.11/profiling_splicing_patterns_from_AS_events_1.pl line 25, line 2.
…etc.

Attachment: temp_long_intron_retention_AS_differential_pvalue_0.05.txt (https://github.com/qqwang-berkeley/JUM/files/1738322/temp_long_intron_retention_AS_differential_pvalue_0.05.txt)

qqwang-berkeley commented 6 years ago

I see. Now it seems the previous error is skipped and the error message comes from a later step in JUM_3.sh. I will need another set of input files. To save us time, would it be possible for me to access your JUM_diff folder as it was before you ran JUM_3.sh? A share through Dropbox or Google Drive will suffice; it will be several GB in size.

The files I need to run a thorough debug are:

UNION_junc_coor_with_junction_ID*

more_than_X_profiled_total_AS_event*

*combined_count.txt

*Aligned.out_coverage.bed

AS_differential.txt

combined_AS_JUM.gff

Basically, I need all the input files in the JUM_diff folder from after your Rscript run. If you do an "ls -l -t", the files I need are the ones created before and up to the generation of AS_differential.txt. Is that possible? The error looks like a weird file format issue, which should be easy to fix, but I will need to spot the weird lines.
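Picking "the files before and upon the generation of AS_differential.txt" out of an ls -l -t listing can also be done programmatically. This is a minimal Python sketch with hypothetical helper names (scan_mtimes, inputs_up_to); it assumes file modification times reflect the order in which the pipeline produced the files, which stops being true if files were copied without preserving timestamps (a plain cp resets them), so treat the result as a convenience rather than a guarantee.

```python
import os
from typing import Dict, List


def scan_mtimes(folder: str) -> Dict[str, float]:
    """Map each regular file in folder to its modification time (epoch seconds)."""
    with os.scandir(folder) as entries:
        return {e.name: e.stat().st_mtime for e in entries if e.is_file()}


def inputs_up_to(mtimes: Dict[str, float], marker: str = "AS_differential.txt") -> List[str]:
    """Files modified no later than `marker`, oldest first (ls -l -t read bottom-up)."""
    cutoff = mtimes[marker]
    keep = [name for name, t in mtimes.items() if t <= cutoff]
    return sorted(keep, key=lambda name: (mtimes[name], name))
```

Usage: inputs_up_to(scan_mtimes("JUM_diff")) lists the candidate files to share, with AS_differential.txt itself last.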

Thank you!

Qingqing


MDBrokaw commented 6 years ago

Excellent. Here is a link to a Google Drive that should have all files you requested.

https://drive.google.com/open?id=1ARU9VJwuV-259q6q-cFg2j0RKG_KCB7f

Thanks again! (Sorry for the hassle.)

qqwang-berkeley commented 6 years ago

You got that error message because the file:

UNION_junc_coor_with_junction_ID_more_than_5_read_in_at_least_2_samples_formatted_junction_list.txt

is put in the JUM_diff folder for downstream analysis, but the correct file should be:

UNION_junc_coor_with_junction_ID_more_than_5_read_in_at_least_2_samples.txt

These are two completely different files with different formats; that is why the script profiling_splicing_patterns_from_AS_events_1.pl in JUM_3.sh could not recognize it and generated empty files.

Is there any chance that the wrong file was copied to the JUM_diff folder by mistake? At the end, JUM_2-3.sh should copy the right file to the JUM_diff folder; the last line of JUM_2-3.sh is:

cp UNION_junc_coor_with_junction_ID_more_than_"$threshold"_read_in_at_least_"$file_num"_samples.txt JUM_diff/

I understand that the file names are similar to each other and can cause confusion. I will keep a note of that, and in the upcoming upgrade I will have JUM delete the intermediate files that are no longer needed in the downstream analysis, to reduce confusion for users. Thank you for the feedback!

Let me know if it runs fine once the correct file is in the JUM_diff folder AND the wrong file is deleted.
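Because the right file name is fully determined by the read threshold and sample count, the "right file present, wrong file gone" condition can be checked before launching JUM_3.sh. This is a minimal Python sketch with hypothetical helpers (union_file_name, check_union_files); the name pattern follows the concrete file name UNION_junc_coor_with_junction_ID_more_than_5_read_in_at_least_2_samples.txt given in this comment, and the file to remove is assumed to be that name with _formatted_junction_list appended, as in the wrong file identified above.

```python
from typing import Iterable, List


def union_file_name(threshold: int, file_num: int) -> str:
    """Expected UNION junction file for a given read threshold and sample count."""
    return (
        "UNION_junc_coor_with_junction_ID_more_than_"
        f"{threshold}_read_in_at_least_{file_num}_samples.txt"
    )


def check_union_files(names: Iterable[str], threshold: int, file_num: int) -> List[str]:
    """List problems with the UNION files among `names` (empty list = looks right)."""
    present = set(names)
    expected = union_file_name(threshold, file_num)
    # Assumed name of the intermediate file that must NOT be in JUM_diff.
    wrong = expected[: -len(".txt")] + "_formatted_junction_list.txt"
    problems = []
    if expected not in present:
        problems.append(f"missing {expected}")
    if wrong in present:
        problems.append(f"remove {wrong}")
    return problems
```

For the run in this thread, check_union_files(os.listdir("JUM_diff"), 5, 2) should return an empty list once the correct file is in place and the wrong one is deleted.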

P.S. I notice that the replicate count files are the same for each condition. Is that because you originally didn't have biological replicates? The coverage.bed files do look different between replicates, though. If you don't have replicates, let me know and I will send you detailed instructions for a workaround. You can still use JUM in that scenario.
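Identical replicate count files can be spotted mechanically by hashing file contents; the countData.IndexA/IndexB and IndexC/IndexD columns in the AS_differential.txt excerpt earlier in the thread are also pairwise identical, which is consistent with this observation. This is a minimal Python sketch with hypothetical helpers (read_all, identical_groups); it only detects byte-for-byte duplicates, not near-duplicates.

```python
import hashlib
from collections import defaultdict
from pathlib import Path
from typing import Dict, Iterable, List


def read_all(paths: Iterable[str]) -> Dict[str, bytes]:
    """Load each file's bytes, keyed by its base name."""
    return {Path(p).name: Path(p).read_bytes() for p in paths}


def identical_groups(contents: Dict[str, bytes]) -> List[List[str]]:
    """Groups (size >= 2) of names whose contents are byte-for-byte identical."""
    by_digest: Dict[str, List[str]] = defaultdict(list)
    for name, data in contents.items():
        by_digest[hashlib.sha256(data).hexdigest()].append(name)
    return sorted(sorted(group) for group in by_digest.values() if len(group) > 1)
```

Usage, assuming the *_combined_count.txt naming used in this thread: after import glob, identical_groups(read_all(glob.glob("JUM_diff/*_combined_count.txt"))) returns each set of replicates that share exactly the same counts.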


MDBrokaw commented 6 years ago

Oops, that was my mistake in uploading to the Google Drive. I accidentally uploaded the wrong file (it came from the JUMwork directory).

The correct file, UNION_junc_coor_with_junction_ID_more_than_5_read_in_at_least_2_samples.txt, is correctly copied into the JUM_diff directory. I have added it to the Google Drive as well.

qqwang-berkeley commented 6 years ago

I just ran JUM_3.sh in the JUM_diff folder and it went smoothly without any error reports, finishing within minutes.

qingqing@compute1:qingqing/JUM_troubleshoot/JUM_diff$ bash ~/JUM_1.3.11/JUM_3.sh ~/JUM_1.3.11 pvalue 0.05 4 2

Smartmatch is experimental at /mnt//riolab/qingqing/JUM_1.3.11/profiling_splicing_patterns_from_AS_events_3_updated.pl line 116.

qingqing@compute1:qingqing/JUM_troubleshoot/JUM_diff$
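What makes this run work is visible in the prompt: the shell is already inside JUM_diff when JUM_3.sh starts (running it from the JUMwork directory instead is what tripped things up, as it turns out later in the thread). A wrapper can enforce that. This is a minimal Python sketch with hypothetical helpers (missing_inputs, build_command, run_jum3); REQUIRED names only two inputs mentioned in this thread and is not an exhaustive list of what JUM_3.sh reads, and the trailing arguments (pvalue 0.05 4 2) are passed through exactly as in the transcript, with their meaning left to the JUM manual.

```python
import os
import subprocess
from pathlib import Path
from typing import List, Sequence

# Two inputs named in this thread; NOT an exhaustive list of what JUM_3.sh reads.
REQUIRED = ("AS_differential.txt", "combined_AS_JUM.gff")


def missing_inputs(jum_diff: Path, required: Sequence[str] = REQUIRED) -> List[str]:
    """Names from `required` that are not regular files inside jum_diff."""
    return [name for name in required if not (jum_diff / name).is_file()]


def build_command(jum_dir: str, extra_args: Sequence[str]) -> List[str]:
    """Argument list for: bash <jum_dir>/JUM_3.sh <jum_dir> <extra_args...>."""
    jum_dir = os.path.expanduser(jum_dir)
    return ["bash", os.path.join(jum_dir, "JUM_3.sh"), jum_dir, *extra_args]


def run_jum3(jum_dir: str, jum_diff: str, extra_args: Sequence[str]) -> None:
    """Refuse to start unless jum_diff looks right, then run with cwd set to it."""
    diff = Path(os.path.expanduser(jum_diff))
    missing = missing_inputs(diff)
    if missing:
        raise FileNotFoundError(f"{diff} lacks {missing}; is this really the JUM_diff folder?")
    # The working directory is what matters: JUM_3.sh must run from inside JUM_diff.
    subprocess.run(build_command(jum_dir, extra_args), cwd=diff, check=True)
```

Mirroring the transcript: run_jum3("~/JUM_1.3.11", "JUM_diff", ["pvalue", "0.05", "4", "2"]).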

I did delete the wrong file "UNION_junc_coor_with_junction_ID_more_than_5_read_in_at_least_2_samples_formatted_junction_list.txt" first.

I am sharing with you through Google Drive the JUM_diff folder after running JUM_3.sh, with the resulting FINAL_JUM_OUTPUT inside: https://drive.google.com/drive/folders/1x8rpg0I6InjRUkZsmCsn90NgqrE8SaQ6?usp=sharing

Also, since your files don't have "chr" in the chromosome names, I am attaching an updated gene name mapping script with this email for running JUM_4.sh. I will include this change in the upcoming update. I will definitely add some cleaning and renaming steps in the update too, so that file names are less confusing to users.


MDBrokaw commented 6 years ago

Ooooooooo, you just gave me the hint as to what I was doing wrong....

I didn't realize I needed to enter the JUM_diff directory and run JUM_3 from there. I was still in JUMwork where I had run previous JUM_2-3.... OOPS!

When I run JUM_3 in the proper directory, everything looks great! Thanks for your help/patience!

Regarding the updated JUM_4 script, where can I find it / where has it been attached? THANKS AGAIN.

qqwang-berkeley commented 6 years ago

Glad to hear that it works! Thank you for running JUM and for the great feedback. The manual will definitely be updated to make things easier for users.

For running JUM_4.sh, simply do the following:

1) Copy the attached script to your JUM script folder (the folder you downloaded from the JUM GitHub page, named JUM_1.3.11, i.e. the one that contains all the bash and perl scripts, with file names ending in ".sh" or ".pl"). Copying it there will automatically replace the original perl script of the same name.
2) Proceed to JUM_4.sh as instructed in the manual, steps 17 and 18. Note that you need to run JUM_4.sh in the FINAL_JUM_OUTPUT folder :) JUM_4.sh will call a few scripts, including the newly edited gene name mapping one.

Let me know if you have any questions.

Qingqing


MDBrokaw commented 6 years ago

Happy to help! (And be helped :)

Hmmmm, I don't see any script attachment associated with your message on GitHub.

qqwang-berkeley commented 6 years ago

OK. I put the file called:

identify_gene_name_for_JUM_output_1.pl

in the google drive folder I shared with you previously (JUM_diff). Check if you have it. Then you can:

1) Copy the script to your JUM script folder (the folder you downloaded from the JUM GitHub page, named JUM_1.3.11, i.e. the one that contains all the bash and perl scripts, with file names ending in ".sh" or ".pl"). Copying it there will automatically replace the original perl script of the same name.
2) Proceed to JUM_4.sh as instructed in the manual, steps 17 and 18. Note that you need to run JUM_4.sh in the FINAL_JUM_OUTPUT folder :) JUM_4.sh will call a few scripts, including the newly edited gene name mapping one.

Let me know if you have any questions :)


MDBrokaw commented 6 years ago

Ah, beautiful! I have completed the pipeline (with my test data) successfully, and everything looks great. Ready to plug in my full experiment in the future. Thanks!

As an aside, is there any way to handle more than two conditions simultaneously? e.g., wild type, mutant1, mutant2? THANKS AGAIN.

qqwang-berkeley commented 6 years ago

Absolutely!

For more than two conditions, it comes down to two scenarios:

1) Time-course experiments

For this scenario, you just need to run all samples together from the very start as instructed in the manual. Then, in the JUM_diff folder right before the Rscript step, supply an experiment_design.txt file with the time course information, for example:

          condition
sample1_1 0h
sample1_2 0h
sample2_1 2h
sample2_2 2h
sample3_1 4h
sample3_2 4h

etc.

Then follow the instructions again until the end. The files in FINAL_JUM_OUTPUT record AS events that change at one or more time points compared to the first time point.
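The design table in this example has a single "condition" column with rows labeled by sample name. Below is a minimal Python sketch that writes that layout; time_course_design and experiment_design_text are hypothetical helpers, the tab delimiter and the sample<i>_<r> naming are assumptions taken from the example, and the sample labels must match the names used in your own run. A header line with one fewer field than the data rows is what R's read.table(header = TRUE) interprets as "first column holds row names", assuming the R script reads the file that way.

```python
from typing import Dict, Sequence


def time_course_design(timepoints: Sequence[str], replicates: int) -> Dict[str, str]:
    """Label samples sample<i>_<r> (i = time point index, r = replicate), as in the example."""
    return {
        f"sample{i}_{r}": tp
        for i, tp in enumerate(timepoints, start=1)
        for r in range(1, replicates + 1)
    }


def experiment_design_text(design: Dict[str, str]) -> str:
    """A 'condition' header line, then one 'sample<TAB>condition' row per sample."""
    rows = ["condition"] + [f"{sample}\t{cond}" for sample, cond in design.items()]
    return "\n".join(rows) + "\n"
```

Usage: Path("JUM_diff/experiment_design.txt").write_text(experiment_design_text(time_course_design(["0h", "2h", "4h"], 2))) reproduces the six-sample example, with Path imported from pathlib.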

2) Multiple conditions

For this scenario, you have two options:

2.1. You can run everything together from the beginning as instructed, and then, in the JUM_diff folder right before you run the Rscript step, separate the files into subdirectories such as mut1_vs_WT, mut2_vs_WT, etc. Copy the corresponding input files from the current JUM_diff folder into each subdirectory; for example, the mut1_vs_WT directory should contain these input files:

combined_AS_JUM.gff

more_than_5_profiled_total_AS_event_junction_first_processing_for_JUM_reference_building.txt

UNION_junc_coor_with_junction_ID_more_than_5_read_in_at_least_2_samples.txt

WT1Aligned.out_coverage.bed

WT2Aligned.out_coverage.bed

WT3Aligned.out_coverage.bed

mut1_1Aligned.out_coverage.bed

mut1_2Aligned.out_coverage.bed

mut1_3Aligned.out_coverage.bed

WT1_combined_count.txt

WT2_combined_count.txt

WT3_combined_count.txt

mut1_1_combined_count.txt

mut1_2_combined_count.txt

mut1_3_combined_count.txt

Note: common comparison samples, like the WT samples here, need to be copied twice, once into each subdirectory.

Then, in each subdirectory, run the Rscript, JUM_3.sh and JUM_4.sh as instructed in the manual.

The advantage of running it this way is that all junctions detected in all samples are named with the same ID system, so after the JUM pipeline finishes it is extremely easy to compare the two mutant conditions, e.g. to find AS events detected in both mutant conditions or in only one of them.
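Setting up one comparison subdirectory as described in 2.1 is mostly file copying, so it can be scripted. This is a minimal Python sketch with hypothetical helpers (comparison_inputs, make_comparison_dir): SHARED holds the three shared file names listed above for a 5-read / 2-sample run, and the per-sample names assume the <sample>Aligned.out_coverage.bed and <sample>_combined_count.txt patterns shown in that list.

```python
import shutil
from pathlib import Path
from typing import Iterable, List

# Shared inputs as listed in this thread (names depend on the threshold/sample settings).
SHARED = (
    "combined_AS_JUM.gff",
    "more_than_5_profiled_total_AS_event_junction_first_processing_for_JUM_reference_building.txt",
    "UNION_junc_coor_with_junction_ID_more_than_5_read_in_at_least_2_samples.txt",
)


def comparison_inputs(samples: Iterable[str], shared: Iterable[str] = SHARED) -> List[str]:
    """File names one comparison subdirectory needs: shared files plus per-sample files."""
    names = list(shared)
    for sample in samples:
        names.append(f"{sample}Aligned.out_coverage.bed")
        names.append(f"{sample}_combined_count.txt")
    return names


def make_comparison_dir(jum_diff: str, name: str, samples: Iterable[str]) -> Path:
    """Create JUM_diff/<name> and copy the needed inputs into it (WT files get copied each time)."""
    src = Path(jum_diff)
    dest = src / name
    dest.mkdir(exist_ok=True)
    for fname in comparison_inputs(samples):
        shutil.copy2(src / fname, dest / fname)  # copy2 also preserves timestamps
    return dest
```

Usage: make_comparison_dir("JUM_diff", "mut1_vs_WT", ["WT1", "WT2", "WT3", "mut1_1", "mut1_2", "mut1_3"]), then the same call with "mut2_vs_WT" and the mut2 samples.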

2.2. You can also choose to run everything separately from the beginning. Basically, construct a subdirectory for each comparison before running JUM_2-1.sh, and then each mutant condition runs its own independent JUM pipeline. The advantage of this approach is that it is straightforward and compares each mutant condition specifically to WT. (For example, suppose mutant condition 1 is vastly different from mutant condition 2 relative to WT; with approach 2.1, some junctions specific to mutant condition 1 will be taken into account in the mutant condition 2 vs WT comparison, which may not be ideal.) The disadvantage is that you can't compare the results from the two mutant conditions directly, because each comparison has its own naming system. So you need to compare the results from condition 1 vs WT and condition 2 vs WT by checking the junction coordinates. It is not too bad, just more work.

I would say if your conditions are biologically similar (for example, similar clones of the same edited cell line etc.) and sequencing samples are quite rigorously prepared with good depth and not so much variation, then go for 2.1. If your conditions are different, then go for 2.2.
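Under option 2.2, matching results across independent runs means keying on genomic coordinates instead of junction IDs. This is a minimal Python sketch with hypothetical helpers (parse_coord, compare_by_coordinates); it compares individual junctions as (chromosome, start, end, strand) tuples, and the column positions are assumptions to adjust for whichever output file you extract them from. An AS event that spans several junctions would need all of its junctions compared.

```python
from typing import Dict, Iterable, Sequence, Set, Tuple

Coord = Tuple[str, int, int, str]  # (chromosome, start, end, strand)


def parse_coord(line: str, cols: Sequence[int] = (0, 1, 2, 3)) -> Coord:
    """Pull chromosome, start, end, strand from a whitespace-delimited line (0-based cols)."""
    fields = line.split()
    chrom, start, end, strand = (fields[i] for i in cols)
    return (chrom, int(start), int(end), strand)


def compare_by_coordinates(run_a: Iterable[Coord], run_b: Iterable[Coord]) -> Dict[str, Set[Coord]]:
    """Junction coordinates found in both runs, or in only one of them."""
    a, b = set(run_a), set(run_b)
    return {"shared": a & b, "only_a": a - b, "only_b": b - a}
```

The default columns fit the chromosome / start / end / strand / ID layout of the UNION_junc_coor files; for the ID / chromosome / strand / start / end layout of the more_than_X_profiled file quoted earlier, cols=(1, 3, 4, 2) would apply.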

I will include these instructions in the upcoming update of JUM. Thank you for all the feedback.


MDBrokaw commented 6 years ago

Excellent! Thanks again!