qqwang-berkeley / JUM

A tool for annotation-free differential analysis of tissue-specific pre-mRNA alternative splicing patterns
MIT License

Run time for JUM_B.sh #24

Open adt0023 opened 4 years ago

adt0023 commented 4 years ago

Hello!

I'm very interested in seeing what kind of data comes out of this novel analysis. I've encountered an issue with running the JUM_B.sh step where it appears to be taking an exceptional amount of time (>12 hrs). Is this normal or is the code I'm running (below) incomplete? The hardware I'm running it on is designated for running bioinformatic analysis, so that shouldn't be the issue.

Template

bash /user/home/JUM_2.0.2/JUM_B.sh --Folder /user/home/JUM_2.0.2/JUM_diff --Test pvalue --Cutoff 0.05 --TotalFileNum 6 --Condition1_fileNum_threshold 2 --Condition2_fileNum_threshold 2 --Condition1SampleName ctrl1,ctrl2,ctrl3 --Condition2SampleName treat1,treat2,treat3

Much obliged, Andrew

qqwang-berkeley commented 4 years ago

Hi Andrew,

I noticed one thing in your command. The original command:

bash /user/home/JUM_2.0.2/JUM_B.sh --Folder /user/home/JUM_2.0.2/JUM_diff --Test pvalue --Cutoff 0.05 --TotalFileNum 6 --Condition1_fileNum_threshold 2 --Condition2_fileNum_threshold 2 --Condition1SampleName ctrl1,ctrl2,ctrl3 --Condition2SampleName treat1,treat2,treat3

should be changed to:

bash /user/home/JUM_2.0.2/JUM_B.sh --Folder /user/home/JUM_2.0.2 --Test pvalue --Cutoff 0.05 --TotalFileNum 6 --Condition1_fileNum_threshold 2 --Condition2_fileNum_threshold 2 --Condition1SampleName ctrl1,ctrl2,ctrl3 --Condition2SampleName treat1,treat2,treat3

The --Folder option specifies the directory of the JUM package, not the running folder.
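One way to catch this kind of mix-up before starting a long run is a small pre-flight check on the value passed to --Folder. The sketch below is not part of JUM; `check_jum_folder` is a hypothetical helper, and it assumes the package directory is the one that holds JUM_B.sh, as the paths in the commands above suggest.

```shell
# Hypothetical pre-flight check (not part of JUM).
# Assumption: the JUM package directory is the one containing JUM_B.sh.
check_jum_folder() {
  local folder="$1"
  if [ -f "$folder/JUM_B.sh" ]; then
    echo "ok: $folder contains JUM_B.sh"
  else
    # Report on stderr and return non-zero so callers can stop before the long run.
    echo "warning: $folder does not contain JUM_B.sh; --Folder may point at the running folder" >&2
    return 1
  fi
}
```

For example, `check_jum_folder /user/home/JUM_2.0.2 && bash /user/home/JUM_2.0.2/JUM_B.sh --Folder /user/home/JUM_2.0.2 ...` would only launch JUM_B.sh when the check passes.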

The other thing I noticed is that sometimes the hardware does not cope well with the parallel sorting of bed files that is embedded in the JUM package, and it will stagger the running time by sitting on one file and delaying the others. One thing to check is to run "ls -l -t" in the running directory where you ran JUM_B.sh, and compare the times at which the following files were generated:

*Aligned.out_coverage.bed

*Aligned.out_coverage_sorted.bed

If the latter files are generated much later than the former, then that is the case. If so, let me know and I will send you a modified JUM_B.sh that works around this issue. This issue is quite random: sometimes the system handles it fine and sometimes it runs into this issue, depending on what other users are doing on the system as well.
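Reading the timestamps off "ls -l -t" by eye can get tedious with many samples. As a rough sketch of the same check, the function below prints, for each coverage file, how many seconds later its sorted counterpart was last modified. `sort_lag` is a hypothetical name, the script assumes GNU coreutils `stat` (typical on Linux servers), and it does not decide what counts as "much later".

```shell
# Hypothetical helper (not part of JUM): for each *Aligned.out_coverage.bed in a
# directory, print the delay in seconds until the matching *_sorted.bed was written.
# Assumes GNU coreutils `stat -c %Y` (modification time as seconds since the epoch).
sort_lag() {
  local dir="${1:-.}" raw sorted
  for raw in "$dir"/*Aligned.out_coverage.bed; do
    [ -e "$raw" ] || continue              # no matching files in this directory
    sorted="${raw%.bed}_sorted.bed"        # ...Aligned.out_coverage_sorted.bed
    if [ -e "$sorted" ]; then
      echo "$(basename "$raw") $(( $(stat -c %Y "$sorted") - $(stat -c %Y "$raw") ))"
    else
      echo "$(basename "$raw") missing"    # sorted file not produced (yet)
    fi
  done
}
```

Calling `sort_lag .` in the running directory lists one line per sample; a large spread between samples would point to the staggered sorting described above.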

Qingqing

adt0023 commented 4 years ago

Qingqing,

Thank you for taking the time to address this issue. The Aligned.out_coverage_sorted.bed files are being generated about 10 minutes after the Aligned.out_coverage.bed files, but I'm not sure if that's enough to indicate staggered run times.

The first issue you mentioned is related to a recurring problem I've been having with how files are being organized into the tempJUM_run folders, where subsequent code presents errors unless I return the files from the tempJUM_run folders back to the original working directory. I wasn't sure if this was from an error on my part and I've been able to get the code to work up to the point of running JUM_B, but it's plausible that that's causing some of the issues I'm having now.

Many thanks,

Andrew Taylor PhD Graduate Assistant Biomedical Sciences; Exercise Physiology West Virginia University School of Medicine Room 3040 HSC North Morgantown, WV 26506
