Closed sanyalab closed 3 months ago
Can you share the log file with me? Thank you.
Attached are the log and the STDERR files ECB_Hap2_psic_classes.log STDERR.txt
Thank you for sharing the file. It seems there are three gene regions that hasn't finished Chr16B 4741698 4760443. Chr20B 13708701 13730914. Chr30B 4295343 4490726. Do you receive any error message from PsiCLASS, or the job gets terminated due to long running time?
You can run PsiCLASS with a lower value of --maxDpConstraintSize to make it faster. During the test, you can also add the option --stage 3 to reuse the subexon structures from previous step. Another approach is to see whether there are some strange introns in these region, and use a more stringent threshold to filter the introns.
Hi, I do not get any error message so I do not know how long the program ran. But this is interesting. Essentially I am trying to build a set of transcripts on each haplotype. Haplotypes are not much different from each other. Therefore, I wonder why it fails on one, but succeeds in the other one.
How does one give an intron threshold in PsiClass? Can you tell me how to get a detailed help for psiclass?
I'll start the run once I hear back from you
Thanks a lot Abhijit
To filter introns, you can use the option "--sa" to specify the minimum average number of reads supporting an intron. You can try a higher value for the filter.
How many samples do you have when running PsiCLASS?
Typically I will have 8 - 10 bam files to work with.
By intron threshold I meant if there was any method to specify length of introns or any way to specify the minimum distance between read clusters.
The long intron size is inferred by the 99% quantile of all the intron lengths, so if you reduce this value too much, you may miss some real introns. I think trying a smaller value for --maxDpConstraintSize, say 5, could be better.
I think you can check the splice/*.splice file to confirm that there are some introns should be filtered.
What does "--maxDpConstraintSize" do in general? I didn't quite follow. I'll check the splice file.
Thanks Abhijit
It kind of controls how many subexons a read can be partially aligned with. In a clean gene, a read probably only spans two or three subexons, so 5 or the default value 7 is a quite safe choice in general.
Ah I see. Can you tell me what the fields are in the splice file? Or point me to the documentation? Could not find it.
Chr01B 198 12089 185 - 185 0 59 0
Chr01B 200 12222 4 - 4 0 12 0
Chr01B 228 12151 2 - 2 0 9 0
Chr01B 393 12418 12 - 11 1 27 2
Chr01B 436 12321 97 - 97 0 6 0
Chr01B 4678 6745 86 - 14 72 27 41
Thanks Abhijit
The fields are chrom_name start_site end_site number_of_read_support strand other_support_information(e.g. unique alignment, number of mismatches).
I see.... A few additional questions
Q1: Just to confirm, the start_site and end_site are the coordinates of each intron and the "number_of_read_support" is the number of reads that span the intron. Is that Correct?
Q2: In the example below, Number_of_unique_alignments = 185, Number_of_mismatches = 0 or 59?
Chr01B 198 12089 185 - 185 0 59 0
Q3: If I filter the splice file based on intron length, it will effect the subexon graph and consequently the transcript models right? In this case do I start from stage 2?
Best Regards and thank you for the prompt replies.
-Abhijit
Q1: Yes. To be exact, start_site is the end of the left exon, and end_site the start of the right exon.
Q2: 59 is the number of mismatch from the unique alignment. The two 0's are for multi-mapped reads
Q3: You need to start from stage 1.
Hello,
I used the following command to rerun PSICLASS
nohup ~/SOFTWARE/PsiCLASS/psiclass --lb /data/sanyalab/TESTAREA/BamFile_genomeB.path --maxDpConstraintSize 5 --stranded rf -o ECB_Hap2_psic --stage 3 -p 20 &
I got the same result. The program crashes/quits/stops without a error report. Attached are the log files nohup.txt ECB_Hap2_psic_classes.log
Also attaching a screenshot of the files created. All the gtf files are empty. Ignore the "PREV" directory
Thanks Abhijit
Could you please share the file subexon/abc_subexon_combined.out file and the header of the bam file ("samtools view -H xyz.bam") with me?
Was this ever solved?
Hi John,
I never got to spend time on this, and moved away onto other things. If the situation occurs again, I'll reopen the issue.
Thanks -Abhijit
Hi,
I am running psiclass v1.0.3. The tool runs fine for many datasets. Only sometimes it fails without throwing an error. I have followed the installation as mentioned in the README.
Tail of the log file is as below
Thread 10: Chr04B 17383812 17423435 finished.
Thread 13: Chr15B 8433979 8514831 finished.
Thread 6: Chr06B 11481451 11510381 finished.
Thread 16: Chr30B 6353608 6565124 finished.
Thread 14: Chr04B 9067830 9103488 finished.
Thread 7: Chr21B 12043077 12160244 finished.
Thread 8: Chr30B 6579940 6889886 finished.
Thread 17: Chr05B 9204076 9354827 finished.
Thread 12: Chr16B 8151279 8203086 finished.
Thread 5: Chr06B 16478464 16603194 finished.
Tail of the STDERR
subexon-info egg_late.bam ./splice/ECB_Hap2_psic_bam_0.splice > ./subexon/ECB_Hap2_psic_subexon_0.out
subexon-info female.bam ./splice/ECB_Hap2_psic_bam_1.splice > ./subexon/ECB_Hap2_psic_subexon_1.out
subexon-info gut_instar_3.bam ./splice/ECB_Hap2_psic_bam_2.splice > ./subexon/ECB_Hap2_psic_subexon_2.out
subexon-info instar_1.bam ./splice/ECB_Hap2_psic_bam_3.splice > ./subexon/ECB_Hap2_psic_subexon_3.out
subexon-info instar_3.bam ./splice/ECB_Hap2_psic_bam_4.splice > ./subexon/ECB_Hap2_psic_subexon_4.out
subexon-info instar_5.bam ./splice/ECB_Hap2_psic_bam_5.splice > ./subexon/ECB_Hap2_psic_subexon_5.out
subexon-info male.bam ./splice/ECB_Hap2_psic_bam_6.splice > ./subexon/ECB_Hap2_psic_subexon_6.out
subexon-info pupa.bam ./splice/ECB_Hap2_psic_bam_7.splice > ./subexon/ECB_Hap2_psic_subexon_7.out
combine-subexons --ls ./subexon/ECB_Hap2_psic_subexon.list > ./subexon/ECB_Hap2_psic_subexon_combined.out
classes --stranded rf -p 20 --lb BamFile_genomeB.path -s ./subexon/ECB_Hap2_psic_subexon_combined.out -o ./ECB_Hap2_psic > ./ECB_Hap2_psic_classes.log
I cannot share bam files for this dataset and I am not sure whether I can recreate the error with others since the error is unpredictable.
Thanks Abhijit