wososa / PSI-Sigma


Question on the necessity of re-generating IR.out.tab for the same BAM file in different experiments #16

Closed: husensofteng closed this issue 3 years ago

husensofteng commented 3 years ago

Thank you, Woody, for making PSI-Sigma available.

I have more than 100 BAM files that are divided into different groups across several experiments. For each experiment, I have made a directory with its own groupa.txt and groupb.txt, containing soft links to the BAM files and their pre-generated SJ.out.tab files.
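A minimal sketch of that per-experiment layout, assuming all BAM and SJ.out.tab files sit in one source folder with STAR-style names (`<sample>.Aligned.sortedByCoord.out.bam`, `<sample>.SJ.out.tab`). The thread does not spell out what goes inside groupa.txt/groupb.txt; here each is assumed to list one SJ.out.tab name per line. `make_experiment_dir` is a hypothetical helper, not part of PSI-Sigma.

```shell
#!/usr/bin/env bash
# Sketch: build one experiment folder of soft links.
# Assumptions (hypothetical, not PSI-Sigma conventions): all files live in one
# source folder with STAR-style names <sample>.Aligned.sortedByCoord.out.bam and
# <sample>.SJ.out.tab, and each group file lists one SJ.out.tab name per line.

# make_experiment_dir EXPDIR SRCDIR "GROUP_A_SAMPLES" "GROUP_B_SAMPLES"
make_experiment_dir() {
    local expdir=$1 srcdir group_a=$3 group_b=$4 s
    srcdir=$(cd "$2" && pwd) || return 1   # absolute path, so links resolve from EXPDIR
    mkdir -p "$expdir" || return 1
    : > "$expdir/groupa.txt"                # start from empty group files
    : > "$expdir/groupb.txt"
    for s in $group_a $group_b; do          # sample names are space-separated
        ln -sf "$srcdir/$s.Aligned.sortedByCoord.out.bam" "$expdir/"
        ln -sf "$srcdir/$s.SJ.out.tab" "$expdir/"
    done
    for s in $group_a; do echo "$s.SJ.out.tab" >> "$expdir/groupa.txt"; done
    for s in $group_b; do echo "$s.SJ.out.tab" >> "$expdir/groupb.txt"; done
}
```

For example, `make_experiment_dir exp1 /data/star "s1 s2" "s3 s4"` would create exp1/ with links for s1 to s4 and the two group files. The source path is resolved to an absolute path because a relative symlink target is interpreted relative to the folder the link lives in.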

Since a sample can be in more than one experiment, I wonder whether it is necessary to re-generate the IR.out.tab file for the same BAM file in each experiment. Would it be possible to generate IR.out.tab once for every BAM file and re-use it when running different sub-groups?
I am asking to save time, because this step takes quite a while (roughly 1 hour per sample) and the samples are processed sequentially.
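If IR.out.tab only has to be computed once per sample, the work list is the set of distinct BAM files across all experiment folders rather than the sum over experiments. A small sketch, assuming the per-experiment folders of soft links described above and STAR-style BAM names; `unique_bams` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: list each distinct BAM file linked into any of the given experiment
# folders, so that its IR.out.tab only has to be generated once.
# Assumes STAR-style names ending in .Aligned.sortedByCoord.out.bam.

# unique_bams EXPDIR...
unique_bams() {
    find "$@" -maxdepth 1 -name '*.Aligned.sortedByCoord.out.bam' \
        -exec basename {} \; | sort -u
}
```

For example, `unique_bams exp1 exp2 exp3 | wc -l` counts how many IR jobs are actually needed when samples are shared between experiments.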

wososa commented 3 years ago

@husensofteng ,

Yes. To save time, you can generate the IR.out.tab files beforehand.

  1. Note the input format: perl PSIsigma-ir-v.1.2.pl [.db file] [.bam file] 1

  2. Generate an .IR.out.tab file for each .bam file in parallel (1 to 6 hours per task):

     perl ~/tools/PSI-Sigma-1.9j/PSIsigma-ir-v.1.2.pl PSIsigma1d9j.db A1.Aligned.sortedByCoord.out.bam 1 &
     perl ~/tools/PSI-Sigma-1.9j/PSIsigma-ir-v.1.2.pl PSIsigma1d9j.db A2.Aligned.sortedByCoord.out.bam 1 &
     perl ~/tools/PSI-Sigma-1.9j/PSIsigma-ir-v.1.2.pl PSIsigma1d9j.db A3.Aligned.sortedByCoord.out.bam 1 &
     perl ~/tools/PSI-Sigma-1.9j/PSIsigma-ir-v.1.2.pl PSIsigma1d9j.db B1.Aligned.sortedByCoord.out.bam 1 &
     perl ~/tools/PSI-Sigma-1.9j/PSIsigma-ir-v.1.2.pl PSIsigma1d9j.db B2.Aligned.sortedByCoord.out.bam 1 &
     perl ~/tools/PSI-Sigma-1.9j/PSIsigma-ir-v.1.2.pl PSIsigma1d9j.db B3.Aligned.sortedByCoord.out.bam 1 &

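  3. After all the IR.out.tab files are generated, you can link them into your working directory like this (use an absolute path, because a relative symlink target is resolved from the link's own folder): ln -s "$PWD/A1.IR.out.tab" afolder/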
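The three steps above can be wrapped in a short script. This is a sketch under stated assumptions, not part of PSI-Sigma: `run_ir`, `generate_ir_all`, `link_ir`, `IR_SCRIPT` and `MAX_JOBS` are hypothetical names; the script path defaults to the one used in step 2; and the outputs are assumed to be named `<sample>.IR.out.tab` (as in step 3) and collected in one folder. Unlike the plain `&` list, it caps how many jobs run at once and waits for all of them to finish. `wait -n` needs bash 4.3 or newer.

```shell
#!/usr/bin/env bash
# Sketch only: the helper names below are hypothetical, not part of PSI-Sigma.
# Requires bash >= 4.3 for `wait -n`.

# Path to the IR script as used in the thread; override via the environment.
IR_SCRIPT=${IR_SCRIPT:-$HOME/tools/PSI-Sigma-1.9j/PSIsigma-ir-v.1.2.pl}
MAX_JOBS=${MAX_JOBS:-6}   # how many IR jobs may run at the same time

# run_ir DB BAM: one IR job, using the input format from step 1
run_ir() {
    perl "$IR_SCRIPT" "$1" "$2" 1
}

# generate_ir_all DB BAM...: run run_ir for every BAM, at most MAX_JOBS at once
generate_ir_all() {
    local db=$1 bam running=0
    shift
    for bam in "$@"; do
        if [ "$running" -ge "$MAX_JOBS" ]; then
            wait -n || true          # wait for any one job; a failed job does not stop the loop
            running=$((running - 1))
        fi
        run_ir "$db" "$bam" &
        running=$((running + 1))
    done
    wait                             # block until all remaining jobs finish
}

# link_ir IRDIR EXPDIR: soft-link <sample>.IR.out.tab from IRDIR into EXPDIR
# for every <sample>.Aligned.sortedByCoord.out.bam already linked in EXPDIR
link_ir() {
    local irdir expdir=$2 bam sample
    irdir=$(cd "$1" && pwd) || return 1   # absolute target, so links resolve from EXPDIR
    for bam in "$expdir"/*.Aligned.sortedByCoord.out.bam; do
        [ -e "$bam" ] || [ -L "$bam" ] || continue   # no BAMs matched the glob
        sample=$(basename "$bam" .Aligned.sortedByCoord.out.bam)
        ln -sf "$irdir/$sample.IR.out.tab" "$expdir/"
    done
}
```

For example, `MAX_JOBS=4 generate_ir_all PSIsigma1d9j.db *.Aligned.sortedByCoord.out.bam` followed by `link_ir . afolder` would reproduce steps 2 and 3 with a bounded number of concurrent jobs.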

Please let me know if this works for you.

Best, Woody

husensofteng commented 3 years ago

Thanks for the reply.

The IR.out.tab output for the same BAM differs depending on the content of PSIsigma.db. It seems that PSIsigma.db differs when different samples are included in an experiment.

Consider these two experiments:

exp1:
  groupa.txt: s1.bam, s2.bam
  groupb.txt: s3.bam, s4.bam
  Output: PSIsigma_exp1.db

exp2:
  groupa.txt: s1.bam, s2.bam
  groupb.txt: s3.bam, s4.bam, s5.bam
  Output: PSIsigma_exp2.db

The .IR.out.tab file for s1.bam differs between exp1 and exp2 because PSIsigma_exp1.db and PSIsigma_exp2.db are slightly different.
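Since the IR output depends on the .db file, a quick check before re-using an IR.out.tab from another experiment is to compare the two databases. A conservative sketch, assuming (as the thread suggests) that IR.out.tab depends only on the .db and the BAM; `ir_reusable` is a hypothetical helper. A byte-level comparison may report two logically equivalent databases as different, so a "differ" answer only means re-use is not guaranteed.

```shell
#!/usr/bin/env bash
# Sketch: decide whether an IR.out.tab computed against one .db can be re-used
# with another. Assumption: IR output depends only on the .db and the BAM, so
# byte-identical .db files imply identical IR.out.tab files.

# ir_reusable DB_USED DB_WANTED: exit 0 if the databases are byte-identical
ir_reusable() {
    if cmp -s "$1" "$2"; then
        echo "same database: IR.out.tab can be re-used"
        return 0
    fi
    echo "databases differ: regenerate IR.out.tab (or build one shared .db)"
    return 1
}
```

For example, `ir_reusable PSIsigma_exp1.db PSIsigma_exp2.db` would report that the databases differ in the situation described above.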

I wonder how the database is built, and whether it can be made universal across experiments.

wososa commented 3 years ago

@husensofteng ,

Good point. Yes, the IR file will be slightly different in exp1 and exp2. To solve this problem, you can put all available SJ.out.tab files in one folder and generate a single comprehensive .db file from them. Once that .db file exists, use it to generate all the .IR.out.tab files. Using the same .db file will also make your comparisons more consistent and comparable. Because this run is only needed to produce the .db file, you can simply split all the SJ.out.tab files between groupa.txt and groupb.txt (each covering 50% of the samples).
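The 50/50 split described above can be scripted. A sketch, assuming all SJ.out.tab files (or links to them) sit in one folder and that the group files list one SJ.out.tab name per line; `split_into_groups` is a hypothetical helper. The .db itself is then built with PSI-Sigma's usual database step, which is not reproduced here because the thread does not give its command. Which samples land in which group does not matter for this run, since only the .db file is kept.

```shell
#!/usr/bin/env bash
# Sketch: write the first half of DIR/*.SJ.out.tab to groupa.txt and the
# second half to groupb.txt (assumed format: one file name per line).

# split_into_groups DIR
split_into_groups() {
    local dir=$1 half i
    local files=("$dir"/*.SJ.out.tab)
    [ -e "${files[0]}" ] || { echo "no SJ.out.tab files in $dir" >&2; return 1; }
    half=$(( (${#files[@]} + 1) / 2 ))   # group A gets the extra file if the count is odd
    : > "$dir/groupa.txt"
    : > "$dir/groupb.txt"
    for i in "${!files[@]}"; do
        if [ "$i" -lt "$half" ]; then
            basename "${files[$i]}" >> "$dir/groupa.txt"
        else
            basename "${files[$i]}" >> "$dir/groupb.txt"
        fi
    done
}
```

For example, with five SJ.out.tab files in the folder, groupa.txt would list three of them and groupb.txt the remaining two.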

Sorry for the late reply.

Thanks, Woody

husensofteng commented 3 years ago

Thanks a lot for the reply, Woody.