wososa / PSI-Sigma

sorted.txt file is empty while log files have no errors #64

Closed Gtripathi-ai closed 1 month ago

Gtripathi-ai commented 1 month ago

Hi Woody,

Thank you again for this useful tool. I am using PSI-Sigma on RNA fractionation data: chromatin, nucleus, cytoplasm, and whole cell. It ran without reported errors on all four fractions. However, for the nucleus fraction, the log.txt file looks fine while the sorted.txt file is empty. What could be the reason for the empty file? I have attached the log.txt and sorted.txt files below. ZRSR2_N_Pool2_kd.Log.txt ZRSR2_N_kd_Pool2_r10_ir3.sorted.txt

Thanks. Br, Garima Tripathi

wososa commented 1 month ago

Hi @Gtripathi-ai ,

Thanks for trying PSI-Sigma. I will need to see your system log (the output printed while PSI-Sigma runs). If you still have it, sharing it with me will help me see the number of records with a p-value. If that number is 0, you may want to check whether the chromosome names in the ‘gtf’ file and the junction read files (.SJ.out.tab) are the same, for example chr1 versus 1.
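The chromosome-name check suggested above can be sketched in a few lines of Python. This is only an illustration, not part of PSI-Sigma: the function names are made up here, and the example records are invented. It relies only on the first tab-separated column holding the sequence name in both GTF and STAR SJ.out.tab files.

```python
def gtf_chromosomes(lines):
    """Collect sequence names from column 1 of GTF lines, skipping '#' comments."""
    names = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        names.add(line.rstrip("\n").split("\t", 1)[0])
    return names


def sj_chromosomes(lines):
    """Collect chromosome names from column 1 of SJ.out.tab lines."""
    return {line.rstrip("\n").split("\t", 1)[0] for line in lines if line.strip()}


def compare_chromosomes(gtf_lines, sj_lines):
    """Report which names are shared and which appear in only one input."""
    gtf = gtf_chromosomes(gtf_lines)
    sj = sj_chromosomes(sj_lines)
    return {"shared": gtf & sj, "gtf_only": gtf - sj, "sj_only": sj - gtf}


# Invented records, for illustration only.
example_gtf = [
    "#!comment line\n",
    "1\tsource\tgene\t100\t200\t.\t+\t.\tgene_id \"GENE1\";\n",
]
example_sj = ["chr1\t120\t180\t1\t1\t1\t5\t0\t30\n"]

report = compare_chromosomes(example_gtf, example_sj)
print(report)
```

An empty "shared" set, as in this invented example, would point to a naming mismatch such as chr1 versus 1. With real data, open file handles can be passed in place of the example lists.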

Best, Woody

Gtripathi-ai commented 1 month ago

Hi Woody,

Thanks for your reply. The system log file is attached below. ZRSR2_N_kd_Pool2_slurm-22383850.out.txt

Thanks. Br, Garima Tripathi

wososa commented 1 month ago

Hi @Gtripathi-ai ,

Thanks for sharing the log file. The line "Number of samples = 2" shows that only two samples were recognized from the groupa.txt and groupb.txt files. Could you check that the two files list one file name per line? If multiple names are on one line, the names will not match.
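A quick way to run that check is sketched below in Python. It is a hypothetical helper, not PSI-Sigma code: it splits the text of a group file into lines and flags any line that holds more than one whitespace-separated name.

```python
def check_group_file(text):
    """Return (names, problems) for the text of a group file."""
    names, problems = [], []
    for number, line in enumerate(text.splitlines(), start=1):
        tokens = line.split()
        if not tokens:
            continue  # ignore blank lines
        if len(tokens) > 1:
            problems.append(f"line {number}: {len(tokens)} names on one line")
        names.extend(tokens)
    return names, problems


good = "V1_G_N\nV2_G_N\nV3_G_N\n"
bad = "V1_G_N V2_G_N V3_G_N\n"

print(check_group_file(good))
print(check_group_file(bad))
```

An empty problem list means every non-blank line carries exactly one name.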

Best, Woody

Gtripathi-ai commented 1 month ago

Hi Woody,

The txt files look like this:

cat ZRSR2_N_kd.groupa.txt
V1_G_N
V2_G_N
V3_G_N
V4_G_N
V5_G_N

cat ZRSR2_N_kd.groupb.txt
V1_Z_N
V2_Z_N
V3_Z_N
V4_Z_N
V5_Z_N

However, I am thinking of rerunning the analysis to check if the same error is reported again.

Thanks. Br, Garima

wososa commented 1 month ago

@Gtripathi-ai ,

Yes, rerunning with the existing files in the working directory will be fast. It's a good idea. Let me know if you encounter the same error. I will be happy to help.

Best, Woody

Gtripathi-ai commented 1 month ago

Hi Woody,

The same error is reported again. I do not know why "Number of samples = 2" is reported, because the log file shows that all 10 samples were analyzed. I do not know what I can fix in this situation.

Calculating PSI values...
Number of events = 134417
Number of samples = 2
Statistics option = Student's t-test
number of p-value = 0
Number of final p-value = 0
Skipping p-value adjustment.
number of fdr(BH) = 0
===PSI analysis spent 0.0247 hours.===
Filtering ΔPSI results...
Filtering mode = 3
Reading... Homo_sapiens.GRCh38.112.sorted.gtf.mapping.txt
Reading... ZRSR2_N_kd.db
===Filtering spent 0.0006 hours.===
Archiving... ZRSR2_N_kd_r10_ir3.txt ZRSR2_N_kd_r10_ir3.txt
***Total: 4.2272 hours (or 253.632mins).
Publishing results to /scratch/project_2010461/FOR_EACH_SAMPLE_180724/PSI-Sigma_Pool2/ZRSR2/re-Nucleus/ZRSR2_N_Pool2_output
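The mismatch in this log can be caught automatically. The sketch below is a hypothetical Python helper (not shipped with PSI-Sigma) that pulls the counters out of log text worded like the excerpt above and compares the sample count with the number of names in the group files. It assumes the log wording stays exactly as shown here.

```python
import re


def read_log_counts(log_text):
    """Extract a few counters from log text worded like the excerpt above."""
    patterns = {
        "events": r"Number of events = (\d+)",
        "samples": r"Number of samples = (\d+)",
        "p_values": r"number of p-value = (\d+)",
    }
    counts = {}
    for key, pattern in patterns.items():
        match = re.search(pattern, log_text)
        counts[key] = int(match.group(1)) if match else None
    return counts


log_text = (
    "Calculating PSI values... Number of events = 134417 "
    "Number of samples = 2 Statistics option = Student's t-test "
    "number of p-value = 0 Number of final p-value = 0"
)
counts = read_log_counts(log_text)
expected_samples = 10  # five names in each of the two group files

if counts["samples"] != expected_samples:
    print(f"expected {expected_samples} samples, log reports {counts['samples']}")
```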

Thanks. Br, Garima

wososa commented 1 month ago

I should have noticed this earlier. The accession number of each sample is recognized only partially. For example, “accession = G” is reported for all samples of the “N” group (it should have been V1_G_N, for example). That explains why there are only two samples (G and Z) in the end. Are there invisible characters in the groupa.txt file? If you can send the raw file to me, I can take a look.
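One way to look for invisible characters is sketched below in Python. The helper name is made up for this illustration. It reports control characters (such as a carriage return), format characters (such as a byte-order mark or zero-width space), and space separators, which should not appear in a file that lists one name per line.

```python
import unicodedata


def find_invisible_characters(text):
    """List (line, column, code point) for characters that are easy to miss."""
    findings = []
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col_no, char in enumerate(line, start=1):
            # Cc = control, Cf = format, Zs = space separator
            if unicodedata.category(char) in ("Cc", "Cf", "Zs"):
                findings.append((line_no, col_no, f"U+{ord(char):04X}"))
    return findings


clean = "V1_G_N\nV2_G_N\n"
suspicious = "\ufeffV1_G_N\r\nV2_G_N \n"

print(find_invisible_characters(clean))
print(find_invisible_characters(suspicious))
```

When reading a real file for this check, opening it with `open(path, newline="", encoding="utf-8")` keeps carriage returns and a leading byte-order mark visible to the function instead of silently normalizing them.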

It could also be that PSI-Sigma’s script parsed the names incorrectly. I will test it. The easiest solution is probably to remove “_N” and “_T” from the names listed in groupa.txt and groupb.txt and from the names of the .SJ.out.tab and .IR.out.tab files.

Best, Woody

Gtripathi-ai commented 1 month ago

Hi Woody,

Please find attached my text files: groupa.N.txt (control) and groupb.ZRSR2.N.txt (ZRSR2 knockdown). groupa.N.txt groupb.ZRSR2.N.txt

I will also try changing the naming of these files and re-run them. I will let you know soon.

Thanks. Br, Garima

wososa commented 1 month ago

Hi @Gtripathi-ai ,

I found the bug. Some old regular-expression code is causing the issue. For example:

$accession=~s/(.*)\_(\w+)\_N$/$2/;

This explains why your file names were misinterpreted: for a name such as V1_G_N, the substitution keeps only the second captured group, G. Please avoid using "_N" or "_T" at the end of the file names. That should fix your issue. Sorry for the inconvenience.
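The effect of that substitution can be reproduced outside PSI-Sigma. The sketch below is a Python rendering of the single Perl line quoted above, applied to the sample names from this thread. It covers only that one line, not the rest of PSI-Sigma's name handling, and the function name is made up for the illustration.

```python
import re

# Python rendering of the Perl substitution  s/(.*)\_(\w+)\_N$/$2/
PATTERN = re.compile(r"(.*)_(\w+)_N$")


def old_accession(name):
    """Apply the quoted substitution; names that do not end in _N pass through."""
    return PATTERN.sub(r"\2", name)


group_a = ["V1_G_N", "V2_G_N", "V3_G_N", "V4_G_N", "V5_G_N"]
group_b = ["V1_Z_N", "V2_Z_N", "V3_Z_N", "V4_Z_N", "V5_Z_N"]

accessions = {old_accession(name) for name in group_a + group_b}
print(sorted(accessions))             # ten names collapse to two accessions
print(old_accession("V1_G_Nucleus"))  # no trailing _N, so the name is kept
```

Because the whole match is replaced by the second group alone, every name in groupa.txt becomes G and every name in groupb.txt becomes Z, which is consistent with "Number of samples = 2" in the log.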

Thanks for pointing out the issue.

Best, Woody

Gtripathi-ai commented 1 month ago

Hi Woody,

Just to confirm: after changing _N to _Nucleus (specifically as a suffix), PSI-Sigma started reading the other samples as well. Thank you very much for your valuable suggestion.
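For anyone applying the same rename to many samples, the mapping can be computed before touching any files. The Python sketch below is hypothetical: the helper names are invented, the replacement for each suffix is supplied by the caller, and the file-name layout (sample name followed directly by .SJ.out.tab or .IR.out.tab) is an assumption that should be checked against the actual working directory.

```python
import re


def rename_sample(name, new_suffixes):
    """Replace a trailing _N or _T using new_suffixes, e.g. {"N": "Nucleus"}."""
    match = re.search(r"_([NT])$", name)
    if match and match.group(1) in new_suffixes:
        return name[: match.start()] + "_" + new_suffixes[match.group(1)]
    return name


def rename_file(file_name, new_suffixes, extensions=(".SJ.out.tab", ".IR.out.tab")):
    """Rename the sample part of a file name, keeping a known extension."""
    for ext in extensions:
        if file_name.endswith(ext):
            return rename_sample(file_name[: -len(ext)], new_suffixes) + ext
    return file_name


suffixes = {"N": "Nucleus"}
print(rename_sample("V1_G_N", suffixes))
print(rename_file("V1_G_N.SJ.out.tab", suffixes))
```

The functions only return new names; the group files and the junction and intron-retention files still have to be renamed consistently so that the names keep matching.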

Kind regards, Garima

wososa commented 1 month ago

Hi @Gtripathi-ai ,

Great news. I hope that you find something valuable from your analysis!

Cheers, Woody