Issue with the output and help required to interpret it

alkasingh21 commented 8 months ago

Hi, I was using the FuSeq_WES module to detect fusion genes (in particular MYB-NFIB and MYBL1-NFIB) in our WES data. These fusions are the hallmark of this cancer type. FuSeq_WES ran without any errors, but at the end of the run the output contains only the "splitReadInfo.txt" and "feq_ALL.txt" files; no "FuSeq_WES_FusionFinal.txt" was generated. Is this a concern, and how can I fix it?

Also, it was hard to find the fusion genes of interest in the "splitReadInfo.txt" and "feq_ALL.txt" files. I am sharing the output files so you can take a look. Any help would be appreciated. Attached: feq_ALL.txt

code: python3 /scratch/FuSeq/test/FuSeq_WES/fuseq_wes.py --bam /scratch/FuSeq/EI-Ex-64S-01_S1.bam --gtf /scratch/FuSeq/UCSC_hg38_wes_r100.json --mapq-filter --outdir /scratch/FuSeq/output/

nghiavtr commented 8 months ago

Hi @alkasingh21, thank you for using FuSeq_WES in your research. From your description, it is likely that your WES data does not contain reads supporting the two fusions MYB-NFIB and MYBL1-NFIB.

All potential fusion candidates from the data are reported in the "splitReadInfo.txt" and "feq_ALL.txt" files. These candidates then go through several filters/tests, which may exclude them. So, if there is no FuSeq_WES_FusionFinal.txt file, all the candidates were filtered out.
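The reasoning above can be turned into a quick check of an output directory. This is a minimal sketch: `describe_fuseq_output` is a hypothetical helper, not part of FuSeq_WES, and it only looks for the file names mentioned in this thread. The "incomplete" branch is an assumption (a missing feq_ALL.txt suggests the run did not reach the candidate stage), not documented FuSeq_WES behaviour.

```python
from pathlib import Path


def describe_fuseq_output(outdir: str) -> str:
    """Hypothetical helper: interpret which FuSeq_WES output files exist in outdir."""
    out = Path(outdir)
    # A final fusion file means some candidates survived the filters/tests.
    if (out / "FuSeq_WES_FusionFinal.txt").exists():
        return "final: FuSeq_WES_FusionFinal.txt is present"
    # Candidates were written, but no final file: everything was filtered out.
    if (out / "feq_ALL.txt").exists():
        return "filtered-out: no final file, so every candidate in feq_ALL.txt was filtered out"
    # Assumption: without feq_ALL.txt the run may not have finished.
    return "incomplete: no candidate files found (the run may not have finished)"


if __name__ == "__main__":
    import sys

    # Usage (hypothetical): python check_output.py /scratch/FuSeq/output/
    if len(sys.argv) > 1:
        print(describe_fuseq_output(sys.argv[1]))
```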

I also examined the feq_ALL.txt you provided. Each fusion equivalence class (feq) lists its two genes in one Feq ID (last column), together with its number of supporting reads. I did not find any feq supporting MYB-NFIB or MYBL1-NFIB either.
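To search feq_ALL.txt for a specific gene pair instead of scanning it by eye, a small script can filter rows on the Feq ID column. This is a minimal sketch under assumptions not confirmed in this thread: that the file is tab-separated and that the last column (the Feq ID) contains both gene names separated by some non-alphanumeric delimiter. If your Feq IDs use gene IDs rather than symbols, pass those instead. `find_fusion_rows` is a hypothetical helper, not part of FuSeq_WES.

```python
import csv
import re
from typing import Iterable, List


def find_fusion_rows(lines: Iterable[str], gene_a: str, gene_b: str) -> List[List[str]]:
    """Return rows whose last column (assumed Feq ID) contains both genes as whole tokens."""
    hits = []
    for row in csv.reader(lines, delimiter="\t"):  # assumption: tab-separated file
        if not row:
            continue
        # Split the Feq ID on any non-alphanumeric run so that "MYB"
        # matches only the token MYB and never MYBL1.
        tokens = set(re.split(r"[^A-Za-z0-9]+", row[-1]))
        if gene_a in tokens and gene_b in tokens:
            hits.append(row)
    return hits


if __name__ == "__main__":
    import sys

    # Usage (hypothetical): python find_feq.py feq_ALL.txt
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as fh:
            lines = fh.readlines()
        for a, b in [("MYB", "NFIB"), ("MYBL1", "NFIB")]:
            rows = find_fusion_rows(lines, a, b)
            print(f"{a}-{b}: {len(rows)} matching feq row(s)")
```

Splitting on whole tokens rather than using a substring search matters here, because "MYB" is a prefix of "MYBL1".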

Note that even when a fusion is present in a patient (for example, validated by another method), the patient's WES data is not guaranteed to contain reads supporting it. This can happen, for example, when the breakpoints are too far from exon boundaries (WES mainly captures exonic regions), or when the read coverage is too low for any read to capture the fusion.

I hope my explanation is clear and helpful. Good luck with your research.

Best, Nghia

alkasingh21 commented 8 months ago

Thank you so much for the detailed explanation. The whole exome sequencing was performed on FFPE samples, which are mostly degraded, so the sequencing data will not be of high quality. Is there a way to relax the filtering criteria of FuSeq_WES to account for the sample source, FFPE in our case? Also, is it recommended to relax the default filtering criteria?

nghiavtr commented 8 months ago

Hi @alkasingh21

I see; FFPE samples may not yield good-quality data.

Usually, two files "splitReadInfo.txt" and "feq_ALL.txt" contain all potential candidates (without filtering). So you can check the fusions in those files.

You can also revise the default parameter settings in fuseq_wes.py (https://github.com/nghiavtr/FuSeq_WES/blob/main/fuseq_wes.py) to generate more potential candidates in "splitReadInfo.txt" and "feq_ALL.txt". Please look closely at lines 176-183. For example, you can consider reducing the values of --mapq-filter and --minNonOverlap to increase the sensitivity of the output.
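As a rough illustration of why lowering thresholds such as --mapq-filter or --minNonOverlap increases sensitivity, here is a toy threshold filter. The `Candidate` fields and `apply_filters` are hypothetical and do not reflect how fuseq_wes.py actually implements its filters; the only point shown is that with inclusive minimum thresholds, relaxing either value keeps the same candidates or more.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    # Hypothetical fields for illustration only; not FuSeq_WES's data model.
    name: str
    mapq: int          # mapping quality of the supporting read
    non_overlap: int   # hypothetical length measure compared against a minimum


def apply_filters(cands: List[Candidate], min_mapq: int, min_non_overlap: int) -> List[Candidate]:
    """Keep candidates meeting both minimum thresholds (inclusive)."""
    return [c for c in cands if c.mapq >= min_mapq and c.non_overlap >= min_non_overlap]


if __name__ == "__main__":
    cands = [Candidate("a", 60, 40), Candidate("b", 10, 40), Candidate("c", 60, 15)]
    strict = apply_filters(cands, min_mapq=30, min_non_overlap=30)
    relaxed = apply_filters(cands, min_mapq=5, min_non_overlap=10)
    print([c.name for c in strict], [c.name for c in relaxed])
```

Relaxing thresholds trades specificity for sensitivity: more true fusions may be recovered from low-quality FFPE data, but more false positives may also pass, so relaxed results deserve extra validation.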

Best, Nghia

nghiavtr / FuSeq_WES

Issue with the output and help required to interpret it #12