Closed by tabeariepe 1 year ago
We have not encountered this issue before, but it may be due to either an error in your input file or a mismatch between the reference genomes used. If you can provide a small test file, I can look into the issue.
I checked the reference genomes that I used and did not notice any mismatches. I uploaded small test files of my sqanti output here: https://filesender.surf.nl/?s=download&token=242f4bd5-15e8-4919-8ffa-4a83a75484e7 It would be great if you could have a look at them.
As the reference, I used the GENCODE v39 primary assembly and the corresponding pc_translation.fa file.
Hi, I figured out what causes the classification problem. It seems that the stop codon is not included in the reference coding sequence, while it is included in the PacBio coding sequence. Therefore, I get an offset of -3 for pr_cterm_diff for most FSMs.
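To illustrate the mismatch described above, here is a minimal sketch with made-up toy sequences (not taken from the uploaded test files). The helper `strip_terminal_stop` is hypothetical and not part of the pipeline; it only shows how including the stop codon on one side shifts the CDS end by 3 nt, and how trimming it on both sides removes the difference. The sign used here (reference minus PacBio) is an assumption for the example and may not match how the pipeline computes pr_cterm_diff.

```python
# Standard DNA stop codons.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def strip_terminal_stop(cds: str) -> str:
    """Return the CDS without a trailing in-frame stop codon, if one is present.

    Hypothetical helper for illustration only; not a function of the pipeline.
    """
    cds = cds.upper()
    if len(cds) >= 3 and len(cds) % 3 == 0 and cds[-3:] in STOP_CODONS:
        return cds[:-3]
    return cds


# Toy sequences (assumed for the example).
reference_cds = "ATGGCCAAG"     # stop codon excluded, as in the reference
pacbio_cds = "ATGGCCAAGTAA"     # stop codon included, as in the PacBio CDS

# Raw length difference: the PacBio CDS is one codon (3 nt) longer.
length_gap = len(reference_cds) - len(pacbio_cds)

# After trimming the stop codon on both sides, the ends agree.
normalized_gap = len(strip_terminal_stop(reference_cds)) - len(
    strip_terminal_stop(pacbio_cds)
)

print(length_gap, normalized_gap)
```

Whichever side is adjusted, the point is that both CDS definitions need to follow the same stop-codon convention before the C-termini are compared.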
Hi,
I ran your pipeline on my dataset, and everything worked fine up to the sqanti protein step. However, when I classify the proteins, no full splice matches are found (even though they are present in the sqanti transcript output). Most transcript FSMs are classified as NNC with a novel C-terminus. I tried to figure out where this comes from and saw that for most transcript FSMs pr_cterm_diff is -3 in the sqanti protein output. Do you know what could cause this classification problem? Is it a problem with my input data (I start with a previously generated sqanti3 file)?