splicebox / PsiCLASS

Simultaneous multi-sample transcript assembler for RNA-seq data

Segmentation fault when using add-genename script #19

Open cc-prolix opened 2 years ago

cc-prolix commented 2 years ago

Hello, I have used psiclass to assemble transcripts from BAM files generated with HISAT2. This produced n per-sample files (psiclass_sample_1...n.gtf) and a psiclass_vote.gtf, using the command:

psiclass --lb mergelist.txt -o psiclass/

Now I would like to identify novel transcripts in the assembled GTF file. When I used the add-genename script to attach gene names to psiclass_vote.gtf, I received the following error:

./add-genename reference.gtf psiclass.gtf

or

./add-genename reference.gtf psiclass_gtf.list

Segmentation fault

I am using a reference GTF file that contains only entries with a gene_name. I have attached the reference annotation and the psiclass_vote.gtf file. Is there something wrong with my reference annotation? Could I also use gene_ids instead of gene_names? My complete reference annotation contains entries that have no gene_name.

Thank you very much for your help! files.zip

mourisl commented 2 years ago

Thanks for sharing the files. The list is a text file in which each line names a GTF file that you want to add gene names to. So in your case, you can create a file tmp.txt containing the single line "psiclass.gtf", create the directory "output" for the output files, and then run "./add-genename reference.gtf tmp.txt -o output". You should find a new file psiclass.gtf in the folder output.
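The steps above could be sketched in the shell roughly as follows. This is only an illustration: it assumes add-genename, reference.gtf and psiclass.gtf all sit in the current directory, and the existence check around the final call is my own addition so the snippet does nothing harmful where the program is absent.

```shell
# Write the list file: one GTF file name per line (here just one entry).
printf '%s\n' "psiclass.gtf" > tmp.txt

# add-genename writes into an existing directory, so create it first.
mkdir -p output

# Run the annotation step only if the binary is present here (assumed location).
if [ -x ./add-genename ]; then
    ./add-genename reference.gtf tmp.txt -o output
fi
```

To annotate several assemblies in one run, tmp.txt would simply list one GTF file per line.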

I just added the option "-f" to the add-genename program in the GitHub repo, so you can use "-f gene_id" to read the gene_id field from the reference.gtf file. In the output file the attribute will still be called "gene_name", to avoid confusion with the existing "gene_id" field in psiclass's output.
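To decide whether "-f gene_id" is needed, one could first count how many reference entries lack a gene_name. The sketch below is an assumption-laden illustration: count_missing is a hypothetical helper and not part of PsiCLASS, and placing "-f gene_id" after "-o output" is my guess at a workable argument order.

```shell
# Hypothetical helper: count non-comment GTF lines that lack a given attribute.
# $1 = attribute name (e.g. gene_name), $2 = GTF file
count_missing() {
    awk -v key="$1" '
        !/^#/ && $0 !~ ("(^|[\t; ])" key " \"") { n++ }
        END { print n + 0 }
    ' "$2"
}

# Fall back to gene_id only when some reference entries have no gene_name.
if [ -f reference.gtf ] && [ -f tmp.txt ] && [ -x ./add-genename ]; then
    mkdir -p output
    if [ "$(count_missing gene_name reference.gtf)" -gt 0 ]; then
        ./add-genename reference.gtf tmp.txt -o output -f gene_id
    else
        ./add-genename reference.gtf tmp.txt -o output
    fi
fi
```

The check treats any line without a quoted gene_name attribute as missing, which is a simplification of the GTF attribute syntax.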

Does this help? Thank you.