nanoporetech / pinfish

Tools to annotate genomes using long read transcriptomics data

The same full-length sequences in a different order give different results #24

Closed pynie1 closed 4 years ago

pynie1 commented 4 years ago

Hi, I have a FASTA file that contains some full-length sequences. When I changed the order of the sequences in the file, I got a different clusters.tsv file. The command lines were as follows:

spliced_bam2gff -s -M CON1.sorted.bam > CON1.raw_transcripts.gff
cluster_gff -c 10 -a CON1.clusters.tsv CON1.raw_transcripts.gff > CON1.clustered_transcripts.gff

I don't know why this happens. Could you help me? If you need my test files, I can provide them.

bsipos commented 4 years ago

Did you sort the BAM file? If you did not, then this behaviour is expected. Also, the results will be wrong.
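
One quick way to confirm how a BAM file is sorted is to look at the SO tag on the @HD line of its header, which can be printed with `samtools view -H CON1.sorted.bam`. The sketch below only parses that header text. The function name `sam_sort_order` and the example header are illustrative and not part of pinfish. Note that the tag records what the file declares, so it is a sanity check rather than proof that the records are in order.

```python
def sam_sort_order(header_text):
    """Return the SO value from the @HD line of a SAM header, or None if absent."""
    for line in header_text.splitlines():
        if line.startswith("@HD"):
            # Header fields are tab-separated TAG:VALUE pairs after the record type.
            for field in line.split("\t")[1:]:
                if field.startswith("SO:"):
                    return field[3:]
            return None
    return None


# Hypothetical header text, e.g. captured from `samtools view -H`.
example_header = "@HD\tVN:1.6\tSO:coordinate\n@SQ\tSN:chr1\tLN:248956422\n"
print(sam_sort_order(example_header))  # coordinate
```

A value of `coordinate` is what a position-sorted BAM file should report; `unsorted`, `unknown`, `queryname`, or a missing tag would suggest the file needs to be sorted first.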

pynie1 commented 4 years ago

Yes, I sorted the BAM file.

bsipos commented 4 years ago

If you sort the BAM file, then the GFF records will always be in the same order, hence the output should be the same. Please use the snakemake pipeline to avoid potential glitches.
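
When two clusters.tsv files differ, it can help to check whether the reads are actually grouped differently or whether only the row order and cluster numbering changed. The following is a minimal sketch of such a comparison. It assumes a tab-separated file with a header row, the read ID in the first column, and the cluster ID in the second; that layout, the function name `partition`, and the example rows are assumptions, so adjust the column indices to match the real file.

```python
import csv
from collections import defaultdict


def partition(lines, read_col=0, cluster_col=1, has_header=True):
    """Group read IDs by cluster ID and return the groups as a set of frozensets.

    The column layout is an assumption about the file, not a documented format.
    """
    groups = defaultdict(set)
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    if has_header:
        next(reader, None)  # skip the header row
    for row in reader:
        if len(row) <= max(read_col, cluster_col):
            continue  # skip blank or short lines
        groups[row[cluster_col]].add(row[read_col])
    # Dropping the cluster IDs makes the comparison independent of numbering.
    return {frozenset(members) for members in groups.values()}


# Hypothetical rows from two runs: same groups, different order and labels.
run_a = ["Read\tCluster", "r1\t0", "r2\t0", "r3\t1"]
run_b = ["Read\tCluster", "r3\t0", "r2\t1", "r1\t1"]
print(partition(run_a) == partition(run_b))  # True
```

For real files, pass open file handles instead of lists, for example `partition(open("CON1.clusters.tsv"))`. If the comparison returns `False`, the reads really were assigned to different groups and the difference is not just cosmetic.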