marcelauliano / MitoHiFi

Find, circularise and annotate mitogenome from PacBio assemblies
MIT License

The error (Duplicated sequence "ptg000001l_rc_rotated" in file "-") occurred when I tested MitoHiFi #99

Closed HydrOpOtesJA99 closed 3 months ago

HydrOpOtesJA99 commented 3 months ago

Hello,

I am encountering an issue while testing MitoHiFi with the provided test data (reference: https://github.com/marcelauliano/MitoHiFi). The error message I receive is:

"""
[W::sam_hdr_create] Duplicated sequence "ptg000001l_rc_rotated" in file "-"
[E::sam_hrecs_update_hashes] Duplicate entry "ptg000001l_rc_rotated" in sam header
samtools view: failed to add PG line to the header
"""

I compared my analysis log with a YouTube video (https://www.youtube.com/watch?v=1NWHC2zkRmg) and noticed a difference, which you can see in the image below (YouTube, 49:52):

YouTube capture (49:52): [image]

From my analysis log (the full log is in the attached file 000_logfile.txt):

"""
~ ptg000001l list of genes: ['tRNA-Phe', 'tRNA-Glu', 'tRNA-Ser2', 'tRNA-Asn', 'tRNA-Arg', 'tRNA-Ala', 'ND3', 'tRNA-Gly', 'COX3', 'ATP6', 'tRNA-Asp', 'tRNA-Lys', 'COX2', 'tRNA-Leu2', 'COX1', 'tRNA-Tyr', 'tRNA-Cys', 'tRNA-Ter', 'ND2', 'tRNA-Gln', 'tRNA-Ile', 'tRNA-Met', 'rrnS', 'tRNA-Val', 'rrnL', 'tRNA-Leu', 'ND1', 'tRNA-Ser', 'CYTB', 'ND6', 'tRNA-Pro', 'tRNA-Thr', 'ND4L', 'ND4', 'tRNA-His', 'ND5']
2024-08-16 11:23:44 [INFO] 10. Building annotation plots for all contigs
2024-08-16 11:23:45 [INFO] 11. Building coverage distribution for each potential contig
2024-08-16 11:23:45 [INFO] contigs_to_map: ['final_mitogenome.fasta', 'ptg000001l.mitogenome.rotated.fa']
2024-08-16 11:23:45 [INFO] 11.1 Mapping HiFi (filtered) reads against potential contigs:
2024-08-16 11:23:45 [INFO] Reads mapping:
2024-08-16 11:23:45 [INFO] minimap2 -t 4 --secondary=no -ax map-pb all_potential_contigs.fa gbk.HiFiMapped.bam.fasta | samtools view -@ 4 -b -F4 -F 0x800 -q 0 -o HiFi-vs-potential_contigs.bam
[W::sam_hdr_create] Duplicated sequence "ptg000001l_rc_rotated" in file "-"
[E::sam_hrecs_update_hashes] Duplicate entry "ptg000001l_rc_rotated" in sam header
samtools view: failed to add PG line to the header
"""

It appears that I have two contigs_to_map files that differ from those in the video. Upon comparison, both files (final_mitogenome.fasta and ptg000001l.mitogenome.rotated.fa) contain identical sequence IDs and sequences. This seems to be causing the samtools duplicate entry error.
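
The comparison described above can be reproduced with a short script. This is only a minimal sketch, assuming plain (uncompressed) FASTA files; the function names `fasta_ids` and `duplicated_ids` are illustrative and are not part of MitoHiFi. It reports every sequence ID that occurs more than once across the given files.

```python
from collections import Counter


def fasta_ids(path):
    """Yield the sequence ID (first word after '>') of each record in a FASTA file."""
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                yield fields[0] if fields else ""


def duplicated_ids(paths):
    """Return a sorted list of sequence IDs seen more than once across all files."""
    counts = Counter()
    for path in paths:
        counts.update(fasta_ids(path))
    return sorted(name for name, n in counts.items() if n > 1)
```

For example, `duplicated_ids(["final_mitogenome.fasta", "ptg000001l.mitogenome.rotated.fa"])` would list any ID shared by the two files named in the log; a non-empty result is consistent with the samtools duplicate-entry error.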

Below is a list of the software and modules I have installed for MitoHiFi, along with their versions:

Python: 3.10.13 (recommended: 3.7)
samtools: 1.13 (recommended: 1.11)
cd-hit: 4.8.1 (recommended: 4.8.1)
minimap2: 2.24-r1122 (recommended: 2.19)
hifiasm: 0.19.9-r616 (recommended: 0.19.5)
mafft: v7.525 (recommended: 7.520)
biopython: 1.84 (recommended: 1.79)
matplotlib: 3.9.2 (recommended: 3.2.2)
dna_feature_viewer: 3.1.3 (recommended: 3.1.2)
pandas: 2.0.3 (recommended: 1.3.5)
bedtools: v2.31.0 (recommended: 2.31.0)
Pillow: 9.4.0 (recommended: 6.2.1)
bcbio-gff: 0.7.1 (recommended: 0.7.0)
MitoFinder: 1.4.2 (recommended: 1.4.0)
mitos: 2.1.9 (recommended: 2.1.0)
ncbi-blast+: Nucleotide-Nucleotide BLAST 2.12.0+ (recommended: NaN)

Could you please assist me in resolving this issue? Any guidance or suggestions would be greatly appreciated.

Thank you very much for your help!

Best regards, Sangjin

marcelauliano commented 3 months ago

Hi Sangjin, if the error is caused by identical IDs, try starting with a FASTA file that has different contig names. Best regards, M.
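
One way to produce a FASTA file with different contig names is sketched below. It is a hedged illustration, not a MitoHiFi feature: the function name `rename_contigs` and the default prefix `contig` are hypothetical, the input is assumed to be uncompressed FASTA, and whether renaming the input resolves this particular samtools error is not confirmed in this thread.

```python
def rename_contigs(in_path, out_path, prefix="contig"):
    """Copy a FASTA file, renaming records to <prefix>_1, <prefix>_2, ...

    Returns a list of (old_id, new_id) pairs so the original names can be
    traced. Any description after the ID is dropped from the header.
    """
    mapping = []
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            if line.startswith(">"):
                fields = line[1:].split()
                old_id = fields[0] if fields else ""
                new_id = f"{prefix}_{len(mapping) + 1}"
                mapping.append((old_id, new_id))
                dst.write(f">{new_id}\n")
            else:
                # Sequence lines are copied unchanged.
                dst.write(line)
    return mapping
```

Because the new names are numbered sequentially, they are unique within the output file even when the input contains repeated IDs.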