marcelauliano / MitoHiFi

Find, circularise and annotate mitogenome from PacBio assemblies
MIT License
169 stars 29 forks source link

samtools view: failed to add PG line to the header #68

Closed YanCheer closed 10 months ago

YanCheer commented 11 months ago

When I try with the test data as follows: python $ToolDir/MitoHiFi/src/mitohifi.py -a animal -r /projects/yanchen/softwares/MitoHiFi/tests/ilDeiPorc1.reads.100.fa -f $OutDir/OQ694980.1.fasta -g $OutDir/OQ694980.1.gb -t 36 -d -o 5

2023-12-12 10:52:40 [INFO] Reads mapping: 2023-12-12 10:52:40 [INFO] minimap2 -t 36 --secondary=no -ax map-pb all_potential_contigs.fa gbk.HiFiMapped.bam.fasta | samtools view -@ 36 -b -F4 -F 0x800 -q 0 -o HiFi-vs-potential_contigs.bam [W::sam_hdr_create] Duplicated sequence "ptg000001l_rc_rotated" in file "-" [E::sam_hrecs_update_hashes] Duplicate entry "ptg000001l_rc_rotated" in sam header samtools view: failed to add PG line to the header

Although it successfully produced final_mitogenome.fasta and final_mitogenome.gb, the pipeline got stuck at this point and not went further any more.

Is this enough to have produced the above two final files? Or does I have to fix the problem with samtools to obtain the complete and fully finished results?

leiyuyan commented 10 months ago

How to solve it?

marcelauliano commented 10 months ago

Hi Yan, if you are using a local installation its difficult for me to debug, but make sure you are using the version of samtools we specify on the readme.

yshcai commented 10 months ago

I also meet this problem -- samtools view: failed to add PG line to the header. I created a conda environment with your yml file that is inside MitoHiFi/environment after the MitoFinder was installed and made accessible via the PATH environment variable, so the version of samtools is 1.11.

marcelauliano commented 10 months ago

Hi sorry, we unfortunately cannot provide help with debugging for manual installations. Please use the docker container (in docker or with singularity) which passed all tests only a few weeks ago.

Best, Marcela

Marcela Uliano da Silva, PhD

Senior Bioinformatician - Wellcome Sanger Institute

Darwin Tree of Life Project (DToL)

Churchill College Postdoctoral By-Fellow, University of Cambridge

Cambridge, UK

On Tue, 16 Jan 2024 at 07:17, Enosh @.***> wrote:

I also meet this problem -- samtools view: failed to add PG line to the header. I created a conda environment with your yml file that is inside MitoHiFi/environment after the MitoFinder was installed and made accessible via the PATH environment variable, so the version of samtools is 1.11.

— Reply to this email directly, view it on GitHub https://github.com/marcelauliano/MitoHiFi/issues/68#issuecomment-1893132021, or unsubscribe https://github.com/notifications/unsubscribe-auth/AA7M5REHLXZ762DH4KPAACDYOYLOPAVCNFSM6AAAAABAQYN77GVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQOJTGEZTEMBSGE . You are receiving this because you modified the open/close state.Message ID: @.***>

yshcai commented 10 months ago

Hi sorry, we unfortunately cannot provide help with debugging for manual installations. Please use the docker container (in docker or with singularity) which passed all tests only a few weeks ago. Best, Marcela Marcela Uliano da Silva, PhD Senior Bioinformatician - Wellcome Sanger Institute Darwin Tree of Life Project (DToL) Churchill College Postdoctoral By-Fellow, University of Cambridge Cambridge, UK On Tue, 16 Jan 2024 at 07:17, Enosh @.> wrote: I also meet this problem -- samtools view: failed to add PG line to the header. I created a conda environment with your yml file that is inside MitoHiFi/environment after the MitoFinder was installed and made accessible via the PATH environment variable, so the version of samtools is 1.11. — Reply to this email directly, view it on GitHub <#68 (comment)>, or unsubscribe https://github.com/notifications/unsubscribe-auth/AA7M5REHLXZ762DH4KPAACDYOYLOPAVCNFSM6AAAAABAQYN77GVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQOJTGEZTEMBSGE . You are receiving this because you modified the open/close state.Message ID: @.>

Thank you for your help! I think files from the git clone may have some errors. I download the release from MitoHiFi_v3.2, and then run it locally, the error was gone.

AlcaArctica commented 9 months ago

Hmm, I just encountered the same error. Its stuck at samtools view: failed to add PG line to the header (now for several hours). I also used git clone, so will try the suggestion of @yshcai to see whether this helps.

python MitoHiFi/src/mitohifi.py -r ilDeiPorc1.reads.100.fa -f NC_057303.1.fasta -g NC_057303.1.gb -t 24 -o 5 

2024-02-14 14:05:52 [INFO] Welcome to MitoHifi v2. Starting pipeline...
2024-02-14 14:05:52 [INFO] Length of related mitogenome is: 15312 bp
2024-02-14 14:05:52 [INFO] Number of genes on related mitogenome: 37
2024-02-14 14:05:52 [INFO] Running MitoHifi pipeline in reads mode...
2024-02-14 14:05:52 [INFO] 1. First we map your Pacbio HiFi reads to the close-related mitogenome
2024-02-14 14:05:52 [INFO] minimap2 -t 24 --secondary=no -ax map-hifi NC_057303.1.fasta ilDeiPorc1.reads.100.fa | samtools view -@ 24 -b -F4 -F 0x800 -o reads.HiFiMapped.bam
2024-02-14 14:05:52 [INFO] 2. Now we filter out any mapped reads that are larger than the reference mitogenome to avoid NUMTS
2024-02-14 14:05:52 [INFO] 2.1 First we convert the mapped reads from BAM to FASTA format:
2024-02-14 14:05:52 [INFO] samtools fasta reads.HiFiMapped.bam > gbk.HiFiMapped.bam.fasta
2024-02-14 14:05:52 [INFO] Total number of mapped reads: 97
2024-02-14 14:05:52 [INFO] 2.2 Then we filter reads that are larger than 15312 bp
2024-02-14 14:05:52 [INFO] Number of filtered reads: 87
2024-02-14 14:05:52 [INFO] 3. Now let's run hifiasm to assemble the mapped and filtered reads!
2024-02-14 14:05:52 [INFO] hifiasm --primary -t 24 -f 0 -o gbk.HiFiMapped.bam.filtered.assembled gbk.HiFiMapped.bam.filtered.fasta
cat: gbk.HiFiMapped.bam.filtered.assembled.a_ctg.fa: No such file or directory
2024-02-14 14:05:52 [INFO] 4. Let's run the blast of the contigs versus the close-related mitogenome
2024-02-14 14:05:52 [INFO] 4.1. Creating BLAST database:
2024-02-14 14:05:52 [INFO] makeblastdb -in NC_057303.1.fasta -dbtype nucl
2024-02-14 14:06:00 [INFO] Makeblastdb done.
2024-02-14 14:06:00 [INFO] 4.2. Running blast of contigs against close-related mitogenome:
2024-02-14 14:06:00 [INFO] blastn -query hifiasm.contigs.fasta -db NC_057303.1.fasta -num_threads 24 -out contigs.blastn -outfmt 6 std qlen slen
2024-02-14 14:06:08 [INFO] Blast done.
2024-02-14 14:06:08 [INFO] 5. Filtering BLAST output to select target sequences
2024-02-14 14:06:08 [INFO] Filtering thresholds applied:
2024-02-14 14:06:08 [INFO] Minimum query percentage = 50
2024-02-14 14:06:08 [INFO] Minimum query length = 80% subject length
2024-02-14 14:06:08 [INFO] Maximum query length = 5 times subject length
2024-02-14 14:06:08 [INFO] Filtering BLAST finished. A list of the filtered contigs was saved on ./contigs_filtering/contigs_ids.txt file
2024-02-14 14:06:08 [INFO] 6. Now we are going to circularize, annotate and rotate each filtered contig. Those are potential mitogenome(s).
2024-02-14 14:06:08 [INFO] Annotation will be done using MitoFinder (default)
2024-02-14 14:06:08 [INFO] Working with contig ptg000001l
2024-02-14 14:06:08 [INFO] Started ptg000001l circularization
2024-02-14 14:06:36 [INFO] ptg000001l circularization done. Circularization info saved on ./potential_contigs/ptg000001l/ptg000001l.circularisationCheck.txt
2024-02-14 14:06:36 [INFO] Started ptg000001l (MitoFinder) annotation
2024-02-14 14:10:02 [INFO] ptg000001l annotation done. Annotation log saved on ./potential_contigs/ptg000001l/ptg000001l.annotation_MitoFinder.log
2024-02-14 14:10:02 [INFO] Started ptg000001l rotation.
2024-02-14 14:10:02 [INFO] tRNA-Phe is at reverse complement of ptg000001l.mitogenome.fa
2024-02-14 14:10:02 [INFO] For that reason we'll reverse complement ptg000001l.mitogenome.fa before the rotation
2024-02-14 14:10:02 [INFO] Reverse complementation completed for contig ptg000001l done
2024-02-14 14:10:02 [INFO] Rotation of ptg000001l done. Rotated is at ptg000001l.mitogenome.rotated.fa
2024-02-14 14:10:02 [INFO] Started calculating mitocontig stats... ptg000001l
2024-02-14 14:10:02 [INFO] 7. Now the rotated contigs will be aligned
2024-02-14 14:10:02 [INFO] List of contigs that will be aligned: ['ptg000001l.mitogenome.rotated.fa']

2024-02-14 14:10:02 [INFO] MAFFT alignment will be called with:
mafft --quiet --clustalout --thread 24 all_mitogenomes.rotated.fa > all_mitogenomes.rotated.aligned.aln

2024-02-14 14:10:02 [INFO] Alignment done and saved at ./final_mitogenome_choice/all_mitogenomes.rotated.aligned.aln

2024-02-14 14:10:02 [INFO] 8. Now we will choose the most representative contig

2024-02-14 14:10:03 [INFO] Representative contig is ptg000001l that belongs to Cluster 0. This contig will be our final mitogenome. See all contigs and clusters in cdhit.out.clstr
2024-02-14 14:10:03 [INFO] 9. Calculating final stats for final mitogenome and other potential contigs.
    Stats will be saved on contigs_stats.tsv file.
ptg000001l list of genes: ['tRNA-Phe', 'tRNA-Glu', 'tRNA-Ser', 'tRNA-Asn', 'tRNA-Arg', 'tRNA-Ala', 'ND3', 'tRNA-Gly', 'COX3', 'ATP6', 'ATP8', 'tRNA-Asp', 'tRNA-Lys', 'COX2', 'tRNA-Leu2', 'COX1', 'tRNA-Tyr', 'tRNA-Cys', 'tRNA-Trp', 'ND2', 'tRNA-Gln', 'tRNA-Ile', 'tRNA-Met', 'rrnS', 'tRNA-Val', 'rrnL', 'tRNA-Leu', 'ND1', 'tRNA-Ser2', 'CYTB', 'ND6', 'tRNA-Pro', 'tRNA-Thr', 'ND4L', 'ND4', 'tRNA-His', 'ND5']
2024-02-14 14:10:03 [INFO] 10. Building annotation plots for all contigs
2024-02-14 14:10:40 [INFO] 11. Building coverage distribution for each potential contig
2024-02-14 14:10:40 [INFO] contigs_to_map: ['final_mitogenome.fasta', 'ptg000001l.mitogenome.rotated.fa']
2024-02-14 14:10:40 [INFO] 11.1 Mapping HiFi (filtered) reads against potential contigs:
2024-02-14 14:10:40 [INFO] Reads mapping:
2024-02-14 14:10:40 [INFO] minimap2 -t 24 --secondary=no -ax map-pb all_potential_contigs.fa gbk.HiFiMapped.bam.fasta | samtools view -@ 24 -b -F4 -F 0x800 -q 0 -o HiFi-vs-potential_contigs.bam
[W::sam_hdr_create] Duplicated sequence "ptg000001l_rc_rotated" in file "-"
[E::sam_hrecs_update_hashes] Duplicate entry "ptg000001l_rc_rotated" in sam header
samtools view: failed to add PG line to the header
marcelauliano commented 9 months ago

samtools version most likely

marcelauliano commented 9 months ago

But as your log says, your mitogenome is already assembled. The problem is only with the plotting

2024-02-14 14:10:40 [INFO] 11. Building coverage distribution for each potential contig 2024-02-14 14:10:40 [INFO] contigs_to_map: ['final_mitogenome.fasta', 'ptg000001l.mitogenome.rotated.fa']

You can inspect your results. Please refer to readme for intermediate files interpretation.

Best of luck :)

Em qua., 14 de fev. de 2024 às 13:20, Laura Uelze @.***> escreveu:

Hmm, I just encountered the same error. Its stuck at samtools view: failed to add PG line to the header (now for several hours). I also used git clone, so will try the suggestion of @yshcai https://github.com/yshcai to see whether this helps.

python MitoHiFi/src/mitohifi.py -r ilDeiPorc1.reads.100.fa -f NC_057303.1.fasta -g NC_057303.1.gb -t 24 -o 5

2024-02-14 14:05:52 [INFO] Welcome to MitoHifi v2. Starting pipeline... 2024-02-14 14:05:52 [INFO] Length of related mitogenome is: 15312 bp 2024-02-14 14:05:52 [INFO] Number of genes on related mitogenome: 37 2024-02-14 14:05:52 [INFO] Running MitoHifi pipeline in reads mode... 2024-02-14 14:05:52 [INFO] 1. First we map your Pacbio HiFi reads to the close-related mitogenome 2024-02-14 14:05:52 [INFO] minimap2 -t 24 --secondary=no -ax map-hifi NC_057303.1.fasta ilDeiPorc1.reads.100.fa | samtools view -@ 24 -b -F4 -F 0x800 -o reads.HiFiMapped.bam 2024-02-14 14:05:52 [INFO] 2. Now we filter out any mapped reads that are larger than the reference mitogenome to avoid NUMTS 2024-02-14 14:05:52 [INFO] 2.1 First we convert the mapped reads from BAM to FASTA format: 2024-02-14 14:05:52 [INFO] samtools fasta reads.HiFiMapped.bam > gbk.HiFiMapped.bam.fasta 2024-02-14 14:05:52 [INFO] Total number of mapped reads: 97 2024-02-14 14:05:52 [INFO] 2.2 Then we filter reads that are larger than 15312 bp 2024-02-14 14:05:52 [INFO] Number of filtered reads: 87 2024-02-14 14:05:52 [INFO] 3. Now let's run hifiasm to assemble the mapped and filtered reads! 2024-02-14 14:05:52 [INFO] hifiasm --primary -t 24 -f 0 -o gbk.HiFiMapped.bam.filtered.assembled gbk.HiFiMapped.bam.filtered.fasta cat: gbk.HiFiMapped.bam.filtered.assembled.a_ctg.fa: No such file or directory 2024-02-14 14:05:52 [INFO] 4. Let's run the blast of the contigs versus the close-related mitogenome 2024-02-14 14:05:52 [INFO] 4.1. Creating BLAST database: 2024-02-14 14:05:52 [INFO] makeblastdb -in NC_057303.1.fasta -dbtype nucl 2024-02-14 14:06:00 [INFO] Makeblastdb done. 2024-02-14 14:06:00 [INFO] 4.2. Running blast of contigs against close-related mitogenome: 2024-02-14 14:06:00 [INFO] blastn -query hifiasm.contigs.fasta -db NC_057303.1.fasta -num_threads 24 -out contigs.blastn -outfmt 6 std qlen slen 2024-02-14 14:06:08 [INFO] Blast done. 2024-02-14 14:06:08 [INFO] 5. Filtering BLAST output to select target sequences 2024-02-14 14:06:08 [INFO] Filtering thresholds applied: 2024-02-14 14:06:08 [INFO] Minimum query percentage = 50 2024-02-14 14:06:08 [INFO] Minimum query length = 80% subject length 2024-02-14 14:06:08 [INFO] Maximum query length = 5 times subject length 2024-02-14 14:06:08 [INFO] Filtering BLAST finished. A list of the filtered contigs was saved on ./contigs_filtering/contigs_ids.txt file 2024-02-14 14:06:08 [INFO] 6. Now we are going to circularize, annotate and rotate each filtered contig. Those are potential mitogenome(s). 2024-02-14 14:06:08 [INFO] Annotation will be done using MitoFinder (default) 2024-02-14 14:06:08 [INFO] Working with contig ptg000001l 2024-02-14 14:06:08 [INFO] Started ptg000001l circularization 2024-02-14 14:06:36 [INFO] ptg000001l circularization done. Circularization info saved on ./potential_contigs/ptg000001l/ptg000001l.circularisationCheck.txt 2024-02-14 14:06:36 [INFO] Started ptg000001l (MitoFinder) annotation 2024-02-14 14:10:02 [INFO] ptg000001l annotation done. Annotation log saved on ./potential_contigs/ptg000001l/ptg000001l.annotation_MitoFinder.log 2024-02-14 14:10:02 [INFO] Started ptg000001l rotation. 2024-02-14 14:10:02 [INFO] tRNA-Phe is at reverse complement of ptg000001l.mitogenome.fa 2024-02-14 14:10:02 [INFO] For that reason we'll reverse complement ptg000001l.mitogenome.fa before the rotation 2024-02-14 14:10:02 [INFO] Reverse complementation completed for contig ptg000001l done 2024-02-14 14:10:02 [INFO] Rotation of ptg000001l done. Rotated is at ptg000001l.mitogenome.rotated.fa 2024-02-14 14:10:02 [INFO] Started calculating mitocontig stats... ptg000001l 2024-02-14 14:10:02 [INFO] 7. Now the rotated contigs will be aligned 2024-02-14 14:10:02 [INFO] List of contigs that will be aligned: ['ptg000001l.mitogenome.rotated.fa']

2024-02-14 14:10:02 [INFO] MAFFT alignment will be called with: mafft --quiet --clustalout --thread 24 all_mitogenomes.rotated.fa > all_mitogenomes.rotated.aligned.aln

2024-02-14 14:10:02 [INFO] Alignment done and saved at ./final_mitogenome_choice/all_mitogenomes.rotated.aligned.aln

2024-02-14 14:10:02 [INFO] 8. Now we will choose the most representative contig

2024-02-14 14:10:03 [INFO] Representative contig is ptg000001l that belongs to Cluster 0. This contig will be our final mitogenome. See all contigs and clusters in cdhit.out.clstr 2024-02-14 14:10:03 [INFO] 9. Calculating final stats for final mitogenome and other potential contigs. Stats will be saved on contigs_stats.tsv file. ptg000001l list of genes: ['tRNA-Phe', 'tRNA-Glu', 'tRNA-Ser', 'tRNA-Asn', 'tRNA-Arg', 'tRNA-Ala', 'ND3', 'tRNA-Gly', 'COX3', 'ATP6', 'ATP8', 'tRNA-Asp', 'tRNA-Lys', 'COX2', 'tRNA-Leu2', 'COX1', 'tRNA-Tyr', 'tRNA-Cys', 'tRNA-Trp', 'ND2', 'tRNA-Gln', 'tRNA-Ile', 'tRNA-Met', 'rrnS', 'tRNA-Val', 'rrnL', 'tRNA-Leu', 'ND1', 'tRNA-Ser2', 'CYTB', 'ND6', 'tRNA-Pro', 'tRNA-Thr', 'ND4L', 'ND4', 'tRNA-His', 'ND5'] 2024-02-14 14:10:03 [INFO] 10. Building annotation plots for all contigs 2024-02-14 14:10:40 [INFO] 11. Building coverage distribution for each potential contig 2024-02-14 14:10:40 [INFO] contigs_to_map: ['final_mitogenome.fasta', 'ptg000001l.mitogenome.rotated.fa'] 2024-02-14 14:10:40 [INFO] 11.1 Mapping HiFi (filtered) reads against potential contigs: 2024-02-14 14:10:40 [INFO] Reads mapping: 2024-02-14 14:10:40 [INFO] minimap2 -t 24 --secondary=no -ax map-pb all_potential_contigs.fa gbk.HiFiMapped.bam.fasta | samtools view -@ 24 -b -F4 -F 0x800 -q 0 -o HiFi-vs-potential_contigs.bam [W::sam_hdr_create] Duplicated sequence "ptg000001l_rc_rotated" in file "-" [E::sam_hrecs_update_hashes] Duplicate entry "ptg000001l_rc_rotated" in sam header samtools view: failed to add PG line to the header

— Reply to this email directly, view it on GitHub https://github.com/marcelauliano/MitoHiFi/issues/68#issuecomment-1943755553, or unsubscribe https://github.com/notifications/unsubscribe-auth/AA7M5RAYRC5I3O4NSDWDN6LYTS23HAVCNFSM6AAAAABAQYN77GVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSNBTG42TKNJVGM . You are receiving this because you modified the open/close state.Message ID: @.***>

-- Marcela Uliano da Silva, PhD

Senior Bioinformatician - Wellcome Sanger Institute

Darwin Tree of Life Project (DToL)

Churchill College Postdoctoral By-Fellow, University of Cambridge

Cambridge, UK

AlcaArctica commented 9 months ago

Following @yshcai suggestion solved the problem. Thank you!