marcelauliano / MitoHiFi

Find, circularise and annotate mitogenome from PacBio assemblies
MIT License
169 stars, 29 forks

blast xml error #11

Closed: Brent-Saylor-Canopy closed this issue 1 year ago

Brent-Saylor-Canopy commented 3 years ago

Hi, I am trying to run MitoHiFi on a plant assembly, but I'm getting the following error, apparently always with the same specific contigs.

I tried modifying the MitoFinder command shown at the top of the log files for other contigs, but the *.mitogenome.fa file used in that command appears to be missing.

Any idea what might be causing this issue?


/opt/MitoHiFi/mitohifi_v2.py:240: UserWarning: Contig h1tg000013l does not have an annotation file, check MitoFinder's output
  warnings.warn("Contig "+ contig_id + " does not have an annotation file, check MitoFinder's output")

Working with contig: h1tg000131l

isNeg? False
Filtering and writing...
1000 records done
1 out of 1708 sequences remained after filtering.
Error: [blastn] Failed s_BlastXMLAddIteration Q(0/1
Traceback (most recent call last):
  File "/opt/MitoHiFi/mitohifi_v2.py", line 333, in <module>
    main()
  File "/opt/MitoHiFi/mitohifi_v2.py", line 230, in main
    circularization_history = get_circo_mito(contig_id, args.circular_size, args.circular_offset)
  File "/opt/MitoHiFi/mitohifi_v2.py", line 63, in get_circo_mito
    circularization_info = circularizationCheck(resultFile=contig_fasta, circularSize=circular_size, circularOffSet=circular_offset)
  File "/opt/MitoHiFi/circularizationCheck.py", line 75, in circularizationCheck
    for qresult in blastparse: #in each query...
  File "/home/cgc-genomics-admin/miniconda3/envs/Mitohifi3.6_env/lib/python3.6/site-packages/Bio/SearchIO/__init__.py", line 306, in parse
    yield from generator
  File "/home/cgc-genomics-admin/miniconda3/envs/Mitohifi3.6_env/lib/python3.6/site-packages/Bio/SearchIO/BlastIO/blast_xml.py", line 240, in __iter__
    yield from self._parse_qresult()
  File "/home/cgc-genomics-admin/miniconda3/envs/Mitohifi3.6_env/lib/python3.6/site-packages/Bio/SearchIO/BlastIO/blast_xml.py", line 289, in _parse_qresult
    for event, qresult_elem in self.xml_iter:
  File "/home/cgc-genomics-admin/miniconda3/envs/Mitohifi3.6_env/lib/python3.6/xml/etree/ElementTree.py", line 1221, in iterator
    yield from pullparser.read_events()
  File "/home/cgc-genomics-admin/miniconda3/envs/Mitohifi3.6_env/lib/python3.6/xml/etree/ElementTree.py", line 1296, in read_events
    raise event
  File "/home/cgc-genomics-admin/miniconda3/envs/Mitohifi3.6_env/lib/python3.6/xml/etree/ElementTree.py", line 1268, in feed
    self._parser.feed(data)
xml.etree.ElementTree.ParseError: mismatched tag: line 29, column 2
jgnunes commented 3 years ago

Hi @Brent-Saylor-Canopy, thanks for reaching out. Has MitoHiFi produced *.mitogenome.fa files for other contigs? Can you provide some additional info?

Best,

Brent-Saylor-Canopy commented 3 years ago

Hi @jgnunes ,

Thanks for the quick reply. I didn't save the logs of the first run properly, so I reran it; I'm attaching the complete stdout as well as a screenshot of the output directory after the error. This time the error occurred on h1tg000174l. The command I ran was:

time python /opt/MitoHiFi/mitohifi_v2.py -c ../CGC_10.HiFiasm_HiC_Run1.D10.hic.hap1.p_ctg.fasta -f data/MT361980.1.fasta -g data/MT361980.1.gb -t 20 -o 1

Attachments: MitoHiFi_dir, MitoHifi.log

I also never noticed the system using more than one processor. I'm not sure whether that is normal for this part of the script, but I thought it was worth mentioning.

jgnunes commented 3 years ago

Hi @Brent-Saylor-Canopy, from the log file it looks like the problem is caused by some genes in the reference genome not being recognized by MitoFinder (which MitoHiFi uses for the annotation) as "standard mitochondrial genes". I've seen this with other plants, and we'll soon release a new version that deals with it (it will have a specific flag for plants). In the meantime, if you don't want to wait for the new release, you can modify mitohifi_v2.py at lines 236 and 286 to add the --new-genes parameter to the MitoFinder call (you can put it before the "max-contig-size" parameter).
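The suggested edit amounts to inserting one extra option into the MitoFinder call, just before the max-contig-size option. A minimal sketch, assuming the command is held as a Python list of arguments (for example, one later passed to subprocess.run); `add_new_genes_flag` is a hypothetical helper, not part of MitoHiFi, and if lines 236 and 286 build the command as a single string the same change is simply made in the text.

```python
def add_new_genes_flag(cmd):
    """Hypothetical helper: return a copy of a MitoFinder argument list with
    --new-genes inserted just before --max-contig-size."""
    cmd = list(cmd)  # work on a copy so the caller's list is left untouched
    if "--new-genes" in cmd:
        return cmd  # flag already present, nothing to add
    try:
        idx = cmd.index("--max-contig-size")
    except ValueError:
        idx = len(cmd)  # option not found: append the flag at the end
    cmd.insert(idx, "--new-genes")
    return cmd
```

For example, `add_new_genes_flag(["mitofinder", "-a", "contig.fasta", "--max-contig-size", "500000"])` places `--new-genes` right before `--max-contig-size`; the other arguments in that list are placeholders, not the exact command MitoHiFi builds.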

If you try that, let me know how it goes.

Finally, could you share the circularization_check.blast.xml and circularization_check.blast.tsv files? I'm curious why you also got a ParseError at the BLAST step.
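The ParseError in the traceback ("mismatched tag: line 29, column 2") is what Python's XML parser reports when a file is not well-formed XML, which may happen if BLAST fails partway through writing it, as the "Failed s_BlastXMLAddIteration" message suggests. A quick sketch for checking a BLAST XML file before handing it to Bio.SearchIO, using only the standard library's xml.etree.ElementTree; `first_xml_error` is a hypothetical helper, not MitoHiFi code.

```python
import xml.etree.ElementTree as ET


def first_xml_error(source):
    """Hypothetical helper: return None if `source` (a path or file object)
    is well-formed XML, otherwise a (line, column, message) tuple."""
    try:
        ET.parse(source)
    except ET.ParseError as err:
        line, column = err.position  # location reported by the parser
        return line, column, str(err)
    return None
```

Calling `first_xml_error("circularization_check.blast.xml")` would return None for a complete file and the parser's line/column for a malformed one, which points straight at the spot where the output went wrong.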

Brent-Saylor-Canopy commented 3 years ago

OK, I've made the modification you suggested and will let you know how it goes.

Here are the files you requested: Mitohifi_Files.zip

Brent-Saylor-Canopy commented 3 years ago

The script got much further this time. I've attached the new log. It still seems to be getting stuck on something related to the XML error.

Mitohifi_modified_run_1.log

jgnunes commented 3 years ago

Hi @Brent-Saylor-Canopy. It does look like a BLAST error; I haven't seen this one before. Could you please share the problematic contig (h1tg000174l.mito.fa) with me? I need to reproduce the problem in my environment to come up with a solution.

Best regards and sorry for the delay

Brent-Saylor-Canopy commented 3 years ago

Sure, no problem: h1tg000174l.mito.zip

jgnunes commented 3 years ago

Hi @Brent-Saylor-Canopy

I've updated the circularizationCheck.py script to try to handle your case. It looks like h1tg000174l is a long contig (>1 Mb), which caused BLAST to fail when writing its output in XML format. The new script should (hopefully) work fine, so please replace that script and try running it again. Let me know whether that solves the issue.
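The actual change to circularizationCheck.py isn't shown in this thread. One way to avoid XML altogether, assuming BLAST is asked for its default tabular output (`-outfmt 6`) instead, is to parse the tab-separated hits directly. The sketch below is hypothetical, not MitoHiFi's implementation: `looks_circular` treats a contig as circular when a same-strand self-hit of at least `circular_size` bp links a region within `circular_offset` bp of the contig start to a region within `circular_offset` bp of its end. The parameter names echo `args.circular_size` and `args.circular_offset` from the traceback, but their exact meaning here is an assumption.

```python
def looks_circular(tsv_lines, contig_length, circular_size, circular_offset):
    """Hypothetical circularity test on BLAST -outfmt 6 self-hits.

    Default columns: qseqid sseqid pident length mismatch gapopen
                     qstart qend sstart send evalue bitscore
    """
    for line in tsv_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip blank or malformed rows
        aln_len = int(fields[3])
        qstart, qend, sstart, send = map(int, fields[6:10])
        if (qstart, qend) == (sstart, send):
            continue  # trivial full self-match, says nothing about circularity
        if sstart > send:
            continue  # minus-strand hit, not a head-to-tail overlap
        if aln_len < circular_size:
            continue  # overlap too short to support circularity
        # Start of the contig (query side) matches its end (subject side)
        if qstart <= circular_offset and send >= contig_length - circular_offset:
            return True
    return False
```

Tabular output is written one line per hit, so a very long contig only makes the file longer rather than producing a document that must be well-formed as a whole.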

Best,

SamCT commented 5 months ago

Hello,

Is there a fix for this? It isn't a BLAST XML error, but it seems to be in the same category:

2024-06-15 09:25:03 [INFO] Mlong585a.ctg85.v4 annotation done. Annotation log saved on ./potential_contigs/Mlong585a.ctg85.v4/Mlong585a.ctg85.v4.annotation_MitoFinder.log
/opt/MitoHiFi/src/parallel_annotation.py:69: UserWarning: Contig Mlong585a.ctg85.v4 does not have an annotation file, check MitoFinder's log
  warnings.warn("Contig "+ contig_id + " does not have an annotation file, check MitoFinder's log")
2024-06-15 09:25:03 [INFO] Mlong585a.ctg58.v4 annotation done. Annotation log saved on ./potential_contigs/Mlong585a.ctg58.v4/Mlong585a.ctg58.v4.annotation_MitoFinder.log
/opt/MitoHiFi/src/parallel_annotation.py:69: UserWarning: Contig Mlong585a.ctg58.v4 does not have an annotation file, check MitoFinder's log
  warnings.warn("Contig "+ contig_id + " does not have an annotation file, check MitoFinder's log")
2024-06-15 09:25:04 [INFO] Mlong585a.ctg221.v4 annotation done. Annotation log saved on ./potential_contigs/Mlong585a.ctg221.v4/Mlong585a.ctg221.v4.annotation_MitoFinder.log
/opt/MitoHiFi/src/parallel_annotation.py:69: UserWarning: Contig Mlong585a.ctg221.v4 does not have an annotation file, check MitoFinder's log
  warnings.warn("Contig "+ contig_id + " does not have an annotation file, check MitoFinder's log")
2024-06-15 09:25:04 [INFO] Mlong585a.ctg21.v4 annotation done. Annotation log saved on ./potential_contigs/Mlong585a.ctg21.v4/Mlong585a.ctg21.v4.annotation_MitoFinder.log
/opt/MitoHiFi/src/parallel_annotation.py:69: UserWarning: Contig Mlong585a.ctg21.v4 does not have an annotation file, check MitoFinder's log
  warnings.warn("Contig "+ contig_id + " does not have an annotation file, check MitoFinder's log")
2024-06-15 09:25:04 [INFO] Mlong585a.ctg20.v4 annotation done. Annotation log saved on ./potential_contigs/Mlong585a.ctg20.v4/Mlong585a.ctg20.v4.annotation_MitoFinder.log
/opt/MitoHiFi/src/parallel_annotation.py:69: UserWarning: Contig Mlong585a.ctg20.v4 does not have an annotation file, check MitoFinder's log
  warnings.warn("Contig "+ contig_id + " does not have an annotation file, check MitoFinder's log")
2024-06-15 09:25:07 [INFO] Mlong585a.ctg18.v4 circularization done. Circularization info saved on ./potential_contigs/Mlong585a.ctg18.v4/Mlong585a.ctg18.v4.circularisationCheck.txt
2024-06-15 09:25:07 [INFO] Started Mlong585a.ctg18.v4 (MitoFinder) annotation
2024-06-15 09:25:08 [INFO] Mlong585a.ctg18.v4 annotation done. Annotation log saved on ./potential_contigs/Mlong585a.ctg18.v4/Mlong585a.ctg18.v4.annotation_MitoFinder.log
/opt/MitoHiFi/src/parallel_annotation.py:69: UserWarning: Contig Mlong585a.ctg18.v4 does not have an annotation file, check MitoFinder's log
  warnings.warn("Contig "+ contig_id + " does not have an annotation file, check MitoFinder's log")
Traceback (most recent call last):
  File "/opt/MitoHiFi/src/mitohifi.py", line 567, in <module>
    main()
  File "/opt/MitoHiFi/src/mitohifi.py", line 304, in main
    tRNA_ref = fetch.get_ref_tRNA()
  File "/opt/MitoHiFi/src/fetch.py", line 48, in get_ref_tRNA
    reference_tRNA = max(tRNAs, key=tRNAs.get)
ValueError: max() arg is an empty sequence
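The traceback ends in fetch.get_ref_tRNA, where `max(tRNAs, key=tRNAs.get)` is called on an empty collection; Python's built-in max() raises exactly this ValueError when given an empty iterable and no `default`. A defensive variant could pass `default=` and raise a more descriptive error instead. In the sketch below, `tRNAs` is assumed to be a dict mapping tRNA names to some score (what the values represent isn't visible in the traceback), and `pick_reference_tRNA` is a hypothetical helper, not MitoHiFi's code.

```python
def pick_reference_tRNA(tRNAs):
    """Hypothetical helper: return the key with the largest value, or raise
    a descriptive error when no tRNA entries were collected."""
    best = max(tRNAs, key=tRNAs.get, default=None)  # None instead of ValueError
    if best is None:
        raise ValueError(
            "No tRNA entries were collected; check the reference and "
            "the MitoFinder annotation logs"
        )
    return best
```

This doesn't explain why the collection is empty, but it turns an opaque "empty sequence" message into one that points at the likely place to look.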

Thanks, Sam