paulstothard / cgview_comparison_tool

The CGView Comparison Tool (CCT) is a package for visually comparing bacterial, plasmid, chloroplast, and mitochondrial sequences.
https://paulstothard.github.io/cgview_comparison_tool/
GNU General Public License v3.0

CGview Java org.xml.sax.SAXException #10

Closed BachBioinformatics closed 1 year ago

BachBioinformatics commented 2 years ago

I am trying to run redraw_maps.sh -p myproject -f svg or build_blast_atlas.sh -p myproject -m 48g, but both commands give the following error:

org.xml.sax.SAXException: value for 'start' attribute in featureRange element must be less than or equal to the length of the plasmid in null at line 52 column 48
        at ca.ualberta.stothard.cgview.CgviewFactory.handleFeatureRange(CgviewFactory.java:3570)
        at ca.ualberta.stothard.cgview.CgviewFactory.startElement(CgviewFactory.java:669)
        at org.apache.xerces.parsers.AbstractSAXParser.startElement(AbstractSAXParser.java:497)
        at org.apache.xerces.parsers.AbstractXMLDocumentParser.emptyElement(AbstractXMLDocumentParser.java:180)
        at org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanStartElement(XMLNSDocumentScannerImpl.java:275)
        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(XMLDocumentFragmentScannerImpl.java:1654)
        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:324)
        at org.apache.xerces.parsers.XML11Configuration.parse(XML11Configuration.java:845)
        at org.apache.xerces.parsers.XML11Configuration.parse(XML11Configuration.java:768)
        at org.apache.xerces.parsers.XMLParser.parse(XMLParser.java:108)
        at org.apache.xerces.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1201)
        at ca.ualberta.stothard.cgview.CgviewFactory.createCgviewFromFile(CgviewFactory.java:445)
        at ca.ualberta.stothard.cgview.CgviewIO.main(CgviewIO.java:1474)
The following error occurred: org.xml.sax.SAXException: value for 'start' attribute in featureRange element must be less than or equal to the length of the plasmid in null at line 52 column 48

Any tips to fix this error? Many thanks

paulstothard commented 2 years ago

Hello,

That error means that CGView has detected a sequence feature in the input that has a start position that is larger than the length of the sequence being drawn.

I would need to see the input files and full command to identify the feature causing the error.

Paul


-- Dr. Paul Stothard - Professor Department of Agricultural, Food & Nutritional Science (AFNS) University of Alberta Edmonton, Alberta T6G 2C8 Canada

https://sites.ualberta.ca/~stothard/ office: 2-31 General Services Bldg phone: 1.780.492.5242 mobile: 1.780.297.5242

BachBioinformatics commented 2 years ago

Hello Paul, thanks a lot for your reply.

I am familiar with the GView web server (https://server.gview.ca/), but I could not use it for core analysis because my .gbk files are huge and uploading them to the server would be very slow.

I have a set of contigs from a clinical sample, assembled with SPAdes and annotated with Prokka. The GenBank file produced from the Prokka GFF annotations is relatively big (>15 GB). This file was copied into the subfolder comparison_genome.

A reference was built from SRA data with the same tools (SPAdes -> Prokka -> GFF annotations converted to GenBank format). This file was used as input to the CGView Comparison Tool, which copied it automatically to the subfolder reference_genome of the created project.

So I installed the CGView Comparison Tool locally on my server

Step 1: create my project folder using a reference input in GenBank format, then copy in the comparison files:

build_blast_atlas.sh -i Refspades_contigs.gbk
cp myclinicalsample_contigs.gbk Refspades_contigs/comparison_genone
cp myclinicalsample_contigs.fasta Refspades_contigs/comparison_genone

Step 2: running the following command produced the error from my first comment:

build_blast_atlas.sh -p spades_contigs -m 48g

It failed while running this command:

java -Djava.awt.headless=true -jar -Xmx48g -jar /opt/software/cgview_comparison_tool/bin/cgview/cgview.jar -i 'Refspades_contigs/cct_projects/dna_vs_dna/maps/cgview_xml/dna_vs_dna_large.xml' -f png -o 'Refspades_contigs/cct_projects/dna_vs_dna/maps/dna_vs_dna_large.png' -h 'Refspades_contigs/cct_projects/dna_vs_dna/maps/dna_vs_dna_large.html' -p 'dna_vs_dna_large.png'

The Refspades_contigs/cct_projects/dna_vs_dna/maps/cgview_xml/dna_vs_dna_large.xml file is larger than 8 GB.

On line 52, I see something like this:

<feature color="rgb(0,0,153)" decoration="clockwise-arrow" opacity="0.5" label="CAKDEMEL_00083" mouseover="CAKDEMEL_00083; 9711309 to 9712295; hypothetical protein" >
<featureRange start="9711309" stop="9712295" />

but I cannot see where the length is given. I can send you a compressed version of the dna_vs_dna_large.xml file, though I am not sure I can upload it here.
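One way to look for the length and the offending features without opening an 8 GB file in an editor is to stream it. The following is a minimal Python sketch, not part of CCT. It assumes the root element is named cgview and carries the sequence length in a sequenceLength attribute, as in typical CGView XML; the function name find_out_of_range is made up for illustration.

```python
import io
import xml.etree.ElementTree as ET


def find_out_of_range(source, limit=10):
    """Stream a CGView XML file and collect featureRange coordinates
    that lie beyond the declared sequence length.

    `source` is a filename or a binary file object.
    Assumption: the root <cgview> element has a `sequenceLength` attribute.
    Returns (sequence_length, [(start, stop), ...]) with at most `limit` hits.
    """
    seq_len = None
    problems = []
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start" and elem.tag == "cgview":
            # Attributes are already available when the start tag is seen.
            seq_len = int(elem.attrib["sequenceLength"])
        elif event == "end" and elem.tag == "featureRange":
            start = int(elem.attrib["start"])
            stop = int(elem.attrib["stop"])
            if seq_len is not None and (start > seq_len or stop > seq_len):
                problems.append((start, stop))
                if len(problems) >= limit:
                    break
        if event == "end":
            # Drop the element's content so memory use stays modest.
            elem.clear()
    return seq_len, problems
```

Calling find_out_of_range on the path to dna_vs_dna_large.xml would return the declared length together with the first few ranges that exceed it, which can then be compared with the coordinates seen on line 52.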

paulstothard commented 2 years ago

Hello,

The error would be triggered by the contents of Refspades_contigs.gbk. How big is that file?

If you send it I could try to look for the issue.

Paul



BachBioinformatics commented 2 years ago

Hello again,

Refspades_contigs.gbk is 15G

paulstothard commented 2 years ago

Hi,

Based on the sizes of the files I suspect that the total length of the contigs in the reference file will exceed the limits of the program (when I wrote it, next generation sequencing didn't exist).

The program is designed for a reference genome with a total length of less than 10 megabases. It is typically used to compare an assembled bacterial genome (the "reference") to other bacterial genomes (the "comparison" genomes).

Based on the file sizes, it sounds like this sample may consist of multiple bacterial species and other DNA sources.

If so I would use a metagenomics assembly pipeline like this that can recover individual genomes (maybe this is similar to what you are doing):

https://nf-co.re/mag

Paul


BachBioinformatics commented 2 years ago

Thank you, that makes sense. It is better to do this with individual MAGs of around 10 Mb.
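Before building a project, the total length of a candidate reference can be checked against the limit mentioned above. This is a minimal Python sketch under stated assumptions: the input is a plain FASTA file, the 10,000,000 bp threshold is taken from the "less than 10 megabases" guidance in this thread rather than from a documented hard limit, and the names total_length and fits_cct are hypothetical.

```python
import io

# Approximate reference size limit quoted in this thread (an assumption,
# not a value read from the CCT source code).
LIMIT_BP = 10_000_000


def total_length(handle):
    """Sum the sequence lengths of all records in an open FASTA text handle."""
    total = 0
    for line in handle:
        if not line.startswith(">"):
            # Count residues only; ignore newlines and surrounding whitespace.
            total += len(line.strip())
    return total


def fits_cct(total_bp, limit=LIMIT_BP):
    """Return True if the total length is below the assumed limit."""
    return total_bp < limit
```

For example, opening a reference FASTA in text mode, passing the handle to total_length, and then passing the result to fits_cct gives a quick yes/no answer before any maps are built.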