mysociety / sayit

SayIt - a component for recording and storing public statements.
http://sayit.mysociety.org/

Import from Akoma Ntoso error #517

Closed billy3321 closed 8 years ago

billy3321 commented 8 years ago

I tried to import my meeting record into my SayIt site, but an error occurred.

My site is here: http://jrf.sayit.mysociety.org/

My Akoma Ntoso file is here: https://raw.githubusercontent.com/JRF-tw/nationwide_judicial_reform_meeting/master/2016-jrf/20160614.an

Is my Akoma Ntoso file malformed, or is this a bug?

wfdd commented 8 years ago

You've got two unescaped ampersands inside a query string on line 116. & is a reserved character in XML, which means a literal ampersand needs to be written as &amp;. If you run across the same issue again in the future, try loading the offending file in a Python shell:

In [1]: import lxml.etree, urllib.request

In [2]: with urllib.request.urlopen('https://raw.githubusercontent.com/JRF-tw/nationwide_judicial_reform_meeting/master/2016-jrf/20160614.an') as file:
   ...:     lxml.etree.fromstring(file.read())
   ...:     
  File "<string>", line unknown
XMLSyntaxError: EntityRef: expecting ';', line 116, column 158
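
As a rough illustration of the fix described above, here is a minimal sketch that escapes bare ampersands before parsing. It assumes the standard library's xml.etree.ElementTree in place of lxml, and the sample string and the helper name escape_bare_ampersands are made up for this example, not taken from the file in this issue. The regular expression does not know about CDATA sections or comments, where a raw & is legal, so treat it as a quick repair aid rather than a general solution.

```python
import re
import xml.etree.ElementTree as ET

# An "&" that does not start an entity (&name;) or a character
# reference (&#38; or &#x26;).
BARE_AMPERSAND = re.compile(r"&(?!(?:[A-Za-z_][\w.-]*|#[0-9]+|#x[0-9A-Fa-f]+);)")


def escape_bare_ampersands(xml_text: str) -> str:
    """Replace each bare "&" with "&amp;", leaving existing references alone."""
    return BARE_AMPERSAND.sub("&amp;", xml_text)


# Hypothetical sample with an unescaped ampersand in a query string.
broken = '<a href="http://example.org/?x=1&y=2">link</a>'
fixed = escape_bare_ampersands(broken)

# The original text is rejected by the parser...
try:
    ET.fromstring(broken)
    broken_parses = True
except ET.ParseError:
    broken_parses = False

# ...while the escaped text parses, and the attribute value comes back
# with the literal ampersand restored.
href = ET.fromstring(fixed).get("href")
```

After parsing, the attribute value contains a plain & again, so escaping only changes how the character is written in the file, not the URL itself.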

audreyt commented 8 years ago

@wfdd thank you for the prompt reply! However, the error persists after changing the ampersand to the hexadecimal character reference &#x26;. The modified .an.xml is at the URL below, and it now parses without errors:

>>> import lxml.etree, urllib.request
>>>
>>> with urllib.request.urlopen('https://archive.tw/2016-06-14-%E5%85%A8%E6%B0%91%E5%8F%B8%E6%B3%95%E6%94%B9%E9%9D%A9%E9%81%8B%E5%8B%95-%E7%AC%AC%E4%BA%8C%E9%9A%8E%E6%AE%B5%E7%AC%AC%E4%B8%80%E6%AC%A1%E5%B7%A5%E4%BD%9C%E6%9C%83%E8%AD%B0%E6%9C%83%E8%AD%B0%E7%B4%80%E9%8C%84.an.xml') as file:
...   lxml.etree.fromstring(file.read())
...
<Element akomaNtoso at 0x1038a21c8>

audreyt commented 8 years ago

The mySociety instance I tested is named another-test, and I used the URL above.

[Screenshot: 2016-06-28 at 5:40:34 pm]

I tried importing the same .an.xml into a local docker instance running an older version. It initially said:

An exception of type IntegrityError occurred, arguments:
duplicate key value violates unique constraint "speeches_speaker_instance_id_425c08559ac70635_uniq"
DETAIL:  Key (instance_id, slug)=(1, 高榮志) already exists.

After I manually adjusted the ontology showAs, it worked locally, but not on the public instance.
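
For anyone hitting a similar IntegrityError, here is a minimal sketch that lists showAs values shared by more than one TLCPerson element in an Akoma Ntoso file. It assumes that duplicate showAs values are what produced the duplicate slug, which this thread suggests but does not confirm (a speaker already stored in the instance could trigger the same error). The helper name duplicate_show_as and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def duplicate_show_as(xml_text: str) -> dict:
    """Map each showAs used by more than one TLCPerson to those elements' hrefs."""
    hrefs_by_name = defaultdict(list)
    for element in ET.fromstring(xml_text).iter():
        # Compare the local name only, so any Akoma Ntoso namespace matches.
        if element.tag.rsplit("}", 1)[-1] == "TLCPerson":
            hrefs_by_name[element.get("showAs")].append(element.get("href"))
    return {name: hrefs for name, hrefs in hrefs_by_name.items() if len(hrefs) > 1}


# Hypothetical sample: two people share one display name.
sample = """<akomaNtoso><debate><meta><references>
  <TLCPerson href="/ontology/person/a" showAs="Speaker A"/>
  <TLCPerson href="/ontology/person/b" showAs="Speaker A"/>
  <TLCPerson href="/ontology/person/c" showAs="Speaker C"/>
</references></meta></debate></akomaNtoso>"""

duplicates = duplicate_show_as(sample)
```

An empty result only rules out duplicates within the file; it says nothing about speakers that already exist in the target instance.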

wfdd commented 8 years ago

Is that with a different file? In https://archive.tw/2016-06-14-全民司法改革運動-第二階段第一次工作會議會議紀錄.an.xml it's the href that's changed.

dracos commented 8 years ago

That URL imports fine into my local default sayit-package instance, which is the same version that runs on sayit.mysociety.org. [Screenshot: 2016-06-28 at 12:40:51] So I'm now looking into why the import gives an error on sayit.mysociety.org; it should be fine...

dracos commented 8 years ago

Sorry, sayit.mysociety.org had somehow ended up with an incompatible version of the elasticsearch library installed at some point recently, which caused the error when indexing the first speaker being imported.

I've fixed this now, and imported that file successfully into my own testing instance. It should be fine in your 'another-test' instance. I'm a bit worried that, because of the point at which the error happened, your jrf instance might give a different error upon import (similar to the IntegrityError you posted above).

dracos commented 8 years ago

Okay, I've deleted the rogue unfinished Speaker object that was imported before the error was raised, so hopefully all should be okay now. Please reopen if not. Thanks for spotting this and letting us know! :)