kelvinn / capparselib

A simple library to standardise Common Alerting Protocol (CAP) messages

Parse fails on Canadian National Alerting Service Feed #7

Closed dBitech closed 5 years ago

dBitech commented 5 years ago

The CAP-CP XML format that the Canadian National Alerting Service uses is not always parseable by this library. We get occasional errors such as the following:

File "/usr/local/lib/python3.6/site-packages/capparselib/parsers.py", line 200, in get_objectified_xml
    a = objectify.fromstring(self.xml, parser)
  File "src/lxml/objectify.pyx", line 1803, in lxml.objectify.fromstring
  File "src/lxml/etree.pyx", line 3222, in lxml.etree.fromstring
  File "src/lxml/parser.pxi", line 1877, in lxml.etree._parseMemoryDocument
  File "src/lxml/parser.pxi", line 1765, in lxml.etree._parseDoc
  File "src/lxml/parser.pxi", line 1127, in lxml.etree._BaseParser._parseDoc
  File "src/lxml/parser.pxi", line 601, in lxml.etree._ParserContext._handleParseResultDoc
  File "src/lxml/parser.pxi", line 711, in lxml.etree._handleParseResult
  File "src/lxml/parser.pxi", line 640, in lxml.etree._raiseParseError
  File "<string>", line 0
lxml.etree.XMLSyntaxError: Element '{urn:oasis:names:tc:emergency:cap:1.2}references': [facet 'pattern'] The value '' is not accepted by the pattern '\s*[^\s,&<]+,[^\s,&<]+,\d\d\d\d-\d\d-\d\dT\d\d:\d\d:\d\d[-,+]\d\d:\d\d(\s+[^\s,&<]+,[^\s,&<]+,\d\d\d\d-\d\d-\d\dT\d\d:\d\d:\d\d[-,+]\d\d:\d\d)*\s*'.
kelvinn commented 5 years ago

Thank you for making this issue public @dBitech ! Can you upload an example cap alert, or point me in the direction of the one you are trying to consume?

dBitech commented 5 years ago

Here is a hacky script that reads CAP-CP from the official server. It exhibits the fault after parsing for a while, typically on a weather alert, so it may take an hour or two for the fault to appear.


from capparselib.parsers import CAPParser
import pprint
import socket
import sys  # for sys.exit()
import time

def iparse(packet):
    # Skip empty reads; otherwise treat the packet as one CAP document.
    if packet:
        alert_list = CAPParser(packet.decode('utf-8')).as_dict()
        print(alert_list[0]['cap_sent'], ":", alert_list[0]['cap_sender'])
        # Heartbeats arrive constantly; only dump real alerts.
        if alert_list[0]['cap_sender'] != "NAADS-Heartbeat":
            pp.pprint(alert_list)

def recv_timeout(the_socket, timeout=2):
    #make socket non blocking
    the_socket.setblocking(False)

    #total data partwise in an array
    total_data = []
    data = b''

    #beginning time
    begin=time.time()
    while 1:
        #if you got some data, then break after timeout
        if total_data and time.time()-begin > timeout:
            break

        #if you got no data at all, wait a little longer, twice the timeout
        elif time.time()-begin > timeout*2:
            break

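        #recv something
        try:
            data = the_socket.recv(8192)
            if data:
                total_data.append(data)
                #change the beginning time for measurement
                begin = time.time()
            else:
                #empty read: the peer closed the connection; pause until the timeout ends the loop
                time.sleep(0.1)
        except BlockingIOError:
            #no data available yet on the non-blocking socket; wait briefly instead of spinning
            time.sleep(0.1)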

    #join all parts to make final string
    return b''.join(total_data)

pp = pprint.PrettyPrinter(indent=4)
try:
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
except socket.error:
    print("Failed to create socket")
    sys.exit()

host = "streaming1.naad-adna.pelmorex.com"
port = 8080

try:
    remote_ip = socket.gethostbyname(host)
except socket.gaierror:
    #could not resolve
    print("Hostname could not be resolved. Exiting")
    sys.exit()

#Connect to remote server
s.connect((remote_ip , port))

while 1:
    iparse(recv_timeout(s))
dBitech commented 5 years ago

And here is a specific alert that generates this error: 20190712-110013.txt

kelvinn commented 5 years ago

Thank you for the attachment!

The problem with this NAADS feed is that it sends an empty references element, and the CAP 1.2 schema requires a value matching the pattern in the error message whenever that element is present.
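
For anyone who wants to see why the empty value is rejected, here is a quick check using the pattern copied from the error message. It is only a sketch: Python's re.fullmatch stands in for the schema validator's implicit anchoring (an assumption, not how lxml validates internally), and the helper name and the sample sender/identifier values are made up for illustration.

```python
import re

# Pattern copied verbatim from the XMLSyntaxError in this issue
# (facet 'pattern' on the CAP 1.2 references element).
REFERENCES_PATTERN = re.compile(
    r"\s*[^\s,&<]+,[^\s,&<]+,\d\d\d\d-\d\d-\d\dT\d\d:\d\d:\d\d[-,+]\d\d:\d\d"
    r"(\s+[^\s,&<]+,[^\s,&<]+,\d\d\d\d-\d\d-\d\dT\d\d:\d\d:\d\d[-,+]\d\d:\d\d)*\s*"
)


def is_valid_references(value: str) -> bool:
    """Hypothetical helper: True if value matches the whole pattern."""
    return REFERENCES_PATTERN.fullmatch(value) is not None


# An empty element body fails, because at least one
# sender,identifier,sent triple is required.
print(is_valid_references(""))  # False

# A single made-up triple passes.
print(is_valid_references("sender@example.org,ID-123,2019-07-12T11:00:13-00:00"))  # True
```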

The 1.2 spec seems to allow omitting that element, so I'm stripping it out before it gets processed. Pretty ugly, but it seems to work.
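
Roughly, the idea looks like the sketch below. This is not the code that went into the library; it is a standalone illustration using the standard library's xml.etree.ElementTree, and strip_empty_references is a hypothetical name. It assumes the alert uses the CAP 1.2 namespace shown in the error message and only removes references elements that have no text.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the error message in this issue.
CAP_NS = "urn:oasis:names:tc:emergency:cap:1.2"


def strip_empty_references(cap_xml: str) -> str:
    """Hypothetical helper: drop empty <references> elements from a CAP 1.2 alert."""
    # Serialize CAP elements without an ns0: prefix.
    ET.register_namespace("", CAP_NS)
    root = ET.fromstring(cap_xml)
    tag = "{%s}references" % CAP_NS
    # Snapshot both iterations so removing children does not disturb the traversal.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == tag and not (child.text or "").strip():
                parent.remove(child)
    return ET.tostring(root, encoding="unicode")


sample = (
    '<alert xmlns="urn:oasis:names:tc:emergency:cap:1.2">'
    "<identifier>EXAMPLE-1</identifier>"
    "<references></references>"
    "</alert>"
)
cleaned = strip_empty_references(sample)
print("references" in cleaned)  # False
```

The cleaned string could then be handed to CAPParser in place of the raw packet.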

The new package is on PyPI, so just run pip install capparselib==0.6.2 and give it a go.