sbustreamspot / sbustreamspot-docker

Docker image for the DARPA engagement

Bug parsing CDM13 Cadets Data #4

Closed by imolloy 8 years ago

imolloy commented 8 years ago

Traceback (most recent call last):
  File "translate_cdm_to_streamspot.py", line 168, in <module>
    pname = read_field(cdm_record_values['properties']['name'],
KeyError: 'name'

imolloy commented 8 years ago

The data pushed into Kafka was this: https://github.com/cadets/trace-data/blob/master/buildInject/buildinject.json.CDM.bin

emaadmanzoor commented 8 years ago

The properties map in a CADETS record for a SUBJECT_PROCESS looks like this:

u'properties': {u'exec': u'sshd'}

According to the CDM13 AVDL, the properties map contains arbitrary keys and values.

So it is a bad idea to rely on any keys in this map right now, unless it is used carefully as optional metadata.
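
A minimal sketch of what "optional metadata" could look like in practice. The helper name `optional_property` and the fallback from 'name' to 'exec' are assumptions for illustration, not what translate_cdm_to_streamspot.py actually does:

```python
def optional_property(record_values, key, default=None):
    """Return record_values['properties'][key] if present, else default."""
    # 'properties' may be absent, null, or simply lack the key,
    # so none of those cases is allowed to raise.
    properties = record_values.get('properties') or {}
    return properties.get(key, default)


# Shaped like the CADETS SUBJECT_PROCESS record above: no 'name' key.
cadets_subject = {'properties': {'exec': 'sshd'}}

# Hypothetical fallback chain: prefer 'name', then 'exec', then a placeholder.
pname = optional_property(
    cadets_subject, 'name',
    optional_property(cadets_subject, 'exec', 'unknown'))
```

With this, a record that lacks the key yields the default instead of a KeyError, and nothing downstream should depend on the value being meaningful.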

Parsing the new TRACE data is buggy because some UUIDs are referenced without first appearing as subjects in the stream.

TODO before closing:

emaadmanzoor commented 8 years ago

Angular similarity issue fixed: https://github.com/sbustreamspot/sbustreamspot-core/issues/35

imolloy commented 8 years ago

So it looks like this is the result of bad data, which should be raised on the BBN system. Some errors on the Faros datasets:

Faros Skype:

  File "translate_cdm_to_streamspot.py", line 168, in <module>
    pname = read_field(cdm_record_values['properties']['name'],
TypeError: 'NoneType' object has no attribute '__getitem__'

Faros.avro (this is their small dataset):

  File "translate_cdm_to_streamspot.py", line 353, in <module>
    pid = uuid_to_pid[to_uuid]
KeyError: '93e0bcd078fcf5fb0e62fbe66d67e579'

I haven't loaded TRACE yet. The first one could come from assuming that properties is always a map, whereas CDM13 allows it to be a map or None. The second is more likely a bug similar to the TRACE one you mentioned. CADETS mapping to maps rather than strings is also not valid CDM13. For the 14-month, as long as you know there's one data provider that is compliant enough for your code, we should be fine.
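
For the second error, here is a rough sketch of tolerating UUIDs that were never introduced as subjects. It assumes `uuid_to_pid` is a plain dict and that line 353 is resolving the endpoints of an edge; `resolve_pid`, `skipped`, and the sample edges are made up for illustration:

```python
def resolve_pid(uuid_to_pid, uuid, skipped):
    """Return the pid for uuid, or None if no subject record introduced it."""
    pid = uuid_to_pid.get(uuid)
    if pid is None:
        # Remember the offender so it can go into a report to the provider.
        skipped.append(uuid)
    return pid


uuid_to_pid = {'aaaa': 101}  # hypothetical: one subject seen so far
skipped = []
edges = [
    ('aaaa', 'aaaa'),
    ('aaaa', '93e0bcd078fcf5fb0e62fbe66d67e579'),  # never seen as a subject
]

resolved = []
for from_uuid, to_uuid in edges:
    from_pid = resolve_pid(uuid_to_pid, from_uuid, skipped)
    to_pid = resolve_pid(uuid_to_pid, to_uuid, skipped)
    if from_pid is None or to_pid is None:
        continue  # drop the edge instead of crashing on a KeyError
    resolved.append((from_pid, to_pid))
```

Dropping the edge keeps the translation running, and the `skipped` list gives something concrete to attach when raising the data issue.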

emaadmanzoor commented 8 years ago

Thanks for stress-testing this! I have opened up issues with BBN here.

Also, about Faros.avro: I think your container has cached the old code; I have commented out reading the pname now. Maybe add a temporary ARG TEMP just before the git clone for sbustreamspot-cdm in the Dockerfile to invalidate the cache and re-clone the code.

imolloy commented 8 years ago

I thought the same thing and built with --force-rm. Looks like that doesn't do what I think it does. :)

emaadmanzoor commented 8 years ago

Where did you find the FAROS data? I am not able to connect to anything on cs.unm.edu.

imolloy commented 8 years ago

Ted had pulled down a copy and still had it on his laptop. Apparently cs.unm.edu is undergoing maintenance. Not sure when it will be back.