Closed imolloy closed 8 years ago
The data pushed into Kafka was this: https://github.com/cadets/trace-data/blob/master/buildInject/buildinject.json.CDM.bin
The properties map in a CADETS record for a SUBJECT_PROCESS looks like this:
u'properties': {u'exec': u'sshd'}
According to the CDM13 AVDL, the properties map contains arbitrary keys and values.
So it is a bad idea to rely on any keys in this map right now, unless it is used carefully as optional metadata.
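A minimal sketch of reading the map as optional metadata, assuming the record is a plain Python dict like the snippet above. The helper name get_property and the 'unknown' default are hypothetical and not part of translate_cdm_to_streamspot.py.

```python
def get_property(record, key, default=None):
    """Return record['properties'][key], or default if the map or key is absent."""
    properties = record.get('properties')
    # The map may be missing or None and its keys are arbitrary,
    # so never index it directly.
    if not isinstance(properties, dict):
        return default
    return properties.get(key, default)


# Illustrative records only; the first mirrors the CADETS example above.
cadets_subject = {'properties': {'exec': 'sshd'}}
no_properties = {'properties': None}

print(get_property(cadets_subject, 'exec'))
print(get_property(cadets_subject, 'name', 'unknown'))
print(get_property(no_properties, 'name', 'unknown'))
```

With this shape, a provider that omits a key or the whole map yields the default rather than an exception.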
Parsing with the new TRACE data is buggy because there are UUIDs that appear without first appearing as subjects in the stream.
TODO before closing:
Angular similarity issue fixed: https://github.com/sbustreamspot/sbustreamspot-core/issues/35
So it looks like this is the result of bad data that should be raised on the BBN system. Some errors on Faros datasets:
Faros Skype:
File "translate_cdm_to_streamspot.py", line 168, in <module>
pname = read_field(cdm_record_values['properties']['name'],
TypeError: 'NoneType' object has no attribute '__getitem__'
Faros.avro (this is their small dataset):
File "translate_cdm_to_streamspot.py", line 353, in <module>
pid = uuid_to_pid[to_uuid]
KeyError: '93e0bcd078fcf5fb0e62fbe66d67e579'
I haven't loaded TRACE yet. The first one could be an error from assuming properties is always a map, while CDM13 allows it to be a map or None. The second is more likely a bug similar to the TRACE one you mentioned. CADETS mapping to maps and not strings is also not valid CDM13. For the 14-month, as long as you know there's one data provider that is compliant enough for your code, we should be fine.
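One way to tolerate edges that reference UUIDs never seen as subjects is to look them up without raising and keep a list of the misses for reporting. This is only a sketch: uuid_to_pid and to_uuid come from the traceback above, while resolve_pid, unresolved, and the sample mapping are hypothetical.

```python
def resolve_pid(uuid_to_pid, uuid, unresolved):
    """Look up the PID for a subject UUID, recording UUIDs never seen as subjects."""
    pid = uuid_to_pid.get(uuid)
    if pid is None:
        unresolved.append(uuid)
    return pid


uuid_to_pid = {'aaaa': 101}  # made-up mapping, for illustration only
unresolved = []

for to_uuid in ['aaaa', '93e0bcd078fcf5fb0e62fbe66d67e579']:
    pid = resolve_pid(uuid_to_pid, to_uuid, unresolved)
    if pid is None:
        continue  # skip the edge instead of raising KeyError
    print(to_uuid, pid)

print('unresolved UUIDs:', unresolved)
```

The unresolved list can then be attached to a report for the data provider.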
Thanks for stress-testing this! I have opened up issues with BBN here.
Also, about Faros.avro: I think your container has cached the old code; I have commented out reading the pname now. Maybe add a temporary ARG TEMP just before the git clone for sbustreamspot-cdm in the Dockerfile to invalidate the cache and re-clone the code.
I thought the same thing and built with --force-rm. Looks like that doesn't do what I think it does. :)
Where did you find the FAROS data? I am not able to connect to anything on cs.unm.edu.
Ted had pulled down a copy and still had it on his laptop. Apparently cs.unm.edu is undergoing maintenance. Not sure when it will be back.
Traceback (most recent call last):
File "translate_cdm_to_streamspot.py", line 168, in <module>
pname = read_field(cdm_record_values['properties']['name'],
KeyError: 'name'