Open akondrahman opened 3 years ago
Suggesting another selling point:
Possible Title: Tell Me What: Towards Security-focused Logging for Machine Learning Development
Possible RQs:
RQ1: What security-related events can be logged for machine learning development?
RQ2: How frequently do security-related events appear in machine learning development? How frequently are security-related events logged in machine learning implementations?
RQ3: How do practitioners perceive the identified security-related events for machine learning?
Possible RQs
RQ1: What categories of security-relevant code snippets can be logged for machine learning development?
RQ2: How frequently do security-relevant code snippets appear in machine learning development? How frequently are security-relevant code snippets logged in machine learning development?
RQ3: How do practitioners perceive the identified security-relevant code snippets for machine learning development?
The problem with security-relevant code snippets is that they can also include insecure coding snippets, which we are not detecting.
King has identified mandatory log events ... can we build on top of King to find security log events?
King says: "A mandatory log event is an action that must be logged in order to hold the software user accountable for performing the action."
We will say: "A security log event is an action expressed by source code elements that should be logged to perform post mortem analysis of security attacks in machine learning."
We will identify security log events for ML using:
Another option is to say "adversarial log event" instead of "security log event".
If we want to tone it down we can say "likely adversarial log events" or "candidate security log event" instead of "security log event".
Useful definitions from Chuvakin's book:
An event is a single occurrence within an environment, usually involving an attempted state change
An event field describes one characteristic of an event.
An event record is a collection of event fields.
A log is a collection of event records.
Logging is the act of collecting event records into logs.
An alert or alarm is an action taken in response to an event, usually intended to get the attention of someone or something.
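To make the definitions above concrete, here is a minimal sketch of how they nest. The class names, the `alert_on` helper, and the sample field values are all hypothetical and are not taken from Chuvakin's book.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class EventRecord:
    # An event record is a collection of event fields (field name -> value).
    fields: Dict[str, str]


@dataclass
class Log:
    # A log is a collection of event records.
    records: List[EventRecord] = field(default_factory=list)

    def collect(self, record: EventRecord) -> None:
        # Logging is the act of collecting event records into a log.
        self.records.append(record)


def alert_on(log: Log, predicate: Callable[[EventRecord], bool]) -> List[EventRecord]:
    # An alert is an action taken in response to an event. Here the "action"
    # is only returning the matching records so that someone can review them.
    return [r for r in log.records if predicate(r)]


# Hypothetical sample events for illustration only.
log = Log()
log.collect(EventRecord({"action": "read_dataset", "path": "train.csv", "status": "ok"}))
log.collect(EventRecord({"action": "read_dataset", "path": "train.csv", "status": "hash_mismatch"}))
flagged = alert_on(log, lambda r: r.fields.get("status") != "ok")
```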
Page#235 of Chuvakin's book to motivate the paper better
Maybe it would not be wise to submit bug reports ... it is possible that a lot of people will say no. Better to do a survey.
Use page#2 as motivation from Security Engineering for Machine Learning
In the discussion section we need to say why an automated log assistant was not built and how it can be done in the future ... groundwork, perceptions etc.
Forensic events: A forensic event in machine learning is an action expressed by source code elements that should be logged to perform post mortem analysis of security attacks in machine learning
"Forensic-likely coding patterns" can be one term that we can use. This will require submitting bug reports, which will not give us a good response rate. We can frame it as categories of forensic-likely coding patterns and see if devs agree with that.
Example forensic-likely coding patterns are load, read methods used to read datasets for training.
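As a rough illustration of what logging such a pattern could look like, the sketch below wraps a dataset read and records fields that might help a post mortem analysis. The function name `read_training_data`, the logger name, and the choice of fields (path, size, SHA-256 digest) are assumptions for illustration, not something decided in this thread.

```python
import hashlib
import logging
import os
import tempfile

logger = logging.getLogger("ml_forensics")  # hypothetical logger name


def read_training_data(path: str) -> bytes:
    """Read a dataset file and log fields that may be useful for post mortem analysis."""
    with open(path, "rb") as handle:
        data = handle.read()
    # The digest lets an analyst check later whether the training data was altered.
    digest = hashlib.sha256(data).hexdigest()
    logger.info("dataset_read path=%s bytes=%d sha256=%s", path, len(data), digest)
    return data


# Usage with a throwaway file standing in for a real training set.
with tempfile.NamedTemporaryFile("wb", suffix=".csv", delete=False) as tmp:
    tmp.write(b"feature,label\n1,0\n")
    sample_path = tmp.name
sample = read_training_data(sample_path)
os.remove(sample_path)
```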
Definition: forensic-likely coding patterns are recurring coding patterns that express a mandatory log event needed to perform post mortem analysis of security attacks.
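One possible way to surface candidate patterns is a simple syntactic scan, sketched below with Python's standard `ast` module. The set of method names is an assumption (only load and read are mentioned here), and the scan only flags where such calls occur; it says nothing about whether the call is actually logged.

```python
import ast

# Hypothetical method names; only load and read are mentioned in this thread.
CANDIDATE_NAMES = {"load", "read"}


def find_candidate_calls(source: str):
    """Return sorted (line, name) pairs for calls whose function name is a candidate."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            func = node.func
            # Handle both attribute calls (pickle.load) and bare-name calls (load).
            if isinstance(func, ast.Attribute):
                name = func.attr
            else:
                name = getattr(func, "id", None)
            if name in CANDIDATE_NAMES:
                hits.append((node.lineno, name))
    return sorted(hits)


example = "import pickle\nmodel = pickle.load(open('m.pkl', 'rb'))\ndata = f.read()\n"
hits = find_candidate_calls(example)
```

Name matching this coarse will produce false positives (any method called read), so the output is best treated as a list of candidates for manual review.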
Category names:
Limit scope by focusing on adversarial machine learning, like what to log to diagnose adversarial attacks on machine learning ... need to define:
Follow the train of thought: initially it was not clear why different from King, then definition of adversarial ML, then attack in the context of adversarial ML, then example attacks, how different actions map to attacks, interesting names like reinforcement learning environment
policy attacks need policy detection ... is a set of steps and values ... see: https://stackoverflow.com/questions/46260775/what-is-a-policy-in-reinforcement-learning
@fbhuiyan42 ... hope you are following this thread. This is where you discuss and ask questions.
Are we planning to present the paper only for supervised projects? I thought we are presenting all types of projects, the category "Policy forensics in reinforcement learning" being applicable only for reinforcement learning.
@fbhuiyan42
This will depend on how clear your project classification is: we will do analysis on projects that are clearly labeled as supervised, unsupervised, or reinforcement. As far as I can remember, you were confidently able to classify supervised learning projects. Correct me if I am wrong.
I am confident about the reinforcement projects also. But in that case, yes, I agree, without the unsupervised projects, it's better not to report the RL projects also.
Yes. We need to tell a consistent story. That is why we will skip reinforcement-related findings for this project. We will save the reinforcement results for a short paper or sth. after this one has a home.
Selling Point (Option-1)
Creating this issue so that discussion on definitions does not get lost. Here is how I am defining forensic anti-patterns:
Note to self:
Counter-argument: forensic anti-patterns are hard to detect, e.g. we can never conclusively say sth. is missing or not logged. If "developers do not log X" is the focus of the paper, then the paper may get rejected.