hgascon / adagio

Structural Analysis and Detection of Android Malware
GNU General Public License v2.0

Classification process #23

Open · harelber opened this issue 1 year ago

harelber commented 1 year ago

Hi Hugo, regarding Adagio: do you remember how to run the classification process on the extracted graphs? I extracted the graphs using the -f/-p flags. However, there is no documentation on how to actually train and test an ML model on these graphs. I assume that the common directory holds the relevant functions, but I can't work out the order in which to call them. I would be happy if you could add some instructions or a script for this.

hgascon commented 1 year ago

Hi @harelber, you can instantiate an Analysis object:

In [1]: from adagio.core.analysis import Analysis

In [2]: Analysis?
Init signature:
Analysis(
    dirs,
    labels,
    split,
    max_files=0,
    max_node_size=0,
    precomputed_matrix='',
    y='',
    fnames='',
)
Docstring:      A class to run a classification experiment
Init docstring:
The Analysis class allows one to load sets of pickled graph objects
from different directories where the objects in each directory
belong to a different class. It also provides the methods to run
different types of classification experiments by training and
testing a linear classifier on the feature vectors generated
from the different graph objects.

:dirs: A list with directories including types of files for
    classification e.g. <[MALWARE_DIR, CLEAN_DIR]> or just
    directories with samples from different malware families
:labels: The labels assigned to samples in each directory.
    For example a number or a string.
:split: The percentage of samples used for training (value
    between 0 and 1)
:precomputed_matrix: name of file if a data or kernel matrix
    has already been computed.
:y: If precomputed_matrix is True, a pickled and gzipped list
    of labels must be provided.
:returns: an Analysis object with the dataset as a set of
    properties and several functions to train, test, evaluate
    or run a learning experiment iteratively.
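The docstring describes `dirs`, `labels`, `split` and `y` only in prose. The following is a minimal sketch of how those arguments could be prepared, assuming the pickled graphs of each class live in their own subdirectory of one dataset root. The helper names (`class_dirs_and_labels`, `check_split`, `save_labels`, `load_labels`) are hypothetical and not part of Adagio, and writing `y` with `gzip` plus `pickle` is an assumption about what "a pickled and gzipped list" means.

```python
import gzip
import os
import pickle


def class_dirs_and_labels(root):
    """Return (dirs, labels) with one entry per subdirectory of `root`.

    Hypothetical helper: assumes the pickled graphs of each class
    (e.g. root/goodware, root/malware) sit in their own subdirectory
    and uses the subdirectory name as the label for that class.
    """
    names = sorted(
        name for name in os.listdir(root)
        if os.path.isdir(os.path.join(root, name))  # skip stray files
    )
    dirs = [os.path.join(root, name) for name in names]
    return dirs, list(names)


def check_split(split):
    """The docstring defines `split` as the training fraction, between 0 and 1."""
    if not 0 < split < 1:
        raise ValueError("split must be between 0 and 1, got %r" % (split,))
    return split


def save_labels(path, y):
    """Write a label list as a gzipped pickle (assumed format for `y`)."""
    with gzip.open(path, "wb") as f:
        # Protocol 2 stays readable if the consumer runs under Python 2.
        pickle.dump(list(y), f, protocol=2)


def load_labels(path):
    """Read back a label list written by save_labels."""
    with gzip.open(path, "rb") as f:
        return pickle.load(f)
```

The resulting lists would then go into the constructor in its positional order, e.g. `Analysis(dirs, labels, check_split(0.8))`. Since the docstring allows labels to be "a number or a string", mapping the directory names to numbers instead is an equally valid choice; which one a given experiment expects is worth confirming in `adagio/core/analysis.py`.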

And then run the different experiments in the Analysis class:

In [3]: a = Analysis(...)
[...]
In [4]: a.run_linear_experiment(...)
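Conceptually, the docstring describes each experiment as splitting the samples by `split` and then training and testing a linear classifier on feature vectors derived from the graphs. The sketch below illustrates only that train/test loop on toy vectors. It is not Adagio's code: the function names are hypothetical, the feature vectors are plain lists rather than graph-derived features, and a simple perceptron stands in for whatever linear model `run_linear_experiment` actually uses.

```python
import random


def train_test_split(X, y, split, seed=0):
    """Shuffle the samples and keep the fraction `split` for training.

    Hypothetical helper mirroring the `split` argument described in the
    Analysis docstring (share of samples used for training, 0 < split < 1).
    """
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    n_train = int(round(split * len(idx)))
    train, test = idx[:n_train], idx[n_train:]
    return ([X[i] for i in train], [y[i] for i in train],
            [X[i] for i in test], [y[i] for i in test])


def train_perceptron(X, y, epochs=100):
    """Fit a linear model sign(w . x + b) for labels in {-1, +1}.

    A plain perceptron stands in for Adagio's actual linear classifier.
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, label in zip(X, y):
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if label * score <= 0:
                # Misclassified (or on the boundary): shift the hyperplane toward x.
                w = [wi + label * xi for wi, xi in zip(w, x)]
                b += label
                mistakes += 1
        if mistakes == 0:
            break  # every training sample is on the correct side
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on which side of the hyperplane x falls."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1


def accuracy(w, b, X, y):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(1 for x, label in zip(X, y) if predict(w, b, x) == label)
    return float(correct) / len(y)
```

A held-out evaluation would call `train_test_split(X, y, 0.8)`, fit on the training part and report `accuracy(w, b, X_test, y_test)`. For real experiments outside Adagio, a library linear model such as scikit-learn's `LinearSVC` would be the more usual choice than a hand-written perceptron.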

There are other helper functions in the Analysis class that you can experiment with. Let me know if that works.