sbustreamspot / sbustreamspot-train

Convert training graphs to shingle vectors and compute the best clustering
Apache License 2.0

StreamSpot Train

http://www3.cs.stonybrook.edu/~emanzoor/streamspot/

Requirements

Training Procedure

The following steps assume this repository has been cloned and all dependencies installed.

Convert the training data from CDM13/Avro to StreamSpot

For detailed instructions, see the sbustreamspot-cdm README.

For illustration, the steps below assume that infoleak_small_units.CDM13.avro is the training data.

Convert the StreamSpot training graphs to shingle vectors

The graph-to-shingle-vector transformation is implemented in C++ for performance. It is a modified version of the streamspot-core code.

Build and run the code as follows:

cd graphs-to-shingle-vectors
make optimized
./streamspot --edges=../streamspot/infoleak_small_units.CDM13.ss --chunk-length 24 > ../shingles/infoleak_small_units.CDM13.sv
cd ..
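
As a rough intuition for the --chunk-length option, the sketch below splits a string into fixed-length chunks and counts how often each chunk occurs. It only illustrates the general idea of a chunk-frequency vector. The actual shingling scheme is defined by the C++ code, and chunk_counts is a hypothetical helper that is not part of this repository.

```python
from collections import Counter


def chunk_counts(label_string, chunk_length):
    """Split a string into fixed-length chunks and count each chunk.

    Hypothetical helper for illustration only; the real transformation
    lives in graphs-to-shingle-vectors and may differ in detail.
    """
    if chunk_length <= 0:
        raise ValueError("chunk_length must be positive")
    # The last chunk may be shorter than chunk_length.
    chunks = [
        label_string[i:i + chunk_length]
        for i in range(0, len(label_string), chunk_length)
    ]
    return Counter(chunks)


if __name__ == "__main__":
    print(chunk_counts("abcabcab", 3))
```

Under this simplified view, a larger chunk length yields fewer, longer chunks per string.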

Cluster the training graph shingle vectors

Ensure the dependencies have been installed: pip install -r requirements.txt

python create_seed_clusters.py --input shingles/infoleak_small_units.CDM13.sv > clusters/infoleak_small_units.CDM13.cl

The *.cl file can then be provided to streamspot-core.
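
The steps above follow a consistent naming convention: the training file keeps its base name and moves through the streamspot/, shingles/ and clusters/ directories with the .ss, .sv and .cl extensions. A minimal sketch of that convention is shown below. pipeline_paths is a hypothetical helper, not part of this repository, and it assumes the directory layout used in the example commands.

```python
from pathlib import PurePosixPath


def pipeline_paths(training_file):
    """Derive the per-stage file paths from the training file name.

    Hypothetical helper; assumes the streamspot/, shingles/ and clusters/
    directories used in the example commands of this README.
    """
    # .stem drops only the final extension, so
    # "infoleak_small_units.CDM13.avro" becomes "infoleak_small_units.CDM13".
    stem = PurePosixPath(training_file).stem
    return {
        "streamspot": f"streamspot/{stem}.ss",
        "shingles": f"shingles/{stem}.sv",
        "clusters": f"clusters/{stem}.cl",
    }


if __name__ == "__main__":
    for stage, path in pipeline_paths("infoleak_small_units.CDM13.avro").items():
        print(stage, path)
```

A helper like this can make it easier to rerun the same steps on a different training file without editing each path by hand.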

Contact