rpytel1 / log-strategy

Project conducted for the Seminar in Machine Learning for Software Engineering. The aim of our research was to explore possible directions for Deep Learning solutions to log detection in a snippet of code.

Training components #5

Closed rpytel1 closed 5 years ago

rpytel1 commented 5 years ago

Training

Validation

Open questions

  1. What is a realistic distribution of logs per "code" (e.g. per function)?
  2. What training sample size do we need?
  3. How does proper validation work?
  4. How big is the input for the NN (function size in the data sets)?

Components to be implemented

rpytel1 commented 5 years ago

Abstract Syntax Tree Parser and parts of a preprocessing pipeline: https://github.com/jan-gerling/mmsr_repo_sim

rpytel1 commented 5 years ago

Notes for 20.09:

  1. Go through the papers in the "interesting papers" list, summarize the ones we read, and note how each relates to our case
  2. Parser: preprocess using AST
  3. Parser: how to extract logging lines and later create features and labels
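
A minimal sketch of item 3, assuming Python source and the standard-library `ast` module as a stand-in for whichever parser the project ends up using. The names `is_log_call` and `extract_samples`, the logger names, and the choice of statement types as features are all hypothetical and only illustrate the idea: find the logging statements in each function, leave them out of the features, and use their presence as the label.

```python
import ast

# Assumption: a "logging line" is a statement like logging.info(...) or logger.error(...).
LOG_METHODS = {"debug", "info", "warning", "error", "exception", "critical"}
LOGGER_NAMES = {"logging", "logger", "log"}


def is_log_call(node):
    """Return True if node is an expression statement that calls a logger method."""
    if not (isinstance(node, ast.Expr) and isinstance(node.value, ast.Call)):
        return False
    func = node.value.func
    return (
        isinstance(func, ast.Attribute)
        and func.attr in LOG_METHODS
        and isinstance(func.value, ast.Name)
        and func.value.id in LOGGER_NAMES
    )


def extract_samples(source):
    """Build one sample per function: features, logging line numbers, and a label."""
    tree = ast.parse(source)
    samples = []
    for func in ast.walk(tree):
        if not isinstance(func, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        # Line numbers of the logging statements found inside this function.
        log_lines = [n.lineno for n in ast.walk(func) if is_log_call(n)]
        # Features: statement types in the function, with logging statements left out
        # so the label cannot leak into the input.
        features = [
            type(n).__name__
            for n in ast.walk(func)
            if isinstance(n, ast.stmt) and not is_log_call(n)
        ]
        samples.append(
            {
                "name": func.name,
                "features": features,
                "log_lines": log_lines,
                "label": int(bool(log_lines)),  # 1 if the function logs, else 0
            }
        )
    return samples


EXAMPLE = '''
import logging

def load(path):
    logging.info("loading %s", path)
    with open(path) as f:
        return f.read()

def add(a, b):
    return a + b
'''

for sample in extract_samples(EXAMPLE):
    print(sample["name"], sample["label"], sample["log_lines"], sample["features"])
```

Counting `len(log_lines)` per function over a corpus would also give a first estimate for the open question about the distribution of logs per function.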

rpytel1 commented 5 years ago

Our Ideas

Idea 1: Transfer learning from code2vec

Idea 2: SVM as a comparison baseline (perhaps other traditional ML methods as well)

Idea 3: Reduce the feature space (data ablation study)

rpytel1 commented 5 years ago

18.09 Notes:

Preprocessing TODOs: