[PRE REVIEW]: Unsupervised learning approach towards anomaly detection in compact logs with ADE

whedon commented 3 years ago

Submitting author: @ayush-1506 (Ayush Shridhar)
Repository: https://github.com/openmainframeproject/ade.git
Version: v1.0.5
Editor: @gkthiruvathukal
Reviewers: @arcuri82, @mdpiper
Managing EiC: Kristen Thyng

:warning: JOSS reduced service mode :warning:

Due to the challenges of the COVID-19 pandemic, JOSS is currently operating in a "reduced service mode". You can read more about what that means in our blog post.

Author instructions

Thanks for submitting your paper to JOSS @ayush-1506. Currently, there isn't a JOSS editor assigned to your paper.

The author's suggestion for the handling editor is @bmcfee.

@ayush-1506 if you have any suggestions for potential reviewers then please mention them here in this thread (without tagging them with an @). In addition, this list of people have already agreed to review for JOSS and may be suitable for this submission (please start at the bottom of the list).

Editor instructions

The JOSS submission bot @whedon is here to help you find and assign reviewers and start the main review. To find out what @whedon can do for you type:

@whedon commands

whedon commented 3 years ago

Hello human, I'm @whedon, a robot that can help you with some common editorial tasks.

For a list of things I can do to help you, just type:

@whedon commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@whedon generate pdf

whedon commented 3 years ago

PDF failed to compile for issue #2972 with the following error:

Can't find any papers to compile :-(

whedon commented 3 years ago

Software report (experimental):

github.com/AlDanial/cloc v 1.88  T=5.22 s (90.8 files/s, 14113.6 lines/s)
--------------------------------------------------------------------------------
Language                      files          blank        comment           code
--------------------------------------------------------------------------------
Java                            437          10599          22656          36800
Bourne Shell                      9            215            293            719
XSLT                              3            154             54            541
XSD                               4             66             59            406
XML                               5             18             21            310
Maven                             3              6             16            276
CSS                               1             16              0             85
Bourne Again Shell                8             44            161             81
Markdown                          1             17              0             47
HTML                              2              5             14             21
JSON                              1              0              0             17
--------------------------------------------------------------------------------
SUM:                            474          11140          23274          39303
--------------------------------------------------------------------------------

Statistical information for the repository 'aef4f59eaae7aa27d77f8f93' was
gathered on 2021/01/19.
The following historical commit information, by author, was found:

Author                     Commits    Insertions      Deletions    % of changes
Chris Brooker                    1         68690              0           94.84
Faisal Hameed                   13           190            260            0.62
Jim Caffrey                     26          1961            674            3.64
Neale Ferguson                   3           271             41            0.43
ayman abdelghany                 3           157            135            0.40
davidoh                          2            25             22            0.06

Below are the number of rows from each author that have survived and are still
intact in the current revision:

Author                     Rows      Stability          Age       % in comments
Faisal Hameed               161           84.7         14.7                0.00
James Caffrey              1562          100.0         10.7               23.62
Jim Caffrey                  11            0.6         14.8                0.00
Neale Ferguson              131           48.3         14.9                0.00
ayman abdelghany            130           82.8         12.8                0.00
cbrooker27                68035          100.0          0.0               36.32
davidoh                      25          100.0          0.1                0.00

kthyng commented 3 years ago

Hi @ayush-1506 — is there a paper associated with your submission?

ayush-1506 commented 3 years ago

@kthyng Yes, the code and paper live inside a different branch of the repository. Link : https://github.com/openmainframeproject/ade/tree/logs

Can we get whedon to use this branch instead of master? Else I'll discuss with my collaborators to merge this into master as soon as possible.

kthyng commented 3 years ago

@whedon generate pdf from branch logs

whedon commented 3 years ago

Attempting PDF compilation from custom branch logs. Reticulating splines etc...

whedon commented 3 years ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

kthyng commented 3 years ago

@ayush-1506 Yes it is fine to have the paper in another branch. Please look through the paper requirements to be sure you've covered them all. For one thing, we require a section entitled "Statement of Need".

kthyng commented 3 years ago

@ayush-1506 This looks like interesting work, but can you make a compelling argument for why it is research software in particular? You can read more about that requirement here. I'm going to label this with a scope query to get the editorial board's input on this, which should take 1-2 weeks.

kthyng commented 3 years ago

@whedon scope query

whedon commented 3 years ago

I'm sorry human, I don't understand that. You can see what commands I support by typing:

@whedon commands

kthyng commented 3 years ago

@whedon query scope

whedon commented 3 years ago

Submission flagged for editorial review.

ayush-1506 commented 3 years ago

@kthyng Thanks for the input. I'll add a Statement of Need section (which will support the fact that this software and approach solve a problem). Do I need to add the argument for this being research software to the paper, or will a comment here suffice?

kthyng commented 3 years ago

Here is the specific section on what your paper should contain: https://joss.readthedocs.io/en/latest/submitting.html#what-should-my-paper-contain

Your statement of need should describe the research purpose of the software, but summarizing that or expanding on it here would also be helpful as the editors look through your submission to learn about it.

ayush-1506 commented 3 years ago

@kthyng Just realised that the Motivation section should probably be renamed to Statement of Need.

ayush-1506 commented 3 years ago

> summarizing that or expanding on it here would also be helpful as the editors look through your submission to learn about it.

Sure, I'll make the required edits to the paper and expand on them here.

ayush-1506 commented 3 years ago

Made changes to the paper, summarizing the same here:

Objective:

The aim of the project is to efficiently detect anomalous log slices in large sets of logs. These can be sparse logs, such as Linux syslogs in RFC3164/RFC5424 format, or very dense logs, such as those generated by Spark jobs. The problem is common in large systems and development clusters, where a system crash or unexpected behavior can have adverse effects. We introduce a novel data science/statistical approach to this problem, which is expanded on later in this comment.

Need:

Why do we need to find anomalous log slices?

Debugging system failures is a cumbersome task. Abnormal behavior in the system can be identified by studying the logs generated while the system was running: if the system fails or behaves unexpectedly, the event is logged somewhere. However, going through hours of dense logs is a challenge, and sysadmins typically race against time, studying large volumes of log messages to find the root cause of the issue. Such system failures are very common and at times unavoidable, and over the years they have led to a huge loss of time and resources.

Relevant work and our approach:

While there has been work on anomaly detection in large logs, such as TadGAN (https://arxiv.org/abs/2009.07769) and semi-supervised adversarial learning with GANs (https://doi.org/10.1109/ciss.2019.8693024), most of these approaches rely on large deep learning models, and some treat this as a supervised problem. These models are expensive to train and comparatively slow.

On the other hand, we treat the problem as a statistical one and use unsupervised learning techniques for fast and robust detection of anomalous slices. Being an unsupervised approach, it does not need labelled data. Avoiding computationally heavy deep learning keeps our system fast, and it is written in Java, which makes it well suited to enterprise IT use cases (it can be adapted to others too). To this end, we divide the problem into 3 main sub-categories:

Unsupervised learning algorithms: At the heart of ADE are unsupervised learning algorithms that are trained to understand the actual expected behavior of the system and compare it with the observed behavior during inference.
Model groups: We divide the training into several categories, called model groups. Through model groups, multiple systems contribute to the generation of a single model for the group; the more systems in the group, the more data our system can use to build the model.
Statistical scores: To come up with an anomaly score, we calculate a number of statistical scores that contribute to the final anomaly score. These include the Bernoulli score, Poisson score, LogNormal score, Best-of-two score, rarity score, severity score, Clustering score, Percentile score, and FullBernoulliClusterAware score, to name a few.
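
To make the idea of a statistical score concrete, here is a minimal Python sketch of a Poisson-style score. It is an illustration only and is not ADE's actual formula or code: the function names `poisson_tail` and `poisson_surprise`, and the choice of the negative log of the upper-tail probability as the score, are assumptions made for this example.

```python
import math


def poisson_tail(observed: int, expected_rate: float) -> float:
    """Return P(X >= observed) for X ~ Poisson(expected_rate)."""
    if expected_rate <= 0:
        raise ValueError("expected_rate must be positive")
    if observed <= 0:
        return 1.0
    # Accumulate P(X = 0), ..., P(X = observed - 1) iteratively,
    # which avoids computing large factorials directly.
    term = math.exp(-expected_rate)  # P(X = 0)
    cdf = term
    for k in range(1, observed):
        term *= expected_rate / k
        cdf += term
    return max(0.0, 1.0 - cdf)


def poisson_surprise(observed: int, expected_rate: float) -> float:
    """Hypothetical score: larger means the count is more surprising."""
    tail = poisson_tail(observed, expected_rate)
    # Clamp so that extremely unlikely counts do not hit log(0).
    return -math.log10(max(tail, 1e-300))


# A message that normally appears about twice per interval:
usual = poisson_surprise(3, expected_rate=2.0)
burst = poisson_surprise(20, expected_rate=2.0)
print(f"usual={usual:.2f} burst={burst:.2f}")
```

In this sketch the expected rate would come from the trained model for a message ID, and the observed count from the interval being analyzed; a burst of 20 occurrences scores far higher than the usual 3.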

Along with this, we classify each message into one of four categories based on the frequency of its particular family of messages. The classes are:

New : Defines a completely new message (previously unseen)
IN_SYNC : Implies that ADE expects the message to be issued in a periodic pattern and the message was issued as expected
NOT_IN_SYNC : Implies that ADE expects the message to be issued in a periodic pattern but the message was not issued as expected
NOT_PERIODIC : Indicates that ADE does not expect the message to be periodic
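
The four classes above can be illustrated with a small Python sketch. This is not ADE's algorithm: the function `classify_message`, the use of the coefficient of variation of inter-arrival gaps to decide periodicity, and both threshold values are hypothetical choices made only to show how such a classification could work.

```python
from statistics import mean, median, pstdev


def classify_message(past_times, new_time, cv_threshold=0.25, tolerance=0.25):
    """Label a message arriving at new_time with one of the four classes.

    past_times: sorted timestamps (in seconds) of earlier occurrences
    of the same message ID. Thresholds are illustrative assumptions.
    """
    if not past_times:
        return "New"  # never seen before
    gaps = [b - a for a, b in zip(past_times, past_times[1:])]
    if len(gaps) < 2:
        return "NOT_PERIODIC"  # too little history to infer a period
    avg = mean(gaps)
    # Treat the message as periodic only if the gaps are nearly constant.
    if avg <= 0 or pstdev(gaps) / avg > cv_threshold:
        return "NOT_PERIODIC"
    period = median(gaps)
    latest_gap = new_time - past_times[-1]
    if abs(latest_gap - period) <= tolerance * period:
        return "IN_SYNC"
    return "NOT_IN_SYNC"


# A heartbeat logged every 60 seconds:
print(classify_message([0, 60, 120, 180], 240))  # on schedule
print(classify_message([0, 60, 120, 180], 200))  # off schedule
```

A real implementation would also need to handle missed periods and drifting schedules, which this sketch deliberately ignores.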

Using all this calculated information, we assign an anomaly score to every internal slice. The higher the anomaly score, the greater the chance that the slice is the source of anomalous logs.

Output format:

Finally, we write out the analysis output in XML format. An example of the analysis output for a day can be seen here. We also provide specialized output for each interval, which can be accessed by clicking on the XML links associated with each slice. Examples of analysis for a period can be viewed here. Our approach has shown comparatively accurate results when tested on real data, along with fast inference and training. We also provide sample data and instructions to build the binary and run it on the data.

Looking at the "What we mean by research software" section, I believe this falls under the category of software that "solves complex modeling problems in a scientific context (physics, mathematics, biology, medicine, social science, neuroscience, engineering)" and "extracts knowledge from large data sets".

Kindly let me know if there are any questions or if I missed something.

kthyng commented 3 years ago

@ayush-1506 Thank you! Someone from the editorial board will get back to you after a week or two.

danielskatz commented 3 years ago

@ayush-1506 - can you explain what code you are submitting to JOSS in this branch vs the overall repo? The paper seems to describe ADE, which is what the repo contains, but you also have suggested that https://github.com/openmainframeproject/ade/tree/logs is the contribution being submitted here, and I can't tell if the paper describes that specific contribution.

ayush-1506 commented 3 years ago

@danielskatz There were some issues with pushing new commits to the master branch of the ADE repository (here: the CLA wasn't registering the contributors), so all development was temporarily being done and reviewed in the logs branch. The CLA issues have now been resolved and all changes have been merged to master. We can take the master branch as the main branch for the paper and code from now on.

danielskatz commented 3 years ago

So is all of the content in the main branch the JOSS submission?

danielskatz commented 3 years ago

@whedon check repository

whedon commented 3 years ago

Software report (experimental):

github.com/AlDanial/cloc v 1.88  T=2.31 s (210.5 files/s, 32943.2 lines/s)
--------------------------------------------------------------------------------
Language                      files          blank        comment           code
--------------------------------------------------------------------------------
Java                            444          10823          23275          37558
Bourne Shell                     10            271            359            895
XSD                               5             88             76            542
XSLT                              3            154             54            541
XML                               6             23             45            468
Maven                             3              6             16            276
Markdown                          2             56              0            191
CSS                               1             16              0             85
Bourne Again Shell                8             44            161             81
TeX                               1              2              0             27
HTML                              2              5             14             21
JSON                              1              0              0             17
YAML                              1              1              0              8
--------------------------------------------------------------------------------
SUM:                            487          11489          24000          40710
--------------------------------------------------------------------------------

Statistical information for the repository '23ad3deb8570f20df8428079' was
gathered on 2021/01/26.
The following historical commit information, by author, was found:

Author                     Commits    Insertions      Deletions    % of changes
Chris Brooker                    1         68690              0           92.61
Faisal Hameed                   13           190            260            0.61
Jim Caffrey                     26          1961            674            3.55
Neale Ferguson                   3           271             41            0.42
ayman abdelghany                 3           157            135            0.39
ayush-1506                      13          1673             72            2.35
davidoh                          2            25             22            0.06

Below are the number of rows from each author that have survived and are still
intact in the current revision:

Author                     Rows      Stability          Age       % in comments
Faisal Hameed               161           84.7         57.9                0.00
James Caffrey              1554          100.0         53.9               23.68
Jim Caffrey                  11            0.6         58.0                0.00
Neale Ferguson              131           48.3         58.1                0.00
ayman abdelghany            130           82.8         56.1                0.00
ayush-1506                 1619           96.8          6.5               42.62
cbrooker27                68025          100.0          0.0               36.31
davidoh                      25          100.0         43.3                0.00

ayush-1506 commented 3 years ago

> So is all of the content in the main branch the JOSS submission?

@danielskatz Yes: all changes were merged yesterday.

danielskatz commented 3 years ago

👋 @gkthiruvathukal - Any chance you could take on one more submission for JOSS?

danielskatz commented 3 years ago

@whedon invite @gkthiruvathukal as editor

whedon commented 3 years ago

@gkthiruvathukal has been invited to edit this submission.

gkthiruvathukal commented 3 years ago

@danielskatz Yes, I will be happy to handle this submission. Assigning self.

gkthiruvathukal commented 3 years ago

@whedon assign @gkthiruvathukal as editor

whedon commented 3 years ago

OK, the editor is @gkthiruvathukal

gkthiruvathukal commented 3 years ago

@ayush-1506 Can you please suggest some possible reviewers (without the at symbol) from the list of reviewers? See author instructions at the beginning of this issue thread.

ayush-1506 commented 3 years ago

@gkthiruvathukal Upon skimming over the list of reviewers, I'd say @ {hausen and mosteo} could be possible reviewers here.

gkthiruvathukal commented 3 years ago

@whedon commands

whedon commented 3 years ago

Here are some things you can ask me to do:

# List all of Whedon's capabilities
@whedon commands

# Assign a GitHub user as the sole reviewer of this submission
@whedon assign @username as reviewer

# Add a GitHub user to the reviewers of this submission
@whedon add @username as reviewer

# Re-invite a reviewer (if they can't update checklists)
@whedon re-invite @username as reviewer

# Remove a GitHub user from the reviewers of this submission
@whedon remove @username as reviewer

# List of editor GitHub usernames
@whedon list editors

# List of reviewers together with programming language preferences and domain expertise
@whedon list reviewers

# Change editorial assignment
@whedon assign @username as editor

# Set the software archive DOI at the top of the issue e.g.
@whedon set 10.0000/zenodo.00000 as archive

# Set the software version at the top of the issue e.g.
@whedon set v1.0.1 as version

# Open the review issue
@whedon start review

EDITORIAL TASKS

# All commands can be run on a non-default branch, to do this pass a custom 
# branch name by following the command with `from branch custom-branch-name`.
# For example:

# Compile the paper
@whedon generate pdf

# Compile the paper from alternative branch
@whedon generate pdf from branch custom-branch-name

# Remind an author or reviewer to return to a review after a
# certain period of time (supported units days and weeks)
@whedon remind @reviewer in 2 weeks

# Ask Whedon to do a dry run of accepting the paper and depositing with Crossref
@whedon accept

# Ask Whedon to check the references for missing DOIs
@whedon check references

# Ask Whedon to check repository statistics for the submitted software
@whedon check repository

EiC TASKS

# Invite an editor to edit a submission (sending them an email)
@whedon invite @editor as editor

# Reject a paper
@whedon reject

# Withdraw a paper
@whedon withdraw

# Ask Whedon to actually accept the paper and deposit with Crossref
@whedon accept deposit=true

gkthiruvathukal commented 3 years ago

@hausen and @mosteo, are you willing to contribute a review for this JOSS submission?

ayush-1506 commented 3 years ago

@gkthiruvathukal Should I propose a few more reviewers (in case mosteo and hausen aren't available)?

hausen commented 3 years ago

@gkthiruvathukal sorry, I won't be able to review this submission.

mosteo commented 3 years ago

@gkthiruvathukal, I'm unfortunately overstretched with reviews, besides not being much involved with unsupervised learning.

gkthiruvathukal commented 3 years ago

@ayush-1506 Can you suggest a few more possibilities? Both @mosteo and @hausen are not able to review this submission.

ayush-1506 commented 3 years ago

@gkthiruvathukal Suggesting a few more possible reviewers : @{xirdneh, arcuri8, nnadeau}

gkthiruvathukal commented 3 years ago

@ayush-1506 I have leaned on all 3 of those somewhat recently. However, I know Andrea Arcuri is often willing. If you can suggest a couple more names, that would be helpful.

@arcuri82, are you willing to contribute a review for this JOSS submission?

ayush-1506 commented 3 years ago

@gkthiruvathukal Sure, adding some more: {jonathanschilling, kuangmeng, marcoapintoo}. Let me know if you'd like more suggestions.

arcuri82 commented 3 years ago

@gkthiruvathukal Hi, yes I can be a reviewer for this submission

gkthiruvathukal commented 3 years ago

@arcuri82 Thanks for helping me again! I will add you now and work on getting a second reviewer.

gkthiruvathukal commented 3 years ago

@whedon assign @arcuri82 as reviewer

whedon commented 3 years ago

OK, @arcuri82 is now a reviewer

gkthiruvathukal commented 3 years ago

@ jonathanschilling are you willing to contribute a review for this JOSS submission?

ayush-1506 commented 3 years ago

@gkthiruvathukal I think jonathanschilling wasn't notified, since there's a space between @ and jonathanschilling in your comment above (if this was not intentional).
