Hello human, I'm @whedon, a robot that can help you with some common editorial tasks.
:warning: JOSS reduced service mode :warning:
Due to the challenges of the COVID-19 pandemic, JOSS is currently operating in a "reduced service mode". You can read more about what that means in our blog post.
For a list of things I can do to help you, just type:
@whedon commands
For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:
@whedon generate pdf
PDF failed to compile for issue #2972 with the following error:
Can't find any papers to compile :-(
Software report (experimental):
github.com/AlDanial/cloc v 1.88 T=5.22 s (90.8 files/s, 14113.6 lines/s)
--------------------------------------------------------------------------------
Language                   files     blank   comment      code
--------------------------------------------------------------------------------
Java                         437     10599     22656     36800
Bourne Shell                   9       215       293       719
XSLT                           3       154        54       541
XSD                            4        66        59       406
XML                            5        18        21       310
Maven                          3         6        16       276
CSS                            1        16         0        85
Bourne Again Shell             8        44       161        81
Markdown                       1        17         0        47
HTML                           2         5        14        21
JSON                           1         0         0        17
--------------------------------------------------------------------------------
SUM:                         474     11140     23274     39303
--------------------------------------------------------------------------------
Statistical information for the repository 'aef4f59eaae7aa27d77f8f93' was
gathered on 2021/01/19.
The following historical commit information, by author, was found:
Author              Commits  Insertions  Deletions  % of changes
Chris Brooker             1       68690          0         94.84
Faisal Hameed            13         190        260          0.62
Jim Caffrey              26        1961        674          3.64
Neale Ferguson            3         271         41          0.43
ayman abdelghany          3         157        135          0.40
davidoh                   2          25         22          0.06
Below are the number of rows from each author that have survived and are still
intact in the current revision:
Author                Rows  Stability    Age  % in comments
Faisal Hameed          161       84.7   14.7           0.00
James Caffrey         1562      100.0   10.7          23.62
Jim Caffrey             11        0.6   14.8           0.00
Neale Ferguson         131       48.3   14.9           0.00
ayman abdelghany       130       82.8   12.8           0.00
cbrooker27           68035      100.0    0.0          36.32
davidoh                 25      100.0    0.1           0.00
Hi @ayush-1506 — is there a paper associated with your submission?
@kthyng Yes, the code and paper live inside a different branch of the repository. Link : https://github.com/openmainframeproject/ade/tree/logs
Can we get whedon to use this branch instead of master? Else I'll discuss with my collaborators to merge this into master as soon as possible.
@whedon generate pdf from branch logs
Attempting PDF compilation from custom branch logs. Reticulating splines etc...
:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:
@ayush-1506 Yes it is fine to have the paper in another branch. Please look through the paper requirements to be sure you've covered them all. For one thing, we require a section entitled "Statement of Need".
@ayush-1506 This looks like interesting work, but can you make a compelling argument for why it is research software in particular? You can read more about that requirement here. I'm going to label this with a scope query to get the editorial board's input on this, which should take 1-2 weeks.
@whedon scope query
I'm sorry human, I don't understand that. You can see what commands I support by typing:
@whedon commands
@whedon query scope
Submission flagged for editorial review.
@kthyng Thanks for the input. I'll add a Statement of Need section (which will support the fact that this software and approach solves a problem). Do I need to add the argument for this being research software to the paper, or will a comment here suffice?
Here is the specific section on what your paper should contain: https://joss.readthedocs.io/en/latest/submitting.html#what-should-my-paper-contain
Your statement of need should describe the research purpose of the software, but summarizing that or expanding on it here would also be helpful as the editors look through your submission to learn about it.
@kthyng Just realised that the Motivation section should probably be renamed to Statement of Need.
> summarizing that or expanding on it here would also be helpful as the editors look through your submission to learn about it.
Sure, I'll make required edits to the paper and expand the same here.
Made changes to the paper, summarizing the same here:
The aim of the project is to efficiently detect anomalous log slices in a large set of logs. This can include sparse logs, such as Linux syslogs in RFC3164/RFC5424 format, or very dense logs, such as those generated by Spark jobs. The problem is common in large systems and development clusters, where a system crash or unexpected behavior can have adverse effects. We introduce a novel statistical, data-science approach to solving this problem, which is expanded on later in this comment.
Why do we need to find anomalous log slices?
Debugging system failures is a cumbersome task. Abnormal behavior in a system can be identified by studying the logs generated while the system was running: if the system fails or behaves unexpectedly, this is logged somewhere. However, going through hours of dense logs is a challenge: sysadmins typically need to race against time to study large amounts of log messages to decipher the root cause of the issue. Such system failures are very common and at times unavoidable. Over the years they have led to a huge loss of time and resources.
While there has been work on anomaly detection in large logs, such as TadGAN (https://arxiv.org/abs/2009.07769) and semi-supervised adversarial learning with GANs (https://doi.org/10.1109/ciss.2019.8693024), most of these approaches rely on large deep learning models, and some treat this as a supervised problem. These models are expensive to train and comparatively slow.
In contrast, we treat the problem as a statistical one and use unsupervised learning techniques for fast and robust detection of anomalous slices. Being unsupervised, the approach does not need labelled data. Avoiding computationally heavy deep learning makes our system fast, and it is written in Java, which makes it well suited to enterprise IT use cases (and it can be adapted to others). To this end, we divide the problem into 3 main sub-categories:
Unsupervised learning algorithms: At the heart of ADE are unsupervised learning algorithms that are trained to understand the actual expected behavior of the system and compare it with the observed behavior during inference.
Model groups: We divide the training into several categories, called model groups. Through model groups, multiple systems contribute to the generation of a single model for the group; the more systems in the group, the more data our system can use to build the model.
Statistical scores: To come up with an anomaly score, we calculate a number of statistical scores that contribute to the final anomaly score. These include the Bernoulli score, Poisson score, LogNormal score, Best-of-two score, rarity score, severity score, Clustering score, Percentile score, and FullBernoulliClusterAware score, to name a few.
Along with this, for each message, we try to classify it into four categories based on the frequency of the particular family of messages. These classes include:
Using all this calculated information, we assign an anomaly score to every internal slice. The higher the anomaly score, the greater the chance that the slice is the source of anomalous logs.
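To make the idea of a count-based statistical score concrete, here is a minimal sketch of how a Poisson-style score for one message family could be computed. It is not taken from the ADE code base: the class name `PoissonScoreSketch`, its methods, and the choice of scoring by P(X < observed) are assumptions made purely for illustration, and the actual ADE scorers may be defined differently.

```java
import java.util.List;

// Hypothetical illustration only; not ADE's actual Poisson score.
class PoissonScoreSketch {

    /** Estimate the Poisson rate as the mean count per interval seen in training. */
    static double estimateRate(List<Integer> trainingCounts) {
        return trainingCounts.stream()
                .mapToInt(Integer::intValue)
                .average()
                .orElse(0.0);
    }

    /** log P(X = k) for X ~ Poisson(lambda), computed in log space for stability. */
    static double logPmf(int k, double lambda) {
        if (lambda <= 0.0) {
            return k == 0 ? 0.0 : Double.NEGATIVE_INFINITY;
        }
        double logFactorial = 0.0;
        for (int i = 2; i <= k; i++) {
            logFactorial += Math.log(i);
        }
        return k * Math.log(lambda) - lambda - logFactorial;
    }

    /**
     * Assumed scoring rule: P(X < observed) under the trained rate.
     * Values near 1 mean the observed count is unusually high for this
     * message family; a count of 0 scores 0.
     */
    static double anomalyScore(int observed, double lambda) {
        double cdfBelow = 0.0;
        for (int i = 0; i < observed; i++) {
            cdfBelow += Math.exp(logPmf(i, lambda));
        }
        return Math.min(1.0, cdfBelow);
    }

    public static void main(String[] args) {
        // Made-up training data: counts of one message family per interval.
        List<Integer> history = List.of(4, 5, 6, 5, 5);
        double lambda = estimateRate(history);
        System.out.printf("rate=%.2f typical=%.4f burst=%.4f%n",
                lambda, anomalyScore(5, lambda), anomalyScore(20, lambda));
    }
}
```

In this sketch a burst of 20 messages scores much closer to 1 than a typical count of 5. A real system would combine several such scores per message and per interval, as described above.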
Output format: Finally, we write out the analysis output in XML format. An example of the analysis output for a day can be seen here. We also provide specialized output for each interval, which can be accessed by clicking on the XML links associated with each slice. Examples of analysis for a period can be viewed here. Our approach has shown comparatively accurate results when tested on real data, along with fast inference and training. We also provide sample data and instructions to build the binary and run it on the data.
Looking at the "What we mean by research software" section, I believe this falls under the category: software that solves complex modeling problems in a scientific context (physics, mathematics, biology, medicine, social science, neuroscience, engineering) and extracts knowledge from large data sets.
Kindly let me know if there are any questions or if I missed something.
@ayush-1506 Thank you! Someone from the editorial board will get back to you after a week or two.
@ayush-1506 - can you explain what code you are submitting to JOSS in this branch vs the overall repo? The paper seems to describe ADE, which is what the repo contains, but you also have suggested that https://github.com/openmainframeproject/ade/tree/logs is the contribution being submitted here, and I can't tell if the paper describes that specific contribution.
@danielskatz There were some issues (here : CLA wasn't registering the contributors) with pushing new commits to master branch in the ADE repository, hence all development was being done and reviewed in the logs branch temporarily. However, the issues with CLA have been resolved now and all changes have been merged to master. We can take the master branch as the main branch with paper and code from now on.
So is all of the content in the main branch the JOSS submission?
@whedon check repository
Software report (experimental):
github.com/AlDanial/cloc v 1.88 T=2.31 s (210.5 files/s, 32943.2 lines/s)
--------------------------------------------------------------------------------
Language files blank comment code
--------------------------------------------------------------------------------
Java                         444     10823     23275     37558
Bourne Shell                  10       271       359       895
XSD                            5        88        76       542
XSLT                           3       154        54       541
XML                            6        23        45       468
Maven                          3         6        16       276
Markdown                       2        56         0       191
CSS                            1        16         0        85
Bourne Again Shell             8        44       161        81
TeX                            1         2         0        27
HTML                           2         5        14        21
JSON                           1         0         0        17
YAML                           1         1         0         8
--------------------------------------------------------------------------------
SUM:                         487     11489     24000     40710
--------------------------------------------------------------------------------
Statistical information for the repository '23ad3deb8570f20df8428079' was
gathered on 2021/01/26.
The following historical commit information, by author, was found:
Author              Commits  Insertions  Deletions  % of changes
Chris Brooker             1       68690          0         92.61
Faisal Hameed            13         190        260          0.61
Jim Caffrey              26        1961        674          3.55
Neale Ferguson            3         271         41          0.42
ayman abdelghany          3         157        135          0.39
ayush-1506               13        1673         72          2.35
davidoh                   2          25         22          0.06
Below are the number of rows from each author that have survived and are still
intact in the current revision:
Author                Rows  Stability    Age  % in comments
Faisal Hameed          161       84.7   57.9           0.00
James Caffrey         1554      100.0   53.9          23.68
Jim Caffrey             11        0.6   58.0           0.00
Neale Ferguson         131       48.3   58.1           0.00
ayman abdelghany       130       82.8   56.1           0.00
ayush-1506            1619       96.8    6.5          42.62
cbrooker27           68025      100.0    0.0          36.31
davidoh                 25      100.0   43.3           0.00
> So is all of the content in the main branch the JOSS submission?
@danielskatz Yes: all changes were merged yesterday.
👋 @gkthiruvathukal - Any chance you could take on one more submission for JOSS?
@whedon invite @gkthiruvathukal as editor
@gkthiruvathukal has been invited to edit this submission.
@danielskatz Yes, I will be happy to handle this submission. Assigning self.
@whedon assign @gkthiruvathukal as editor
OK, the editor is @gkthiruvathukal
@ayush-1506 Can you please suggest some possible reviewers (without the at symbol) from the list of reviewers? See author instructions at the beginning of this issue thread.
@gkthiruvathukal Upon skimming over the list of reviewers, I'd say @ {hausen and mosteo} could be possible reviewers here.
@whedon commands
Here are some things you can ask me to do:
# List all of Whedon's capabilities
@whedon commands
# Assign a GitHub user as the sole reviewer of this submission
@whedon assign @username as reviewer
# Add a GitHub user to the reviewers of this submission
@whedon add @username as reviewer
# Re-invite a reviewer (if they can't update checklists)
@whedon re-invite @username as reviewer
# Remove a GitHub user from the reviewers of this submission
@whedon remove @username as reviewer
# List of editor GitHub usernames
@whedon list editors
# List of reviewers together with programming language preferences and domain expertise
@whedon list reviewers
# Change editorial assignment
@whedon assign @username as editor
# Set the software archive DOI at the top of the issue e.g.
@whedon set 10.0000/zenodo.00000 as archive
# Set the software version at the top of the issue e.g.
@whedon set v1.0.1 as version
# Open the review issue
@whedon start review
EDITORIAL TASKS
# All commands can be run on a non-default branch, to do this pass a custom
# branch name by following the command with `from branch custom-branch-name`.
# For example:
# Compile the paper
@whedon generate pdf
# Compile the paper from alternative branch
@whedon generate pdf from branch custom-branch-name
# Remind an author or reviewer to return to a review after a
# certain period of time (supported units days and weeks)
@whedon remind @reviewer in 2 weeks
# Ask Whedon to do a dry run of accepting the paper and depositing with Crossref
@whedon accept
# Ask Whedon to check the references for missing DOIs
@whedon check references
# Ask Whedon to check repository statistics for the submitted software
@whedon check repository
EiC TASKS
# Invite an editor to edit a submission (sending them an email)
@whedon invite @editor as editor
# Reject a paper
@whedon reject
# Withdraw a paper
@whedon withdraw
# Ask Whedon to actually accept the paper and deposit with Crossref
@whedon accept deposit=true
@hausen and @mosteo, are you willing to contribute a review for this JOSS submission?
@gkthiruvathukal Should I propose a few more reviewers (in case mosteo and hausen aren't available)?
@gkthiruvathukal sorry, I won't be able to review this submission.
@gkthiruvathukal, I'm unfortunately overstretched with reviews, besides not being much involved with unsupervised learning.
@ayush-1506 Can you suggest a few more possibilities? Both @mosteo and @hausen are not able to review this submission.
@gkthiruvathukal Suggesting a few more possible reviewers : @{xirdneh, arcuri8, nnadeau}
@ayush-1506 I have leaned on all 3 of those somewhat recently. However, I know Andrea Arcuri is often willing. If you can suggest a couple more names, that would be helpful.
@arcuri82, are you willing to contribute a review for this JOSS submission?
@gkthiruvathukal Sure, adding some more: {jonathanschilling, kuangmeng, marcoapintoo}. Let me know if you'd like more suggestions.
@gkthiruvathukal Hi, yes I can be a reviewer for this submission
@arcuri82 Thanks for helping me again! I will add you now and work on getting a second reviewer.
@whedon assign @arcuri82 as reviewer
OK, @arcuri82 is now a reviewer
@ jonathanschilling are you willing to contribute a review for this JOSS submission?
@gkthiruvathukal I think jonathanschilling wasn't notified since there's a space between @ and jonathanschilling in your comment above (if this was not intentional).
Submitting author: @ayush-1506 (Ayush Shridhar)
Repository: https://github.com/openmainframeproject/ade.git
Version: v1.0.5
Editor: @gkthiruvathukal
Reviewers: @arcuri82, @mdpiper
Managing EiC: Kristen Thyng
Author instructions
Thanks for submitting your paper to JOSS @ayush-1506. Currently, there isn't a JOSS editor assigned to your paper.
The author's suggestion for the handling editor is @bmcfee.
@ayush-1506 if you have any suggestions for potential reviewers then please mention them here in this thread (without tagging them with an @). In addition, this list of people have already agreed to review for JOSS and may be suitable for this submission (please start at the bottom of the list).
Editor instructions
The JOSS submission bot @whedon is here to help you find and assign reviewers and start the main review. To find out what @whedon can do for you type: