iml-wg / HEP-ML-Resources

Listing of useful learning resources for machine learning applications in high energy physics (HEPML)
MIT License

Paper subcategories? #1

Open: dguest opened this issue 7 years ago

dguest commented 7 years ago

I'm wondering if you'd want some kind of subcategories for the papers, since there are a few dozen papers on ML for HEP and the list may get confusing. Either that or you could vet the list for the "important" ones, but that's going to be more subjective.

I'm not sure about the ideal categorization, though. My personal bias would be to have a category for each experiment (with ATLAS and CMS lumped together, since they do the same thing), and then maybe have subcategories that separate the more "theoretical" papers (using Delphes or just Pythia) from the work done on real experiments (there are a few results from ATLAS, CMS, and some neutrino experiments).

matthewfeickert commented 7 years ago

@dguest I think this is an excellent idea, especially as the paper list is growing fast. If it gets too large we might even need to split it up into different .md files.

What do you think about splitting the papers into subsections by ML topic (similar to how the talks at DS@HEP 2017 were split by topic)? To name a few: Computer Vision/Jet Images, Anomaly/Outlier Detection, Adversarial Networks, ...

dguest commented 7 years ago

Yeah I'm trying to figure out if it makes more sense to split by the physics signature or by the algorithm. I think I agree that grouping by the type of algorithm is probably most useful.

matthewfeickert commented 7 years ago

Okay. Can we then have a discussion on which algorithm subsections we should use? @dguest @mickypaganini @makagan @SergeiML @Marie89 if you have insight into where it makes sense to draw meaningful distinctions, that would be helpful. For example, I don't know where to meaningfully distinguish between neural networks in general and deep learning.

Some preliminary paper subcategory suggestions:

bstienen commented 7 years ago

I am not sure I agree with categorizing the papers by algorithm type; it depends on what the goal of the list is. If we want to provide an overview of how specific ML algorithms are used, then categorization by algorithm is most useful, but this draws attention away from the physics. If, however, we want to approach it with "let's see how machine learning can be used in HEP", it feels more natural to start from the physics topics and categorize by those. I personally prefer the latter approach, which would yield a list like the following (not exhaustive, quickly made based on the papers currently in the repo)

If we do end up deciding to group by algorithm, may I suggest replacing Boosted Decision Trees with Ensemble Methods? That way, algorithms like Random Forest and the somewhat more general AdaBoost algorithms can also be categorized. This would then yield:

matthewfeickert commented 7 years ago

@bstienen I see what you're saying. I was originally more thinking of "How are the types of machine learning applied to HEP?" but I think for our community the idea of "In what areas of HEP is machine learning applied?" is maybe the better way to phrase things, so I like your proposed classification style.

Looking at your quick list:

This seems good, but maybe we want to refine "Searches and analyses" a bit? Also, going off of @dguest's original comment in the issue, we should have sections for theory work as well. Thoughts?

At the risk of things getting too busy, we could even tag papers with badges indicating the type of machine learning used.

Example:

bstienen commented 7 years ago

@matthewfeickert I agree that "searches and analyses" is quite a broad category, and I am totally in favour of narrowing it down into multiple smaller ones. However, given the papers currently listed in the repository, I was not able to come up with a split that I was happy with; maybe somebody else can help with this.

About theory papers: I am not sure how this could best be done. Event generation is of course a purely theoretical category, but something like searches and analyses is more of a hybrid category. Following @dguest's suggestion and making subcategories might be the way to go, but then I wonder what we would do with hybrid papers... My suggestion would therefore be not to make subcategories for theory and experiment (it would only create problems in the long run) and to let the category names speak for themselves as to whether the papers in them are purely theoretical, purely experimental, or hybrid.

I do like the idea of the badges :smiley:!

matthewfeickert commented 5 years ago

This issue is obviously very old, but given that it is not closed, I am noting that PR #50 will partly affect it.