Open jpivarski opened 1 year ago
Hi, Jim,
I just finished touching base with all of the WATCHEP trainees to get a list of possible topics. I'll dump them here and let everyone comment, upvote, etc.
Other topics that may or may not be in your wheelhouse:
Hello Jim, all,
I'm a grad student in CMS, in TAC-HEP, and would be interested in Dask.
I have not used Awkward Arrays, having focused on RDataFrame and C++ for a physics analysis, but I'm very curious about best practices for generating many (~hundreds of) histograms that are slightly different, like for systematics in an analysis. But since not everyone is in collider physics, maybe this is too niche.
Hi Jim,
I'm a grad student in cosmology in the TAC-HEP program. I'd be interested in accelerating Python through any of the methods above, as well as array-oriented programming.
If this is in your area, I'd also be interested in learning about Git more thoroughly.
Thanks for the suggestions so far!
git

I only know a handful of git commands[^1], fewer than the HSF/Software Carpentry tutorial on git covers, but it has been enough to be productive. I would have more to say on issue/PR/release workflows (the GitHub or GitLab features), and maybe this could be combined with a sample project that also demonstrates automated tests? I'd have to think of what that sample project could look like, and make sure that it doesn't take too much time.

RDataFrame's Vary is a nice solution to this problem. Maybe we should have some content on the Awkward Array ←→ RDataFrame interface?

All in all, the above is more than I could cover in 6 hours, but we can keep brainstorming before we have to set priorities.
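The pattern behind this request (many slightly different histograms derived from the same columns) can be sketched without ROOT at all. Below is a minimal NumPy illustration, not RDataFrame's actual `Vary` API: the variation names, scale factors, and binning are all made up for the example.

```python
import numpy as np

# Hypothetical example data: one "column" of transverse momenta.
rng = np.random.default_rng(42)
pt = rng.exponential(scale=30.0, size=100_000)

# Systematic variations expressed as scale factors on the same column.
# (Illustrative names and values; a real analysis takes these from calibrations.)
variations = {
    "nominal": 1.00,
    "jes_up": 1.05,    # jet energy scale +5%
    "jes_down": 0.95,  # jet energy scale -5%
}

bins = np.linspace(0, 200, 51)

# One histogram per variation: the varied column is recomputed from the
# nominal one, so adding a new variation is a single dictionary entry.
histograms = {
    name: np.histogram(pt * factor, bins=bins)[0]
    for name, factor in variations.items()
}

print({name: int(counts.sum()) for name, counts in histograms.items()})
```

The point of the pattern is that the per-variation bookkeeping collapses to a mapping over a dictionary, which is essentially what `Vary` automates inside an RDataFrame event loop.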
[^1]: Just enough to git around.
Hi Jim,
I'm a grad student in ATLAS (in WATCHEP) and I think the main thing I'd be interested in is some more advanced ways to speed up python (Dask and Numba tutorials sound great to me). I've worked with awkward before but I would also support doing some more things related to array programming. Ideally I'd want to take away some things to speed up my data processing pipelines. Thanks for putting this together!
Hi Jim!
I am a graduate student working on CMS. I have an analysis I am working on, and I work on the Elastic Analysis Facility (EAF) at Fermilab. I have moderate python experience, I usually just look stuff up whenever I need it. I have experience with C++, Coffea, git, and ROOT (although it's quite rusty). I have a little experience with docker, helm, and kubernetes (very little).
I am interested in learning more about the following:
I'd also be interested in learning about ADLs or machine learning. I have also taken a software course that went over object oriented python, CI/CD, and tests but I wouldn't mind going over some CI/CD or tests topics again.
Okay, another +1 for Dask; I should definitely involve Dask and dask-awkward in a presentation of columnar analysis.
I felt confident to talk about machine learning in 2015, but much less so today.
The trouble with a topic like "advanced Python" is that I don't know how to tie it together into a coherent story. Henry does a good job of it with Level Up Your Python, which I highly recommend. I'll keep thinking about it, though.
Thanks for the suggestions!
Hi Jim, I am a CMS graduate student in the TAC-HEP program. I have a fairly reasonable amount of experience with C++ and python. Of the topics you listed I would be most interested in learning about array/columnar techniques, dask and numba.
Hi Jim, Continuing a theme: I'm also a CMS grad student in TAC-HEP, and would be very interested in Dask/Numba and array/columnar techniques. I'll also throw Kubernetes and Condor out there, but we already have a pretty sizable list of interesting-sounding topics going. Thank you!
Thanks for the suggestion! I'll add a +1 to Dask/Numba (it's looking like that's going to be the core of what I'll talk about).
I'm not really qualified to teach Kubernetes and Condor, though. (Sorry! But it was worth suggesting.)
Howdy Jim! I'm one of the Matts participating in WATCHEP, but not in collider physics! I'm working on some of the software infrastructure and analysis pipelines within LSST DESC.
Out of the topics listed above, a discussion of how to bring GPU parallelization into our development would be helpful: maybe best practices or design patterns within Python that best allow for GPU-parallelized code (if any exist!).
I think this would go into the category of interfacing Python with C++ (i.e., having my Python call some lower-level GPU code to perform numerically intense integrations). Or maybe this falls into the Dask category discussed above?
Thanks and looking forward to the school!
Thanks!
I now know that this traineeship summer school will be July 24‒28. I'll be teaching on July 24 (Monday) and will be helping out with a coding jam that extends over the whole week. (I'll have limited availability after Monday, since I'll be convening another workshop at the same time.)
From the above, a central theme kept coming up: columnar analysis, with vertical scale-out via Numba and horizontal scale-out via Dask, so I'll focus on that. I'll be presenting material on columnar analysis in general at CoDaS-HEP, which many of you will be at, so I can use the two 1.5-hour blocks I'll have on July 24 for vertical and horizontal scale-out.
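For anyone unfamiliar with the "columnar" part of that theme, here is a minimal sketch of the idea using only NumPy (no Numba or Dask assumed installed): the same computation written once row-by-row in pure Python and once as whole-column expressions. The column names and sizes are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "columns": px, py for a million particles.
px = rng.normal(size=1_000_000)
py = rng.normal(size=1_000_000)

# Row-wise (slow in pure Python): one interpreted step per particle.
def pt_rowwise(px, py):
    return [(x * x + y * y) ** 0.5 for x, y in zip(px, py)]

# Columnar (fast): one NumPy expression over the whole columns at once.
def pt_columnar(px, py):
    return np.sqrt(px**2 + py**2)

pt = pt_columnar(px, py)

# Same answer either way (checked on a small slice to keep the loop cheap).
assert np.allclose(pt[:1000], pt_rowwise(px[:1000], py[:1000]))
print(pt.shape)
```

Vertical scale-out (Numba) compiles the row-wise style to machine speed; horizontal scale-out (Dask) partitions the columns across workers, but the columnar expression itself stays the same.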
Second to the above, unit tests/CI and git practices also came up fairly often. I think we should integrate those into the coding jam and I'll be talking with the main developer of those exercises tomorrow.
@mattkwiecien, GPU parallelization would be an interesting topic, but it would involve more set-up (getting shared resources with GPUs on the day of the training—which is not insurmountable) and hasn't been a central theme of the requests. Maybe I can do some of the Numba examples with CUDA (Numba has a CUDA backend), but the main thing I should point you toward is CuPy, if you haven't already heard of it, and its ability to JIT-compile kernels in particular. That gives you a nice Python-C++ interface that's directly focused on GPUs.
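One practical consequence of CuPy mirroring NumPy's API is that array-oriented code can often be written once and run on either device. Here is a minimal sketch, assuming only that documented NumPy/CuPy compatibility; the GPU lines are left as comments since a GPU is not assumed here, and the function itself is an invented example.

```python
import numpy as np

def zscore(xs):
    # Uses only arithmetic and methods that NumPy and CuPy share, so the
    # same function runs on CPU (numpy.ndarray) or GPU (cupy.ndarray).
    return (xs - xs.mean()) / xs.std()

cpu = zscore(np.arange(10.0))
print(cpu)

# On a machine with CuPy and a GPU (not assumed here), the same function
# would execute entirely on the device:
#
#   import cupy as cp
#   gpu = zscore(cp.arange(10.0))  # all work happens on the GPU
#   back = cp.asnumpy(gpu)         # copy the result to the host
```

For the hand-written kernels mentioned above, CuPy's `ElementwiseKernel` and `RawKernel` are the JIT-compilation entry points: you supply CUDA C source as a string and call it on CuPy arrays from Python.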
I've been asked to present a tutorial to TAC-HEP and WATCHEP trainees, but I'd like to get a sense of your level and topics of interest, first. Let me know in this issue thread what you want to learn about and how much you know already.
To get a sense of what my tutorials are usually like, see my previous ones.
The date is still to be determined, the duration is anywhere between 2 and 6 hours, and it is likely to be over Zoom (if not somehow connected with CoDaS-HEP, but we're all already busy then).
If you want to try something outside my normal set of topics, I'm definitely open to it, but we'll iterate here, in this issue thread, to converge on something realistic.