jpivarski-talks / 2023-07-24-tac-hep-tutorial

BSD 3-Clause "New" or "Revised" License

What do you want to learn about? #1

jpivarski opened this issue 1 year ago

jpivarski commented 1 year ago

I've been asked to present a tutorial to TAC-HEP and WATCHEP trainees, but I'd like to get a sense of your level and topics of interest, first. Let me know in this issue thread what you want to learn about and how much you know already.

To get a sense of what my tutorials are usually like, see my previous ones.

The date is still to be determined, the duration is anywhere between 2 and 6 hours, and it is likely to be over Zoom (if not somehow connected with CoDaS-HEP, but we're all already busy then).

If you want to try something outside my normal set of topics, I'm definitely open to it, but we'll iterate here, in this issue thread, to converge on something realistic.

JasonNielsen commented 1 year ago

Hi, Jim,

I just finished touching base with all of the WATCHEP trainees to get a list of possible topics. I'll dump them here and let everyone comment, upvote, etc.

Other topics that may or may not be in your wheelhouse:

skkwan commented 1 year ago

Hello Jim, all,

I'm a grad student in CMS, in TAC-HEP, and would be interested in Dask.

I have not used Awkward Arrays; I've been focusing on RDataFrame and C++ for a physics analysis. But I'm very curious about best practices for generating many (~hundreds of) slightly different histograms, like for systematics in an analysis. Since not everyone is in collider physics, maybe this is too niche.
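
For concreteness, here's roughly the pattern I have in mind (just a sketch using the `hist` library, with made-up variation names and toy data, not our actual analysis):

```python
import numpy as np
import hist

# Hypothetical systematic variations, each a small scaling of the observable.
variations = {"nominal": 1.00, "scale_up": 1.02, "scale_down": 0.98}

# One histogram with a categorical axis, instead of hundreds of near-copies.
h = hist.Hist(
    hist.axis.StrCategory(list(variations), name="syst"),
    hist.axis.Regular(50, 0, 200, name="pt", label="pT [GeV]"),
)

pt = np.random.exponential(40.0, size=100_000)  # stand-in for real data
for name, scale in variations.items():
    h.fill(syst=name, pt=pt * scale)

print(h["nominal", :])  # select one variation by name
```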

mirarenee commented 1 year ago

Hi Jim,

I'm a grad student in cosmology in the TAC-HEP program. I'd be interested in accelerating Python through any of the methods above, as well as array-oriented programming.

If this is in your area, I'd also be interested in learning about Git more thoroughly.

jpivarski commented 1 year ago

Thanks for the suggestions so far!

All in all, the above is more than I could cover in 6 hours, but we can keep brainstorming before we have to set priorities.


jmw464 commented 1 year ago

Hi Jim,

I'm a grad student in ATLAS (in WATCHEP), and I think the main thing I'd be interested in is more advanced ways to speed up Python (Dask and Numba tutorials sound great to me). I've worked with Awkward before, but I would also support doing more things related to array programming. Ideally I'd want to take away some things to speed up my data processing pipelines. Thanks for putting this together!

Nanoemc commented 1 year ago

Hi Jim!

I am a graduate student on CMS, currently working on an analysis, and I use the Elastic Analysis Facility (EAF) at Fermilab. I have moderate Python experience; I usually just look things up whenever I need them. I have experience with C++, Coffea, git, and ROOT (although it's quite rusty), and a little experience with Docker, Helm, and Kubernetes (very little).

I am interested in learning more about the following:

I'd also be interested in learning about ADLs or machine learning. I have also taken a software course that covered object-oriented Python, CI/CD, and testing, but I wouldn't mind going over some CI/CD or testing topics again.

jpivarski commented 1 year ago

Okay, another +1 for Dask; I should definitely involve Dask and dask-awkward in a presentation of columnar analysis.
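
For those who haven't seen dask-awkward yet, here's the flavor of it (just a sketch on an inline toy array; in practice the data would come from files):

```python
import awkward as ak
import dask_awkward as dak

# A tiny jagged array of event records (stand-in for data read from files).
events = ak.Array(
    [
        {"muon_pt": [31.2, 12.8]},
        {"muon_pt": []},
        {"muon_pt": [54.1]},
    ]
)

# Wrap it as a lazy, partitioned dask-awkward collection.
lazy = dak.from_awkward(events, npartitions=2)

# These steps only build a task graph; nothing is computed yet.
has_muon = dak.num(lazy.muon_pt, axis=1) > 0
leading_pt = lazy.muon_pt[has_muon][:, 0]

print(leading_pt.compute())  # now the (parallelizable) work actually runs
```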

I felt confident talking about machine learning in 2015, but much less so today.

The trouble with a topic like "advanced Python" is that I don't know how to tie it together into a coherent story. Henry does a good job of it with Level Up Your Python, which I highly recommend. I'll keep thinking about it, though.

Thanks for the suggestions!

twnelson0 commented 1 year ago

Hi Jim, I am a CMS graduate student in the TAC-HEP program. I have a fair amount of experience with C++ and Python. Of the topics you listed, I would be most interested in learning about array/columnar techniques, Dask, and Numba.

rpsimeon34 commented 1 year ago

Hi Jim, continuing a theme: I'm also a CMS grad student in TAC-HEP and would be very interested in Dask/Numba and array/columnar techniques. I'll also throw Kubernetes and Condor out there, but we already have a pretty sizable list of interesting-sounding topics going. Thank you!

jpivarski commented 1 year ago

Thanks for the suggestion! I'll add a +1 to Dask/Numba (it's looking like that's going to be the core of what I'll talk about).

I'm not really qualified to teach Kubernetes and Condor, though. (Sorry! But it was worth suggesting.)

mattkwiecien commented 1 year ago

Howdy Jim! I'm one of the Matts participating in WATCHEP, but not in collider physics! I'm working on some of the software infrastructure and analysis pipelines within LSST DESC.

Out of the topics listed above, a discussion of how to incorporate GPU parallelization specifically into our development would be helpful. Maybe best practices or design patterns in Python that lend themselves to GPU-parallelized code (if any exist!).

I think this would fall into the category of interfacing Python with C++ (i.e., having my Python code call some lower-level GPU code to perform numerically intensive integrations). Or maybe this falls into the Dask category discussed above?

Thanks and looking forward to the school!

jpivarski commented 1 year ago

Thanks!

I now know that this traineeship summer school will be July 24‒28. I'll be teaching on July 24 (Monday) and will be helping out with a coding jam that extends over the whole week. (I'll have limited availability after Monday, since I'll be convening another workshop at the same time.)

From the above, a central theme kept coming up: columnar analysis, with vertical scale-out through Numba and horizontal scale-out through Dask. So I'll focus on that. I'll be presenting material on columnar analysis in general at CoDaS-HEP, which many of you will be attending, so I can use the two 1.5-hour blocks I'll have on July 24 for vertical and horizontal scale-out.
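
To give a taste of the vertical part, here's the kind of thing Numba does (a toy sketch; the function name and data are made up):

```python
import numba as nb
import numpy as np

# Numba compiles this Python loop to machine code on first call,
# so per-event logic can run at C-like speed.
@nb.njit
def max_pt_per_event(offsets, pt):
    # Flattened jagged data: event i owns pt[offsets[i]:offsets[i + 1]].
    out = np.full(len(offsets) - 1, -1.0)
    for i in range(len(offsets) - 1):
        for j in range(offsets[i], offsets[i + 1]):
            if pt[j] > out[i]:
                out[i] = pt[j]
    return out

offsets = np.array([0, 2, 2, 5], dtype=np.int64)  # 3 events: 2, 0, 3 particles
pt = np.array([31.2, 12.8, 54.1, 7.0, 22.5])
print(max_pt_per_event(offsets, pt))  # [31.2, -1.0, 54.1]
```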

After those, unit tests/CI and git practices also came up fairly often. I think we should integrate those into the coding jam, and I'll be talking with the main developer of those exercises tomorrow.

@mattkwiecien, GPU parallelization would be an interesting topic, but it would involve more set-up (getting shared resources with GPUs on the day of the training—which is not insurmountable) and hasn't been a central theme of the requests. Maybe I can do some of the Numba examples with CUDA (Numba has a CUDA backend), but the main thing I should point you toward is CuPy, if you haven't already heard of it, and its ability to JIT-compile kernels in particular. That gives you a nice Python-C++ interface that's directly focused on GPUs.
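
If you want a preview of what I mean by CuPy's kernel compilation, it looks like this (a minimal sketch; the kernel and values are made up, and it needs an NVIDIA GPU to run):

```python
import cupy as cp

# CuPy JIT-compiles this CUDA C snippet the first time it's called.
squared_diff = cp.ElementwiseKernel(
    "float64 x, float64 y",   # input arguments
    "float64 z",              # output argument
    "z = (x - y) * (x - y)",  # per-element CUDA C body
    "squared_diff",           # kernel name
)

x = cp.arange(5, dtype=cp.float64)
y = x[::-1]
print(squared_diff(x, y))  # computed on the GPU: [16. 4. 0. 4. 16.]
```

You write the body in CUDA C, but the array handling, compilation, and caching are all managed from Python.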