open-life-science / ols-5

Creative Commons Attribution Share Alike 4.0 International

Building a Cloud-SPAN community of practice #4

Open evelyngreeves opened 2 years ago

evelyngreeves commented 2 years ago

Project Lead: @evelyngreeves Mentor: @annefou

Welcome to OLS-5! This issue will be used to track your project and progress during the program. Please use this checklist over the next few weeks as you start the Open Life Science program :tada:.


Week 1 (week starting 28 February 2022): Meet your mentor!

Before Week 2 (week starting 7 March 2022): Cohort Call (Welcome to Open Life Science!)

Before Week 3 (week starting 14 March 2022): Meet your mentor!

Before Week 4: Cohort Call (Tooling and roadmapping for Open projects)

Week 5 and later

This issue is here to help you keep track of work as you start the Open Life Science program. Please refer to the OLS-5 Syllabus for more detailed weekly notes and assignments past week 4.

Week 6

Week 7

Week 8

Week 9

Week 10

Week 11

Week 12

Week 13

Week 14

Week 15

evelyngreeves commented 2 years ago

Vision statement (first draft!): I’m working with the Cloud-SPAN team and environmental biotechnology community to develop training resources and build a community of practice to help early career researchers access the HPC resources needed for big data 'omics.

AleCandian commented 2 years ago

Hi @evelyngreeves, nice statement! I have a (maybe naïve) question: what is 'omics? :) Also, I am curious: in my experience, access to HPC resources is often granted through competitive applications. Do you plan to create training for Early Career Researchers on how to write these proposals? Or are you focusing on training them to use HPC and optimise their software for this specific architecture?

evelyngreeves commented 2 years ago

Thanks @AleCandian, and good question! In this case "omics" usually refers to "genomics" (i.e. studying/mapping DNA), but within biology it's also possible to study proteomics (proteins), transcriptomics (RNA), metabolomics (metabolites), etc. All of the "omics" involve large datasets and some similar analysis techniques, which is why they are often grouped together. They can also be linked! So a researcher might analyse multiple omics datasets to understand their subject better.

At the moment the focus is on using HPC and optimising software. We're using cloud-based containerised instances via Amazon Web Services, so the application process is less relevant. Lots of our target audience will only need to perform a discrete set of analyses using HPC (as part of a much wider data analysis workflow), so the cloud is a good option for them. We are hoping to extend our training to include institutional/regional HPC clusters too, so writing proposals is a very good point to consider - thank you 😄 It's worth noting, though, that many institutions in the UK have their own HPC cluster (or share one with other institutions locally); these clusters have a minimal application process for their own researchers and are generally large enough for basic analyses.

Thank you for your feedback! I'll check yours out soon.

evelyngreeves commented 2 years ago

Here's a link to my open canvas for the project: https://docs.google.com/presentation/d/1o7i8A8vGHuhCyfM_I9xKuYh5b8nHk6ZlCZuidMMm8uI/edit?usp=sharing

Would love to hear others' feedback on it!