hackseq / hackseq18

hackseq 2018 - [*]-omics hackathon. Vancouver, BC
The Unlicense

Team Lead + Project info for participant promotion #20

Closed NoushinN closed 6 years ago

NoushinN commented 6 years ago

Hi Everyone, Baraa and I interviewed Sam Chorlton last week and as Baraa posted, we thought it was a great project, needing more refinement at the team lead level. Sam got back to us filling all these fields so we probably won't need to contact him again for more info. His response is pasted here and please also note his optional abstract graphics! :)

Listed Team Leader. Optional: contact email, link to website

Sam Chorlton sam.chorlton@pm.me https://github.com/chorltsd/REUSE

The Hook. In two sentences advertise the goal of your project and why it would be fun to join this particular team

Help change the world by filtering unneeded sequences from a next-generation sequencing dataset, enriching signal from noise and enabling rapid pathogen discovery, isolation of sequence types (eg. rRNA), contaminant removal and more.

Project Abstract: (~300 words) Should have been completed with the application but may be revised. Detail what the project is about and what you're going to do.

Filtering unwanted sequences from nucleic acid sequencing data is an important step in many analyses. It has been used to remove technical artefacts (eg. PhiX), discover known and novel pathogens, isolate nucleic acid types (eg rRNA), and remove noise in metagenomic studies. This step significantly improves the speed and quality of subsequent analyses.

Here I propose an end-to-end pipeline (REUSE) for Rapidly Eliminating Unwanted SEquences from large sequencing datasets. The output of REUSE will be the sequences that do not match a reference sequence. This pipeline will be based on previously established techniques for isolating known and novel pathogens among sequencing data. It will seek to dramatically speed up the process, address flaws in other pipelines, and automate it from start to finish. It will likely include a k-mer filter, read alignment, read assembly, and contig alignment. Some of these steps will be based on publicly available tools, such as RNA-STAR and Trinity, whereas others will need to be programmed from the ground up.

The work at HackSeq18 will focus on development of the most novel and needed module, the k-mer filter (k-REUSE). Previous evidence indicates that k-mers can be used to rapidly screen and filter sequences, and that a k-mer length of 21 base pairs is sufficient to discriminate between unrelated species.(1) Currently published applications, such as Kontaminant(1), Cookiecutter(2), BBDuk(3) and others, have several limitations, including lack of parallelization, high memory requirements (>50 GB for the human genome), and no ability to save the reference index to disk. Other techniques, such as read alignment, are too slow to use on large datasets.

The goal of HackSeq18 will be the development of k-REUSE and comparison to other filters. Further development will likely be needed after the hackathon for integration of k-REUSE into the complete REUSE pipeline and ultimate application to extremely large datasets.

  1. Daly GM, Leggett RM, Rowe W, Stubbs S, Wilkinson M, Ramirez-Gonzalez RH, et al. Host Subtraction, Filtering and Assembly Validations for Novel Viral Discovery Using Next Generation Sequencing Data. PloS One. 2015;10(6):e0129059.

  2. Starostina E, Tamazian G, Dobrynin P, O’Brien S, Komissarov A. Cookiecutter: a tool for kmer-based read filtering and extraction. bioRxiv. 2015 Aug 16;024679.

  3. Bushnell B. BBTools [Internet]. DOE Joint Genome Institute. [cited 2018 Jul 25]. Available from: https://jgi.doe.gov/data-and-tools/bbtools/

Required Skills: What skills or knowledge is required a priori from the participants to be able to take part on this team

- Strong skills in a fast programming language such as C or C++, or the ability to reference fast libraries from within python or another language. Team members will be needed to program the k-mer based filter.
- If you have the above skills, you likely have Github skills.
- Understanding the basics of next generation sequencing data and formats (eg FASTA, FASTQ).
- Ability to have fun.

Optional Skills: What skills or knowledge would be beneficial for the participants to have but is not necessary to take part.

- Overview of existing pathogen discovery pipelines (RINS, SURPI, Kontaminant, Cookiecutter, DeconSeq).
- Ability to write (particularly science).
- Ability to science (particularly to evaluate our pipeline in comparison with existing pipelines, generate comparative graphs and tables, and perform basic statistics).

Image: abstract graphic

NoushinN commented 6 years ago

Also, please find the github page for my proposal here: https://github.com/NoushinN/anatomy-of-morbidity

alexsweeten commented 6 years ago

Listed Team Leader. Optional: contact email, link to website

The Hook. In two sentences advertise the goal of your project and why it would be fun to join this particular team

Project Abstract: (~300 words) Should have been completed with the application but may be revised. Detail what the project is about and what you're going to do.

Required Skills: What skills or knowledge is required a priori from the participants to be able to take part on this team

Optional Skills: What skills or knowledge would be beneficial for the participants to have but is not necessary to take part.

Image: beri-hex

klgray25 commented 6 years ago

Thanks Alex!

NoushinN commented 6 years ago

Hi @klgray25, Cara and Morgan sent this to us (Sasha and me) today. Cara is still looking for the initial application to make adjustments, but below is what we could use for now:

Project Title: Simulating Transcriptome Structural Variants to Produce Benchmarking Datasets

Team Leader(s): Cara Reisle (caralynreisle@gmail.com) & Morgan Bye (morgan@morganbye.com)

The Hook: While there are tools to simulate structural variants in genomic data, currently no such applications exist for transcriptomes. Simulated data is required to properly evaluate structural variant callers.

Project Abstract: The goal of this project is to produce an application that is able to simulate structural variants in transcriptome data. The application will be written in python. It will be a pipeline consisting of several steps:

  1. Specification of the structural variants to be simulated
  2. Generation of the resulting transcriptome sequence
  3. Simulating paired-end reads covering the structural variant breakpoints
  4. Alignment of the simulated reads to the reference genome to produce a BAM file
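The first three pipeline steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the team's design: the `Fusion` record, `apply_fusion`, and `simulate_pairs` are hypothetical names, a simple two-transcript fusion stands in for structural variants in general, reads are error-free, and the alignment step is left out because it would call an external aligner.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


@dataclass(frozen=True)
class Fusion:
    """Step 1 (hypothetical spec): join the start of one transcript to the end of another."""
    five_prime: str         # id of the transcript supplying the 5' part
    five_prime_end: int     # number of bases kept from the 5' partner
    three_prime: str        # id of the transcript supplying the 3' part
    three_prime_start: int  # 0-based offset where the 3' partner resumes


def apply_fusion(transcripts: Dict[str, str], fusion: Fusion) -> Tuple[str, int]:
    """Step 2: return the fused sequence and the breakpoint (index of the first 3' base)."""
    head = transcripts[fusion.five_prime][:fusion.five_prime_end]
    tail = transcripts[fusion.three_prime][fusion.three_prime_start:]
    return head + tail, len(head)


def simulate_pairs(
    seq: str,
    breakpoint: int,
    n_pairs: int,
    read_len: int = 50,
    fragment_len: int = 150,
    seed: int = 0,
) -> List[Tuple[str, str]]:
    """Step 3: sample error-free read pairs from fragments that span the breakpoint."""
    if read_len > fragment_len:
        raise ValueError("read_len must not exceed fragment_len")
    # A fragment starting at s covers [s, s + fragment_len); it spans the junction
    # between breakpoint - 1 and breakpoint when s <= breakpoint - 1 < s + fragment_len - 1.
    lo = max(0, breakpoint - fragment_len + 1)
    hi = min(breakpoint - 1, len(seq) - fragment_len)
    if hi < lo:
        raise ValueError("sequence too short for a fragment spanning the breakpoint")
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_pairs):
        start = rng.randint(lo, hi)
        fragment = seq[start:start + fragment_len]
        read1 = fragment[:read_len]
        read2 = reverse_complement(fragment[-read_len:])
        pairs.append((read1, read2))
    return pairs
```

Note that every fragment spans the breakpoint, but with these defaults an individual read may lie entirely on one side of it; such pairs still provide discordant-pair evidence for the fusion. The resulting pairs would be written to FASTQ and passed to an aligner to produce the BAM file in the last step.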

Required Skills:
- Python (1-star)
- General programming skills (2-stars)
- Basic Background knowledge of Genetics and Sequencing (1-star)

Optional Skills (Helpful but not required):
- Knowledge of structural variants
- Familiarity with BAM/SAM file format
- Familiarity with pysam python module

Recommended Reading: If you’re new to structural variants or working with sequencing data, it would be a good idea to do a bit of background reading. I’ve listed some potentially relevant literature below.
- deFuse (PMID 21625565)
- Chimerascan (PMID 21840877)
- STAR (PMID 26334920)
- BAMSurgeon (PMID 25984700)
- Flux Simulator (PMID 22962361)
- MAVIS (PMID 30016509)
- SQUID (PMID 29650026)

jenjaelin commented 6 years ago

@klgray25 I got this back from Emma (Veena) today:

Project Title: Blockchain and Infectious Diseases
Listed Team Leader: vghorakavi@gmail.com
The Hook: Infectious diseases are widespread and are difficult to track. Blockchain has the capabilities to track infectious diseases without revealing information about the people impacted and with location data.
Project Abstract: Please use what I submitted before
Required Skills: Python: 1, Blockchain: 0, Epidemiology: 1
Optional Skills: Not provided

klgray25 commented 6 years ago

Okay, I'll update the hook for her project! It's a little too late to add another skill requirement though, as that would mess up the response form. I won't be able to add Epidemiology.