susannasiebert closed this issue 4 years ago
Some relevant concepts here: https://bmccancer.biomedcentral.com/articles/10.1186/s12885-018-4325-6
@malachig @jhundal @chrisamiller I have implemented the framework that uses blastp to align the epitopes to the reference proteome and returns the number of alignments found. A couple of follow-up things to discuss:
Good questions.
Initial thoughts:
We probably just want to count one hit per gene. Otherwise the count returned will largely reflect how many alternative isoforms include a particular exon rather than how many places in the genome generate a match.
It would be good to make filtering on these matches an option, perhaps with the default being to remove candidates that have >=1 match in the wild-type proteome.
If we are generally filtering these out, then including the count in the condensed file is probably not needed, since the value will always be 0 there. On the other hand, if there is an option to allow them, then in that scenario we would want to see which candidates had matches and which did not. For now, I think we should add it to the condensed file. This will help us confirm that it is working as expected, because at least for a while the clinical teams will continue to perform their own manual analysis for these matches. If there are discrepancies between what they do and our implementation here, this will help shine a light on them.
I think we should gain some experience with the output first and decide on whether to include it in the ranking calculation later.
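To make the per-gene counting and the optional filter concrete, here is a rough sketch. It is not the actual implementation: the `Hit` record, the function names, and the `max_matches` parameter are all hypothetical, and it assumes each alignment has already been mapped to an epitope, a transcript, and a gene.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class Hit:
    """One alignment of an epitope to a reference protein (hypothetical record)."""
    epitope: str
    transcript_id: str
    gene: str


def count_gene_level_matches(hits: Iterable[Hit]) -> Dict[str, int]:
    """Count matches per epitope, collapsing all isoforms of a gene into one hit."""
    genes_by_epitope: Dict[str, set] = {}
    for hit in hits:
        genes_by_epitope.setdefault(hit.epitope, set()).add(hit.gene)
    return {epitope: len(genes) for epitope, genes in genes_by_epitope.items()}


def filter_candidates(
    epitopes: Iterable[str],
    match_counts: Dict[str, int],
    max_matches: Optional[int] = 0,
) -> List[str]:
    """Keep epitopes with at most max_matches reference matches.

    The default of 0 removes anything with >=1 match; None disables the filter.
    """
    if max_matches is None:
        return list(epitopes)
    return [e for e in epitopes if match_counts.get(e, 0) <= max_matches]


# Two isoforms of GENE_A plus one transcript of GENE_B: two gene-level matches.
hits = [
    Hit("SIINFEKL", "TX1.1", "GENE_A"),
    Hit("SIINFEKL", "TX1.2", "GENE_A"),
    Hit("SIINFEKL", "TX2.1", "GENE_B"),
]
counts = count_gene_level_matches(hits)
kept = filter_candidates(["SIINFEKL", "AAAWYLWEV"], counts)
unfiltered = filter_candidates(["SIINFEKL", "AAAWYLWEV"], counts, max_matches=None)
```

With the filter disabled, the counts would still be available to report in the condensed file, which is the scenario where that column is informative.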
Decision from today's meeting:
Original issue: https://github.com/griffithlab/pVAC-Seq/issues/222
Take the peptide sequence and do some sort of similarity search against the reference proteome (BLAST) to see if the peptide naturally exists in the body. If so, it wouldn't be recognized as a "foreign" antigen.
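As a simplified illustration of the idea, the sketch below uses a plain exact-substring search rather than BLAST, so it would miss near-identical matches that an alignment tool could report. The function name and the toy proteome are made up for the example; a real reference proteome would be loaded from a FASTA file.

```python
from typing import Dict, List


def find_exact_matches(peptide: str, proteome: Dict[str, str]) -> List[str]:
    """Return the IDs of reference proteins that contain the peptide verbatim."""
    return sorted(pid for pid, seq in proteome.items() if peptide in seq)


# Toy reference proteome (invented sequences, for illustration only).
proteome = {
    "P1": "MKTAYIAKQRQISFVKSHFSRQ",
    "P2": "MSIINFEKLTEW",
}

self_matches = find_exact_matches("SIINFEKL", proteome)
no_matches = find_exact_matches("AAAWYLWEV", proteome)
```

A peptide with any match in the reference would be flagged as likely not "foreign"; one with no matches stays a candidate.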