Closed iskandr closed 9 years ago
Reviewed 18 of 18 files at r1. Review status: 17 of 18 files reviewed at latest revision, 21 unresolved discussions, some commit checks failed.
.travis.yml, line 32 [r1] (raw file): Yeah, I finally concede that this is the better approach. Was going to change it myself!
test/data.py, line 41 [r1] (raw file):
This name feels too general. (Confused me when reading the tests, as I couldn't figure out what variants
was.) Maybe test_variants
or cancer_test_variants
? Or something? Same with the two names below?
test/data/tiny_test_ligandome_dir/A0201, line 1 [r1] (raw file): Maybe good to have at least two subdirectories to test that both are used?
Also, good to try a different allele format for the other subdirectory?
test/test_effect_expression_filters.py, line 52 [r1] (raw file): How about testing equal to threshold? <3 edge cases.
topiary/epitope_collection_helpers.py, line 90 [r1] (raw file):
Again, I think the x
is unhelpful for readability since it's not obvious what sort of class x
is.
topiary/epitope_prediction.py, line 32 [r1] (raw file):
Confusing to have one value that is actually a function of two other values in the tuple. What if this is just a normal class and mutant()
is a method? Alternatively, are the former two necessary to store here?
topiary/epitope_prediction.py, line 35 [r1] (raw file): Given that epitope predictions do still bind, I think EpitopeBindingPrediction would be more clear.
topiary/filters.py, line 71 [r1] (raw file):
Why isn't this called gene_id
?
topiary/filters.py, line 90 [r1] (raw file): Seems like more code re-use is possible between this and the function above it.
topiary/filters.py, line 132 [r1] (raw file): Are these tested?
topiary/filters.py, line 155 [r1] (raw file): Was hard for me to parse the double negatives in this sentence. Maybe: "If False, drop epitope predictions that aren't neoepitopes. In other words: drop predictions for epitopes that are not mutated or are in the self-ligandome." Accurate?
topiary/filters.py, line 158 [r1] (raw file): Just a copy paste from before, right?
topiary/mutant_epitope_predictor.py, line 30 [r1] (raw file):
Unchanged from the old extract_mutant_peptides
? Also, a dict of triples without any classes feels likes it's starting to become fairly complex. Why not just have result[effect] = seq[seq_start_offset:seq_end_offset]
, which appears to be what the caller is doing? (I imagine there is a reason, so I'm mostly just curious.)
Edit: oh, I guess you do use seq
and start_offset
in other places. I don't like how pervasive the triple is, though. Seems prone to error/not good for readability. Thoughts?
topiary/mutant_epitope_predictor.py, line 63 [r1] (raw file): Sneaky; why?
Edit: oh, because you want to keep the self-ligandome matches? This feels unintuitive to me. Thoughts on a separate keep_self_ligandome_epitopes
?
topiary/mutant_epitope_predictor.py, line 176 [r1] (raw file): Did you change any functionality here? Looks like just code re-arrangement?
topiary/mutant_epitope_predictor.py, line 182 [r1] (raw file):
Given the defaults of this method, it seems pretty easy to accidentally get top_priority_effects
when you actually want to use expression data by just forgetting to pass it in. Maybe you can remove the default? Or, other ideas? (High level comment: for software like this, I wonder how useful vs. dangerous some of these defaults really are.)
topiary/mutant_epitope_predictor.py, line 212 [r1] (raw file):
x
is not a helpful name! Took me a while to figure out what it was representing. binding_prediction
?
topiary/mutant_epitope_predictor.py, line 226 [r1] (raw file): Ahh! This seems super confusing. I think we should always avoid overwriting fields for subclasses/extended classes, since I think some people are bound to reasonably assume that the fields with the same name are equivalent between related classes.
Can you just call the updated source_sequence
and offset
something new? e.g. epitope_source_sequence
and epitope_offset
.
topiary/mutant_epitope_predictor.py, line 232 [r1] (raw file): Is this tested? I'm always concerned about off-by-1s.
topiary/mutant_epitope_predictor.py, line 248 [r1] (raw file):
epitope binding predictions
?
topiary/mutant_epitope_predictor.py, line 288 [r1] (raw file):
Just to make sure I understand: we've got epitopes_from_variants
and epitopes_from_mutation_effects
. The former uses the latter, though it pre-filters so we can do a lot less effect prediction. Each use slightly different logic for FPKM filtering, necessitating some duplication of code/tests, because the latter needs to get the variant of each effect while the former just uses the variant.
What are the use cases for people using epitopes_from_mutation_effects
and not being able to just start with epitopes_from_variants
?
Comments from the review on Reviewable.io
Review status: 17 of 18 files reviewed at latest revision, 17 unresolved discussions, some commit checks failed.
test/data.py, line 41 [r1] (raw file): :+1:
test/data/tiny_test_ligandome_dir/A0201, line 1 [r1] (raw file): Subdirectories or files?
topiary/epitope_prediction.py, line 32 [r1] (raw file):
I think it's important to know why the epitope was considered wildtype: because it didn't contain a mutant residue or because it did but that sequence occurs elsewhere in the ligandome. I was thinking of breaking away from using namedtuple
here, but that leads to a bigger refactoring that's probably best saved for a later PR.
topiary/epitope_prediction.py, line 35 [r1] (raw file): I think that an epitope, by definition, has to bind MHC so that seems redundant.
topiary/filters.py, line 71 [r1] (raw file): Copy paste in action :-)
topiary/filters.py, line 90 [r1] (raw file): I feel the same way but then you end up with much more cryptic code like:
expression_dict.get(feature_id, 0.0) > expression_threshold
for feature_id in getattr(x, feature_name)),
which seemed possibly worse than just duplicating but keeping things clearer.
topiary/filters.py, line 132 [r1] (raw file): No, adding tests
topiary/filters.py, line 155 [r1] (raw file): I'll change this description a little. How about: "If True, then all predicted epitopes are returned, even if they don't have a mutation or occur in the self ligandome. Otherwise, return only novel epitopes."?
topiary/filters.py, line 158 [r1] (raw file): Yeah, just pulling this out into its own testable chunks (and thus need to add tests)
topiary/mutant_epitope_predictor.py, line 30 [r1] (raw file):
I agree that it's ugly, not sure how to keep this data together otherwise though. I guess I could have something like ProteinSubsequence = namedtuple("ProteinSubsequence", "original_protein_sequence start end")
. Would that be better?
topiary/mutant_epitope_predictor.py, line 63 [r1] (raw file): I think that maybe
topiary/mutant_epitope_predictor.py, line 176 [r1] (raw file): Just code rearrangement. Conditionally defining a function and then using it seemed needlessly confusing.
topiary/mutant_epitope_predictor.py, line 182 [r1] (raw file): If you want expression-based transcript selection, you have to pass in something relating to transcript expression. I guess I can make that argument more prominent and remove its default value.
topiary/mutant_epitope_predictor.py, line 212 [r1] (raw file): Renamed
topiary/mutant_epitope_predictor.py, line 226 [r1] (raw file): The update is to remap the source_sequence and offsets onto the full protein. But maybe I can skip that? I don't know if the full protein sequence is actually necessary, I can possibly just keep the positions in the protein sequence in case someone needs to reconstruct it later.
topiary/mutant_epitope_predictor.py, line 232 [r1] (raw file): Good idea, I'm going to add them later to keep this PR manageable: https://github.com/hammerlab/topiary/issues/34
topiary/mutant_epitope_predictor.py, line 288 [r1] (raw file): They used to be one function which seemed cleaner when split apart, I can join them back together if you think that's better.
Comments from the review on Reviewable.io
Review status: 17 of 18 files reviewed at latest revision, 8 unresolved discussions, some commit checks failed.
test/data/tiny_test_ligandome_dir/A0201, line 0 [r1] (raw file): I meant subdirectories, but files too? Maybe 1 subdir with 1 file and another subdir with 2 files?
topiary/epitope_prediction.py, line 0 [r1] (raw file):
I don't think it's obvious that EpitopePrediction
is related to BindingPrediction
. But I won't push.
topiary/epitope_prediction.py, line 0 [r1] (raw file):
I'd agree with breaking away from namedtuple
.
topiary/mutant_epitope_predictor.py, line 0 [r1] (raw file): Unclear. Just seemed like a lot of code/testing duplication. You can your judgement on this one.
topiary/mutant_epitope_predictor.py, line 0 [r1] (raw file): I think that makes sense :)
topiary/mutant_epitope_predictor.py, line 0 [r1] (raw file): :+1:
topiary/mutant_epitope_predictor.py, line 0 [r1] (raw file):
Offline discussion: we talked about what the default should be, since I was confused about why we were no longer dropping things. Alex clarified that's the point of the EpitopePrediction
class; once they get everything, they can filter it down as necessary.
topiary/mutant_epitope_predictor.py, line 0 [r1] (raw file): Not entirely sure what you mean; I'm happy as long as we don't change any of the variables already defined/set.
Comments from the review on Reviewable.io
Reviewed 1 of 1 files at r2. Review status: all files reviewed at latest revision, 8 unresolved discussions, some commit checks failed.
Comments from the review on Reviewable.io
Review status: 5 of 27 files reviewed at latest revision, 1 unresolved discussion, some commit checks pending.
test/data/tiny_test_ligandome_dir/A0201, line 1 [r1] (raw file): I just made one with two files, not sure what testing multiple subdirectories adds.
Comments from the review on Reviewable.io
Awesome! I'm digging all the tests.
Reviewed 21 of 25 files at r3, 3 of 3 files at r4. Review status: all files reviewed at latest revision, 5 unresolved discussions, some commit checks failed.
setup.py, line 58 [r4] (raw file):
Should require the latest mhctools
since your test breaks without it.
test/test_epitope_filters.py, line 4 [r3] (raw file):
I guess this is easy enough that using mock
wouldn't add anything?
test/test_epitopes_from_commandline_args.py, line 10 [r3] (raw file): Given our issues of late, probably always good to throw in more than 1 allele?
topiary/epitope_prediction.py, line 26 [r4] (raw file): Finish your sentence?
topiary/epitope_prediction.py, line 28 [r4] (raw file): As discussed, these names should be worked on. (And the structure of the fields, too.)
Comments from the review on Reviewable.io
Yeah, also feels nice to break up monolithic piles of logic into smaller testable pieces. Good to merge?
Review status: 23 of 28 files reviewed at latest revision, 3 unresolved discussions, some commit checks pending.
setup.py, line 58 [r4] (raw file): Yeah, changed it to 0.1.8
test/test_epitope_filters.py, line 4 [r3] (raw file): I can't imagine anything that would add, but maybe I'm being unimaginative :-)
test/test_epitopes_from_commandline_args.py, line 10 [r3] (raw file): Add an HLA-C allele
Comments from the review on Reviewable.io
Merge away!
Reviewed 5 of 5 files at r5. Review status: :shipit: all files reviewed at latest revision, all discussions resolved, all commit checks successful.
Comments from the review on Reviewable.io
Hey Guys,
Question re: this branch. If I include the --only-novel-epitopes
flag do I need to maintain a local self-ligandome file, or are the self-ligandomes for each allele packaged w/ the topiary distribution? If it is the former case, when including --only-novel-epitopes
how should I point topiary to the correct self-ligandome file. If it is the latter, does topiary reference the appropriate self-ligandome automatically?
As always, thanks for all of your help!
Hey @JPFinnigan, I should have asked this before changing the commandline flags, but what do you prefer, this new behavior (keep all epitopes unless give --only-novel-epitopes) or what we had before (drop non-mutant or self epitopes unless given
--keep-wildtype-epitopes`)?
Also, Topiary doesn't come with a self-ligandome since ideally it would be matched to whatever binding prediction tool you're also using for mutant peptide-MHC binding. You can optionally supply one (like the NetMHCpan dataset Kipp generated) via --wildtype-ligandome-directory
.
@iskandr Good question.
I can think of reasons to support both default behaviors. I imagine that the typical topiary user will want use the software to identify neoantigens, and in that setting I would expect to drop all non-mutation/self epitopes by default. However, given that the complete filtering the user may want from --only-novel-epitopes
(e.g removing epitopes not containing the variant residue, and removing epitopes found elsewhere in the self-ligandome) requires additional input in the form of a --wildtype-ligandome-directory
(w/h some users may not have access to), I think the new behavior also makes sense and may in-fact actually be the preferable default behavior.
LazyLigandomeDict
which loads peptide sets from a directory with allele-specific peptide filesBindingPrediction
intoEpitopePrediction
with extra fields {contains_mutant_residues
,occurs_in_self_ligandome
,mutant
}filters