openvax / pyensembl

Python interface to access reference genome features (such as genes, transcripts, and exons) from Ensembl
Apache License 2.0
374 stars 65 forks source link

Generalized ReferenceTranscripts into SequenceData, added protein sequences #70

Closed iskandr closed 9 years ago

iskandr commented 9 years ago

Review on Reviewable

tavinathanson commented 9 years ago

Comments from the review on Reviewable.io

Reviewed files:


pyensembl/common.py, line 19 [r1] (raw file): Weird that there isn't an obvious library for this!


pyensembl/database.py, line 401 [r3] (raw file): Comment saying what required means?


pyensembl/ensembl_release.py, line 74 [r3] (raw file): Minor, but to reduce complexity and increase readability, what about SequenceData doing the fasta_url import and call itself?


pyensembl/ensembl_release.py, line 208 [r3] (raw file): How about making an is_valid_transcript_id/is_valid_protein_id in some common area? Seems generally useful.


pyensembl/gene.py, line 80 [r3] (raw file): ensembl_release doesn't have __eq__?


pyensembl/sequence_data.py, line 1 [r3] (raw file): I still think it'd be easier for reviews if renames are done separately. So, for example, one commit would use reference_transcripts (but use it as if it's sequence_data), and then another commit would literally just do the rename.


pyensembl/sequence_data.py, line 51 [r3] (raw file): That's just awful.

I think this would be more readable if extracted out into self.local_filename = add_release_to_filename_if_not_there or something


pyensembl/sequence_data.py, line 94 [r3] (raw file): So this is better than just self.local_filename == other.local_filename? Curious what your thoughts are there. I guess it depends on whether we view equality as a pre-lazy loading or post-lazy loading property?


pyensembl/sequence_data.py, line 153 [r3] (raw file): True and False are no longer necessary (and the associated comment).


pyensembl/sequence_data.py, line 224 [r3] (raw file): I think this can be extracted out into the same place as is_valid_protein_id, etc.


pyensembl/sequence_data.py, line 233 [r3] (raw file): Maybe a bit confusing that clear_cache doesn't do anything with this in memory cache, even though they're entirely different?


pyensembl/transcript.py, line 78 [r3] (raw file): Curious about what inspired you to do all this if/else/error re-ordering in multiple files.


pyensembl/transcript.py, line 177 [r3] (raw file): Why is this a TODO, now that you have the decorator?


iskandr commented 9 years ago

Comments from the review on Reviewable.io


pyensembl/common.py, line 19 [r1] (raw file): Yeah, I tried to find one and they all do too much (e.g. use __repr__ for cache keys, have timeouts and cache limits, serialize to disk, &c). I'd use lru_cache if it ever got back-ported from Python 3.x!


pyensembl/ensembl_release.py, line 74 [r3] (raw file): So, pass species, server and sequence_type to SequenceData? I was considering that but the class seems simpler/less magical if it's just given a FASTA URL and expected to download/index that particular FASTA (all the Ensembl specific stuff stays in EnsemblRelease)


pyensembl/ensembl_release.py, line 208 [r3] (raw file): Made require_human_transcript_id and require_human_protein_id in common


pyensembl/gene.py, line 80 [r3] (raw file): Weird, I though it did! Just added one.


pyensembl/sequence_data.py, line 1 [r3] (raw file): Well, it's a rename and a change of functionality. It wouldn't really make sense for ReferenceTranscripts to also be used for protein sequences. Not sure if there's a functional intermediate state we'd want to be in after a single smaller PR


pyensembl/sequence_data.py, line 94 [r3] (raw file): I guess you might want to compare whether two SequenceData objects wrap the same sequences even if you haven't downloaded them yet. Changing this.


pyensembl/sequence_data.py, line 233 [r3] (raw file): That's a bug!


pyensembl/transcript.py, line 78 [r3] (raw file): I'd rather see the default (not-error) case first, makes the logic easier for me to follow


pyensembl/transcript.py, line 177 [r3] (raw file): I added the decorator and forgot to come back, fixed


tavinathanson commented 9 years ago

Comments from the review on Reviewable.io

Reviewed files:


Merge away :+1:


pyensembl/ensembl_release.py, line 75 [r3] (raw file): Well, my understanding is that it already has that information via getting passed ensembl_release, though that feels like abuse. In any event point about less magic in SequenceData seems valid! :+1:


pyensembl/sequence_data.py, line 1 [r3] (raw file): Yup, that's why I said 1 commit (within a larger PR) vs. 1 PR. (That would let me diff that commit before it gets renamed.)


tavinathanson commented 9 years ago

Comments from the review on Reviewable.io


(Wait, I retract that. 1 second.)


tavinathanson commented 9 years ago

Comments from the review on Reviewable.io


Okay, merge away for real :)


iskandr commented 9 years ago

Comments from the review on Reviewable.io


pyensembl/sequence_data.py, line 1 [r3] (raw file): Oh! Sorry, I misunderstood. Gotcha.


coveralls commented 7 years ago

Coverage Status

Changes Unknown when pulling 4073c3b6d586551c247b253fe8009e7c79abc7e5 on protein-sequences into \ on master**.