Closed iskandr closed 9 years ago
Comments from the review on Reviewable.io
Reviewed files:
pyensembl/common.py, line 19 [r1] (raw file): Weird that there isn't an obvious library for this!
pyensembl/database.py, line 401 [r3] (raw file):
Comment saying what required
means?
pyensembl/ensembl_release.py, line 74 [r3] (raw file):
Minor, but to reduce complexity and increase readability, what about SequenceData
doing the fasta_url
import and call itself?
pyensembl/ensembl_release.py, line 208 [r3] (raw file):
How about making an is_valid_transcript_id
/is_valid_protein_id
in some common area? Seems generally useful.
pyensembl/gene.py, line 80 [r3] (raw file):
ensembl_release
doesn't have __eq__
?
pyensembl/sequence_data.py, line 1 [r3] (raw file): I still think it'd be easier for reviews if renames are done separately. So, for example, one commit would use reference_transcripts (but use it as if it's sequence_data), and then another commit would literally just do the rename.
pyensembl/sequence_data.py, line 51 [r3] (raw file): That's just awful.
I think this would be more readable if extracted out into self.local_filename = add_release_to_filename_if_not_there
or something
pyensembl/sequence_data.py, line 94 [r3] (raw file):
So this is better than just self.local_filename == other.local_filename
? Curious what your thoughts are there. I guess it depends on whether we view equality as a pre-lazy loading or post-lazy loading property?
pyensembl/sequence_data.py, line 153 [r3] (raw file):
True
and False
are no longer necessary (and the associated comment).
pyensembl/sequence_data.py, line 224 [r3] (raw file):
I think this can be extracted out into the same place as is_valid_protein_id
, etc.
pyensembl/sequence_data.py, line 233 [r3] (raw file):
Maybe a bit confusing that clear_cache
doesn't do anything with this in memory cache, even though they're entirely different?
pyensembl/transcript.py, line 78 [r3] (raw file):
Curious about what inspired you to do all this if/else/error
re-ordering in multiple files.
pyensembl/transcript.py, line 177 [r3] (raw file): Why is this a TODO, now that you have the decorator?
Comments from the review on Reviewable.io
pyensembl/common.py, line 19 [r1] (raw file):
Yeah, I tried to find one and they all do too much (e.g. use __repr__
for cache keys, have timeouts and cache limits, serialize to disk, &c). I'd use lru_cache
if it ever got back-ported from Python 3.x!
pyensembl/ensembl_release.py, line 74 [r3] (raw file):
So, pass species
, server
and sequence_type
to SequenceData
? I was considering that but the class seems simpler/less magical if it's just given a FASTA URL and expected to download/index that particular FASTA (all the Ensembl specific stuff stays in EnsemblRelease
)
pyensembl/ensembl_release.py, line 208 [r3] (raw file):
Made require_human_transcript_id
and require_human_protein_id
in common
pyensembl/gene.py, line 80 [r3] (raw file): Weird, I though it did! Just added one.
pyensembl/sequence_data.py, line 1 [r3] (raw file):
Well, it's a rename and a change of functionality. It wouldn't really make sense for ReferenceTranscripts
to also be used for protein sequences. Not sure if there's a functional intermediate state we'd want to be in after a single smaller PR
pyensembl/sequence_data.py, line 94 [r3] (raw file): I guess you might want to compare whether two SequenceData objects wrap the same sequences even if you haven't downloaded them yet. Changing this.
pyensembl/sequence_data.py, line 233 [r3] (raw file): That's a bug!
pyensembl/transcript.py, line 78 [r3] (raw file): I'd rather see the default (not-error) case first, makes the logic easier for me to follow
pyensembl/transcript.py, line 177 [r3] (raw file): I added the decorator and forgot to come back, fixed
Comments from the review on Reviewable.io
Reviewed files:
Merge away :+1:
pyensembl/ensembl_release.py, line 75 [r3] (raw file):
Well, my understanding is that it already has that information via getting passed ensembl_release
, though that feels like abuse. In any event point about less magic in SequenceData seems valid! :+1:
pyensembl/sequence_data.py, line 1 [r3] (raw file): Yup, that's why I said 1 commit (within a larger PR) vs. 1 PR. (That would let me diff that commit before it gets renamed.)
Comments from the review on Reviewable.io
pyensembl/sequence_data.py, line 1 [r3] (raw file): Oh! Sorry, I misunderstood. Gotcha.
ReferenceTranscripts
with a more generalSequenceData
class that gives us access to both transcript (cDNA) & protein (amino acid) sequencesGene
&Transcript
objects by giving themEnsemblRelease
objects directly. This creates an implicit circular dependency (sinceGene
andTranscript
are created byEnsemblRelease
) but greatly simplifies access to both the annotation DB and sequence DB from within genes & transcripts.url_templates.fasta_url
.protein_id
andprotein_sequence
.test_ensembl_releases()