mdshw5 / pyfaidx

Efficient pythonic random access to fasta subsequences
https://pypi.python.org/pypi/pyfaidx
Other
459 stars 75 forks source link

Fasta more dict-like behaviour #156

Closed Maarten-vd-Sande closed 4 years ago

Maarten-vd-Sande commented 4 years ago

It is a very minor thing, but now pyfaidx.Fasta behaves slightly more dictionary-like with .items and .values.

Maarten-vd-Sande commented 4 years ago

Should I worry about the CI/appveyor build failures?

mdshw5 commented 4 years ago

@Maarten-vd-Sande Thanks for the contribution. You shouldn't worry about the Appveyor errors - that integration hasn't been maintained, and I'll likely stop testing on Windows in the future (unless someone wants to figure out the Windows-specific quirks that are preventing the tests from running). One question I have is about the use case for this change. I have no doubt that making the Fasta class act more like a dict is a good idea, but I do not expect that most users will want Fasta.values() to return the IndexRecord instances. If I'm misunderstanding please let me know.

It seems like a more logical return value would be a list of FastaRecord instances, to match the behavior of Fasta.__contains__, Fasta.__iter__ and Fasta.__getitem__.

What I'm proposing then for Fasta.items() would be something like:

>>> from pyfaidx import Fasta
>>> x = Fasta('/Users/matt/Downloads/gencode_vM12.fa')
>>> tuple(zip(x.keys(), x))[:10]
(('dfam_5S', FastaRecord("dfam_5S")), ('dfam_7SK', FastaRecord("dfam_7SK")), ('dfam_7SLRNA', FastaRecord("dfam_7SLRNA")), ('dfam_ACRO1', FastaRecord("dfam_ACRO1")), ('dfam_ALR', FastaRecord("dfam_ALR")), ('dfam_ALRa', FastaRecord("dfam_ALRa")), ('dfam_ALRb', FastaRecord("dfam_ALRb")), ('dfam_AluJb', FastaRecord("dfam_AluJb")), ('dfam_AluJo', FastaRecord("dfam_AluJo")), ('dfam_AluJr', FastaRecord("dfam_AluJr")))
Maarten-vd-Sande commented 4 years ago

Thanks for your reply. You are absolutely right! I will change it.

Would you object if I change this to methods in the Fasta class instead of references to the odict instances? I think that's easier to implement, more pythonic, and it took me actually a minute to find where and how the keys() was implemented.

Something like:

class Fasta:
    ...
    def keys(self):
         return self.faidx.index.keys
    def values(self):
         return [fastarecord for fastarecord in self]
    def items(self):
        return tuple(zip(self.keys(), self.values()))
mdshw5 commented 4 years ago

@Maarten-vd-Sande Sounds good. I was thinking along the same lines, and fully support this change.

Maarten-vd-Sande commented 4 years ago

@mdshw5 I haven't really looked into the code thoroughly, but wouldn't it make more sense to just refer to Fasta.records, instead of this:

    def keys(self)
        return self.faidx.index.keys()

    def values(self):
        return [fastarecord for fastarecord in self]

    def items(self):
        return zip(self.keys(), self.values())

to this:

    def keys(self):
        return self.records.keys()

    def values(self):
        return self.records.values()

    def items(self):
        return self.records.items()

This should effectively do the same right? In my opinion the second option is much more clear.

mdshw5 commented 4 years ago

This makes sense, but I will refactor the section where Fasta.records is initialized so we can repurpose Fasta.keys:

https://github.com/mdshw5/pyfaidx/blob/f1668d2d0fde95892519eaa78a33bc1891e310cc/pyfaidx/__init__.py#L1000-L1006

Maarten-vd-Sande commented 4 years ago

Okay, nice! :tada:

Thanks for your time

Maarten-vd-Sande commented 4 years ago

@mdshw5 dont merge yet! The tests are failing for older python versions.

Maarten-vd-Sande commented 4 years ago

Perhaps related to dictionary ordering?

https://www.python.org/dev/peps/pep-0520/

mdshw5 commented 4 years ago

:) Yes, I was just looking into this.

Maarten-vd-Sande commented 4 years ago

I don't have time for it today, but I don't mind taking a look at it somewhere this or next week.

mdshw5 commented 4 years ago

I've tagged v0.5.7 for release.