mdshw5 / pyfaidx

Efficient pythonic random access to fasta subsequences
https://pypi.python.org/pypi/pyfaidx

Insert indels using mutable option #214

Closed anderdnavarro closed 7 months ago

anderdnavarro commented 11 months ago

Hi!

I was wondering whether it would be possible to support indels (insertions and deletions) with the mutable option when reading a FASTA file. I suppose the difficulty in implementing this is that once the length of a sequence changes, the index no longer works. Is that correct?

Could reindexing the new FASTA and reloading it automatically whenever the mutation is an indel be a solution? It's not the most elegant or the fastest approach, but I can't think of another one at the moment.

Thank you very much!! Ander

mdshw5 commented 11 months ago

Hey @anderdnavarro. Yes, my initial reason for excluding insertions and deletions is that they would invalidate the sequence index, as well as the line lengths if the sequences are line-wrapped. Additionally, because the sequences are manipulated on disk, all sequence data following a length-changing operation would have to be read into memory and rewritten at a different byte offset. I think these operations would be implemented much more efficiently in memory, in which case you do not need the benefit of an indexed on-disk format.
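To make the offset problem concrete, here is a minimal sketch in plain Python (it does not use pyfaidx). It assumes a faidx-style index that stores, per sequence, the length, the byte offset of the first base, the number of bases per line, and the number of bytes per line. The toy file, the `INDEX` dict, and the helper names `byte_offset` and `fetch_base` are hypothetical and exist only for illustration.

```python
def byte_offset(offset, line_bases, line_width, pos):
    """Byte position of 0-based base `pos`, given faidx-style index fields."""
    return offset + (pos // line_bases) * line_width + pos % line_bases


def fetch_base(data, index, name, pos):
    """Look up a single base through the index instead of scanning the file."""
    _length, offset, line_bases, line_width = index[name]
    return chr(data[byte_offset(offset, line_bases, line_width, pos)])


# Toy FASTA wrapped at 4 bases per line (5 bytes per line including "\n").
FASTA = b">chr1\nACGT\nACGT\n>chr2\nTTTT\nGG\n"
# name -> (length, offset of first base, bases per line, bytes per line)
INDEX = {"chr1": (8, 6, 4, 5), "chr2": (6, 22, 4, 5)}

# Same-length substitution: overwrite byte 8 (chr1 base 2). No offset moves.
substituted = FASTA[:8] + b"N" + FASTA[9:]

# Insertion: one extra byte at position 8 shifts everything after it, and the
# first line of chr1 now holds 5 bases while the index still says 4.
inserted = FASTA[:8] + b"N" + FASTA[8:]

print(fetch_base(substituted, INDEX, "chr2", 0))      # index still valid
print(repr(fetch_base(inserted, INDEX, "chr2", 0)))   # stale index, wrong byte
```

With the substitution, the lookup for `chr2` still lands on the right byte. After the insertion, the same index entry points one byte too early, so every record after the edit would need its offset recomputed and the edited record would need re-wrapping.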

anderdnavarro commented 11 months ago

Hi @mdshw5,

Thank you very much for your clarification! I also thought the other option was to do it in memory. I will work on a function to do that, and if it turns out to be efficient, I can share it with you if you'd like.
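One possible shape for such an in-memory function is sketched below. It is not the function discussed in this thread and it does not use pyfaidx. It assumes the whole FASTA fits in memory, uses 0-based coordinates, and re-wraps the output at a fixed width. The names `read_fasta`, `apply_indel`, and `write_fasta` are hypothetical.

```python
def read_fasta(text):
    """Parse FASTA text into a dict of name -> sequence with line breaks removed."""
    parts = {}
    name = None
    for line in text.splitlines():
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the name.
            name = (line[1:].split() or [""])[0]
            parts[name] = []
        elif name is not None:
            parts[name].append(line.strip())
    return {n: "".join(chunks) for n, chunks in parts.items()}


def apply_indel(seq, start, ref_len, alt):
    """Replace `ref_len` bases at 0-based `start` with `alt`.

    ref_len == 0 is a pure insertion; alt == "" is a pure deletion.
    """
    if start < 0 or ref_len < 0 or start + ref_len > len(seq):
        raise ValueError("edit falls outside the sequence")
    return seq[:start] + alt + seq[start + ref_len:]


def write_fasta(records, width=60):
    """Serialize records back to FASTA text, re-wrapping every sequence."""
    lines = []
    for name, seq in records.items():
        lines.append(">" + name)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


records = read_fasta(">chr1\nACGT\nACGT\n>chr2\nTTTT\nGG\n")
records["chr1"] = apply_indel(records["chr1"], 2, 0, "NN")  # insert "NN" before base 2
records["chr2"] = apply_indel(records["chr2"], 0, 2, "")    # delete the first 2 bases
print(write_fasta(records, width=4))
```

When several edits target the same sequence, applying them from the highest start position to the lowest keeps the earlier coordinates valid. A file written this way would need a fresh index before being used for indexed random access again.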