Closed JoaoRodrigues closed 3 years ago
Forgot to run tests... a lot of failures :)
EDIT: Now they all pass.
Merging #86 into master will decrease coverage by 0.02%. The diff coverage is 75.00%.
```diff
@@            Coverage Diff             @@
##           master      #86      +/-   ##
==========================================
- Coverage   81.95%   81.92%   -0.03%
==========================================
  Files          46       46
  Lines        3657     3663       +6
  Branches      763      763
==========================================
+ Hits         2997     3001       +4
- Misses        468      470       +2
  Partials      192      192
```
| Impacted Files | Coverage Δ | |
|---|---|---|
| pdbtools/pdb_selres.py | 75.36% <75.00%> (-0.40%) | :arrow_down: |
Continue to review full report at Codecov.
Powered by Codecov. Last update 7711fe1...046e281. Read the comment docs.
On another note: I don't know if concatenating lists of single strings with itertools.chain has any advantage over building one list with all the lines and passing that list around instead of the iterator. Why do you need an iterator that simulates a list? `fh` makes sense initially because you avoid `.readlines()`, but since you create a list when passing each line to `iter_chain` in `buffer`, right now I can't see the benefit over a plain list. Again, you are creating one list per line instead of one single list with all the lines.
The idea is that you never build a list with all the lines, thus keeping memory use constant and quite low (you only keep one line in memory at any time). Otherwise why would we bother making an iterator at all?
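For illustration (not taken from the PR diff itself), here is a minimal runnable sketch of the pattern being discussed: growing an iterator by chaining one line at a time. `fake_fh` is a hypothetical stand-in for the real file handle:

```python
from itertools import chain

# Simulate a file handle: lines are produced lazily, one at a time.
def fake_fh():
    for i in range(3):
        yield f"ATOM line {i}\n"

buffer = iter([])  # start from an empty iterator
for line in fake_fh():
    # ... per-line processing would happen here ...
    buffer = chain(buffer, [line])  # extend the iterator with one line

# `buffer` yields the lines in their original order only when consumed.
result = list(buffer)
print(result)
```

Note that nothing is materialized until `buffer` is actually consumed, which is the laziness argument made above.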
I understand your purpose, but I cannot see that the code actually does what you say; I may be wrong.
`buffer` is returned on line 207 inside a tuple. The `for` loop that creates `buffer` is not a generator; it is exhausted normally. I may really be missing it, but I truly believe the current code stores a separate list in memory for each line, and the `buffer` iterator then points to those locations in memory. This is tricky Python already. If you are sure about how the code performs, go ahead. I can't see it right now; I need to set myself a personal homework task :smile:
Fixed that text/bytes string issue. Thanks for pointing it out, I had some leftover stuff!
As for the generators: on line 177, `buffer` is defined as an empty iterator with `buffer = iter([])`. We then iterate over `fh`, exhausting it, but we do `buffer = iter_chain(buffer, [line])` at each step. `iter_chain` is a shortcut for `itertools.chain()`, which basically concatenates an existing iterator (`buffer`) with any iterable (in our case, `[line]`). As such, we grow `buffer` as an iterator. You could do the same thing like this, but it's uglier in my opinion:
```python
def exhaust_and_regenerate(fh, resid_list):
    for line in fh:
        ...  # do stuff
        resid_list.append(...)  # this is shared with the caller
        yield line

resid_list = []
fh = exhaust_and_regenerate(fh, resid_list)
```
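A runnable variant of that sketch, with the elided pieces filled in by hypothetical stand-ins (`io.StringIO` as the file handle, and recording line lengths as the per-line work):

```python
import io

def exhaust_and_regenerate(fh, resid_list):
    """Yield lines from fh while recording something per line."""
    for line in fh:
        resid_list.append(len(line))  # hypothetical "do stuff" step
        yield line

fh = io.StringIO("ATOM 1\nATOM 2\n")
resid_list = []
fh = exhaust_and_regenerate(fh, resid_list)

# The generator is lazy: resid_list stays empty until fh is consumed.
lines = list(fh)
print(lines)
print(resid_list)
```

The design point is the same as with `iter_chain`: the original handle is read exactly once, and the caller gets back a fresh iterable over the same lines.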
I see. Thanks for the example; I will run it separately and investigate it. It is a good learning exercise. I've been away from coding for more than a month and need to get back into shape, hehe.
Your last commits change the initial PR quite a lot, which is good. Everything seems to be working; I tested the problematic command.
Are you okay with the CI complaints? I don't want to merge without everything green, though the complaints are very minor :stuck_out_tongue: It might even be one of those codecov diff conflicts.
:+1:
Sorry for the long wait before merging. As discussed with @JoaoRodrigues by :phone:, merging... :tada: everyone :smile:
Python's `sys.stdin` is apparently not seekable, so we cannot rewind the data after reading it once. Running the script now on something like `pdb_fetch 1ctf | pdb_selres -80:` will not return anything, because the file is exhausted and the seek doesn't throw an error (at least on Windows). This PR fixes that by populating an iterator as we read the file the first time. Same low-memory behavior, slightly slower.
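To make the failure mode concrete, a small sketch contrasting a seekable stream with a pipe (the pipe behavior is as described above, not exercised here):

```python
import io

# io.StringIO behaves like a seekable file: it can be read twice.
buf = io.StringIO("line 1\nline 2\n")
first_pass = buf.readlines()
buf.seek(0)                     # rewinding works on a seekable stream
second_pass = buf.readlines()
print(buf.seekable(), first_pass == second_pass)

# A pipe is different: sys.stdin fed from another process reports
# seekable() == False, a second read returns nothing, and on some
# platforms seek() fails silently instead of raising - hence the
# iterator-based buffering introduced in this PR.
```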