Closed jrycw closed 1 month ago
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 85.98%. Comparing base (
e3649d2
) to head (b2a98e0
).
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
snakeviz
From a performance perspective, using an iterator
might not offer significant gains. However, there are notable benefits. We avoid the need to convert data structures into lists
just for indexing with []
. By accepting any iterable
, we maintain flexibility, a concept I learned from the itertools
module. This approach can also save memory since we don't need to materialize everything into a list, even though memory consumption might not be a major concern for processing small data in our use case. By the way, thanks for introducing me to snakeviz
, another new and useful package.
These are all fair points, especially that memory usage tends to be better with iterators. I definitely think there will be cases where iterators lead to big improvements. The one thing I feel cautious about when considering iterators, is that developers tend to use lists, even in situations when they could have used iterators.
I think there's likely something about lists that people find a bit easier to understand, that tends to push them in that direction (whether it's the ease of seeing the data when debugging vs a stream, or something else). It seems to be on the desire-path for developers.
In any event, at this point I think @rich-iannone and I trust your judgement on which pieces could be improved by iterators. So definitely don't take this as discouraging changing anything where it seems like they'd be useful :)
@machow, thanks for sharing your thoughts. I completely agree with your points. I would aim for a function that:
Under these conditions, it might be a good candidate to take an iterator as input and return an iterator as the output.
Our project is progressing steadily, and it's been an enjoyable journey as one of the contributors. However, I've noticed that we might be overly reliant on the
list
type in the codebase. Whilelist
is excellent and suitable for prototyping, it can become problematic as the codebase grows. Therefore, I propose replacinglist
withGenerator
where possible.I have chosen
seq_groups()
as a starting point since it is currently functioning as a kind ofGenerator
. Currently, we request the input as alist
, but I believe we can improve this by refactoring it to accept anIterable
as input.I hope this PR makes sense, and I welcome input from the team on this concept.