Open macfreek opened 12 years ago
My opinion would be to remove the `get_chunks` and rename `iter_chunks`. I think the usual usage is moving through the chunks, and not often needing caching of chunks for later access. The multiprocessing Map-Reduce is beyond what I know, so I can't really say about that.
A relatively recent addition to NBT is world.py with the WorldFolder class. The expected use is for tools that iterate through all Chunks, without caring about the specific Region file.
A common complaint I hear is that NBT is slow. One way to speed things up is to process each region file using a different subprocess and combine the results (this would be a Map-Reduce pattern). The best way to implement this is using a callback function.
E.g.:
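A minimal sketch of the Map-Reduce pattern described above, not NBT's actual API. The names `mapreduce`, `count_region` and `add` are hypothetical, and the map step is a placeholder that only looks at the file name so the sketch runs without any world data. A real callback would open the region file with NBT instead.

```python
from functools import reduce
from multiprocessing import Pool


def count_region(region_path):
    """Map step (hypothetical placeholder): return a partial result for one region file.

    A real callback would parse the region file here. This stand-in just
    returns the length of the file name.
    """
    return len(region_path)


def add(total, partial):
    """Reduce step: combine the running total with one partial result."""
    return total + partial


def mapreduce(region_paths, map_func, reduce_func, initial, processes=None):
    """Apply map_func to every region file, then fold the results together.

    With processes=None the map step runs serially in the current process.
    Otherwise a multiprocessing.Pool with that many workers is used, so
    map_func must be a picklable, module-level function.
    """
    if processes is None:
        partials = list(map(map_func, region_paths))
    else:
        with Pool(processes) as pool:
            partials = pool.map(map_func, region_paths)
    return reduce(reduce_func, partials, initial)
```

Calling `mapreduce(paths, count_region, add, 0, processes=4)` from under an `if __name__ == "__main__":` guard would spread the map step over four worker processes, while the reduce step always runs in the parent process.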
However, I fear that the term "mapreduce" is not well known to all programmers, and I'm looking for an easier name. Would the following be easier to understand?
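As a rough illustration of how a callback-based `process_chunks` could read, here is a sketch under stated assumptions. `WorldFolderSketch` is a hypothetical in-memory stand-in, not the real WorldFolder class, and the signature `process_chunks(callback)` is an assumption about what the method might look like.

```python
class WorldFolderSketch:
    """Hypothetical stand-in for WorldFolder that keeps its chunks in memory."""

    def __init__(self, regions):
        # Assumed layout: {region name: [chunk, chunk, ...]}
        self.regions = regions

    def iter_chunks(self):
        """Yield every chunk of every region, one at a time."""
        for chunks in self.regions.values():
            yield from chunks

    def process_chunks(self, callback):
        """Call callback(chunk) for every chunk and return the list of results.

        This sketch runs serially. Because the caller only supplies a
        callback, an implementation could hand each region to a worker
        process without changing the calling code.
        """
        return [callback(chunk) for chunk in self.iter_chunks()]
```

With this shape the caller writes something like `world.process_chunks(my_callback)` and never loops over region files directly.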
The advantage is that the parallelisation can happen behind the scenes (though the multiprocessing.Pool class already makes it very easy).
The disadvantage is that it adds a third method to the existing `get_chunks` and `iter_chunks` methods in the WorldFolder class. In addition, there would probably also need to be a `process_nbt` and `process_regions` next to `process_chunks`.
In retrospect, the difference between `get_chunks` (which returns a list) and `iter_chunks` (which returns an iterator) is so minor (iterators consume less memory, but lists can be cached) that it did not warrant the double function.
I'm inclined to remove the cached `get_chunks` (though I liked the name better than `iter_chunks`).
Any opinions?