shashi / FileTrees.jl

Parallel computing with a tree of files metaphor
http://shashi.biz/FileTrees.jl

`map` with lazy values #52

Open jkrumbiegel opened 3 years ago

jkrumbiegel commented 3 years ago

I am trying to create a pipeline where audio files are read in, a transformation creates multiple versions of each audio file, and these versions are then saved. So as not to keep all audio files in memory at once, I want to use the laziness feature. But it seems that `map` operates directly on the Thunks in the closure, and I don't know how to proceed from there. The docs say that `mapvalues` works with lazy values (they don't say that `map` doesn't), but I have to return a FileTree with my new files, and that doesn't work with `mapvalues`.

DrChainsaw commented 3 years ago

Sorry for the quick answer (I'm kind of jumping in here between tasks). If I understand you correctly, you want a single computation to produce multiple files, and you want this to happen lazily. If so, I have the same desire, and I have made some experiments with it in #25.

My experience is that the problem looks simple on the surface, but it makes the filetree orders of magnitude more difficult to reason about. At least for me, it is immensely difficult to keep in mind when and how the nodes will materialize, despite having spent some time with the package and its internals. The code in #25 works and does what it should, and I do make use of it in some of my own projects. It's just that whenever I return to that project and want to do something new, I am surprised that things don't work as I expected.

shashi commented 3 years ago

Since `map` gets the thunks, you can return a FileTree where each file contains a different thunk, each one a version of your audio file. The way to create a thunk from another thunk is `t2 = FileTrees.delayed(create_new_version)(t1)`, where `create_new_version` is a function that takes the value and returns the new value.
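
To make the idea of deriving one thunk from another concrete, here is a minimal sketch in plain Julia. It deliberately does not use FileTrees: `ToyThunk`, `toy_delayed`, `force`, `load_audio`, `louder`, and `reversed` are all hypothetical names invented for this illustration, and the real thunk type in FileTrees may behave differently (for example with respect to caching or scheduling). The point is only that several "versions" can be described from one lazy input without loading anything until a value is actually requested.

```julia
# Toy model of a lazy value. This is NOT the thunk type used by FileTrees;
# it only records a function and its arguments for later evaluation.
struct ToyThunk
    f::Function
    args::Tuple
end

# toy_delayed(f)(args...) records the call instead of running it,
# mirroring the shape of `delayed(create_new_version)(t1)` in the comment above.
toy_delayed(f) = (args...) -> ToyThunk(f, args)

# force evaluates a thunk, first forcing any arguments that are thunks themselves.
force(x) = x
force(t::ToyThunk) = t.f(map(force, t.args)...)

# Hypothetical stand-ins for audio loading and processing.
LOAD_COUNT = Ref(0)                 # counts how often "loading" really happens
function load_audio(path)
    LOAD_COUNT[] += 1
    return Float64[1, 2, 3, 4]      # fake samples instead of reading a file
end
louder(samples) = 2 .* samples
reversed(samples) = reverse(samples)

# t1 plays the role of the lazy value that `map` would hand to the closure.
t1 = toy_delayed(load_audio)("a.wav")

# Two new lazy values derived from t1, keyed by hypothetical output names.
versions = Dict(
    "a_loud.wav" => toy_delayed(louder)(t1),
    "a_rev.wav"  => toy_delayed(reversed)(t1),
)

loads_before = LOAD_COUNT[]         # still 0: nothing has been loaded yet

# Only forcing a version triggers loading and processing.
loud = force(versions["a_loud.wav"])
```

In this toy model each forced version re-runs its whole chain, so the input is loaded once per version. Whether a real lazy framework shares the loaded value between derived thunks is something to check against its own documentation.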

jkrumbiegel commented 3 years ago

Wouldn't it be good to have that as a `delayed(t) do file` syntax, then?