hacobe closed 6 months ago
Thank you very much for pointing out the incongruity, @hacobe - I have fixed it here.
But basically these are 2 different opinions by 2 different people. I moved Ross' suggestions to incoming so that I could integrate them properly later. I shouldn't have dumped them into the main text as is.
Bottom line is that I have yet to find a good streaming solution, and that's my experience. Ross seems to have had a working streaming solution, but we have been doing very different things, so possibly both views are valid.
Thanks!
(1) and (2) seem to express different opinions:
1) In the "3 Machine Learning IO needs" section, one of the bullet points under "Incoming suggestions from Ross Wightman to integrate" is "Note that once your datasets are optimally friendly for a large, distributed network filesystem, they can usually just be streamed from bucket storage in cloud systems that have that option. So better to move them off the network filesystem in that case."
2) The section "Local storage beats cloud storage" starts with "While cloud storage is cheaper the whole idea of fetching and processing your training data stream dynamically at training time is very problematic with a huge number of issues around it...It’s so much better to have enough disk space locally for data loading."
What am I missing?