Open rom1504 opened 2 years ago
Given new information I gathered, the more important goal here is to make img2dataset as easy as possible to use in a swarm environment rather than a cluster: many different kinds of nodes connect, help out for a while, then disconnect. This already kind of works thanks to Spark's dynamic allocation feature, but it could be better tested, better documented, and easier to run. Ideally it would even be possible to do this in a trustless fashion, but that would probably require a lot more engineering than the trustful-but-unreliable case.
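To make the dynamic allocation idea concrete, here is a minimal sketch of the Spark settings that let executors join and leave during a job. The configuration keys are standard Spark properties, but the helper names (`swarm_spark_conf`, `build_session`), the app name, and the numeric values are assumptions chosen for illustration, not something img2dataset ships or recommends.

```python
def swarm_spark_conf(max_executors=100):
    """Return Spark properties for a pool of executors that come and go.

    All values are illustrative assumptions and should be tuned per setup.
    """
    return {
        # Let Spark grow and shrink the executor pool while the job runs.
        "spark.dynamicAllocation.enabled": "true",
        # Track shuffle data on executors so no external shuffle service is needed.
        "spark.dynamicAllocation.shuffleTracking.enabled": "true",
        # Allow the pool to drop to zero when no node is available.
        "spark.dynamicAllocation.minExecutors": "0",
        "spark.dynamicAllocation.maxExecutors": str(max_executors),
        # Tolerate more task failures than the default, since nodes may vanish.
        "spark.task.maxFailures": "10",
    }


def build_session(master_url, conf):
    """Create a SparkSession from the given properties (hypothetical helper)."""
    # Imported lazily so the config helper above works without pyspark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.master(master_url).appName("img2dataset-swarm")
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

With a session created this way, running img2dataset with `distributor="pyspark"` should reuse it, assuming the usual pyspark setup described in the img2dataset documentation. Whether these values hold up with very unreliable nodes is exactly what would need testing.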
Being able to handle unreliable resources would unlock combining many different resources rather than needing to allocate a lot of resources in a single place
for example
Follow-up to https://github.com/rom1504/img2dataset/issues/20