The idea is to remove torch datapipes from this repo.
We would replace them with plain Python functions.
For our ML models, we can then wrap these functions in torch datasets.
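As a rough illustration of the proposal, the sketch below uses plain Python functions for the data steps and wraps them in a map-style torch dataset. All names (`load_series`, `select_window`, `scale_window`, `WindowDataset`) and the toy data are hypothetical and not taken from this repo. The fallback import exists only so the sketch runs without torch installed.

```python
try:
    from torch.utils.data import Dataset
except ImportError:
    Dataset = object  # fallback so the sketch still runs without torch


def load_series():
    """Hypothetical loader; stands in for opening real data (e.g. with xarray)."""
    return [float(i) for i in range(10)]


def select_window(series, start, length):
    """Plain function: slice a fixed-length window out of the series."""
    return series[start:start + length]


def scale_window(window, scale):
    """Plain function: divide every value in the window by a constant."""
    return [v / scale for v in window]


class WindowDataset(Dataset):
    """Thin torch-dataset wrapper around the plain functions above."""

    def __init__(self, series, length, scale):
        self.series = series
        self.length = length
        self.scale = scale

    def __len__(self):
        # Number of complete windows that fit in the series
        return len(self.series) - self.length + 1

    def __getitem__(self, idx):
        window = select_window(self.series, idx, self.length)
        return scale_window(window, self.scale)


dataset = WindowDataset(load_series(), length=3, scale=2.0)
sample = dataset[2]  # window [2.0, 3.0, 4.0], each value divided by 2.0
```

Because the steps are ordinary functions, they can be unit-tested and debugged on their own, and the dataset can still be passed to a torch `DataLoader` for batching.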
Pros and Cons
Pros:
-- We use xarray, which is good for streaming large data
-- Datapipes are complex and hard to change
-- Datapipes are not well supported by the community
-- Forking is annoying
-- We have used some infinite loops, which are bad
-- Debugging and logging are hard
-- We can use torch datasets, which are widely used
Cons:
-- Less work not to do it
-- Not sure what the benefits are for the extra code
-- Torch data is good for streaming data
Possible Implementation
1. Start a fresh repo and copy over the functions we need
2. Refactor this repo:
-- pull out the functions from all datapipes
-- rebuild the data flow using the new functions, rebuilding these files
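One way the refactor could go, sketched under assumptions: the logic inside a datapipe's `__iter__` moves into standalone functions, and the data flow is rebuilt by calling those functions in sequence. `ThresholdFilterPipe` is a hypothetical stand-in written as a plain iterable class (a real datapipe would subclass torch's `IterDataPipe`), and none of these names come from this repo.

```python
class ThresholdFilterPipe:
    """Before: hypothetical datapipe-style class with logic buried in __iter__."""

    def __init__(self, source, threshold):
        self.source = source
        self.threshold = threshold

    def __iter__(self):
        for value in self.source:
            if value >= self.threshold:
                yield value * 2


def filter_by_threshold(values, threshold):
    """After: the filtering logic pulled out as a plain function."""
    return [v for v in values if v >= threshold]


def double(values):
    """After: the transform pulled out as a plain function."""
    return [v * 2 for v in values]


def build_samples(values, threshold):
    """After: the data flow rebuilt by composing the plain functions."""
    return double(filter_by_threshold(values, threshold))


raw = [1, 5, 3, 8]
old_output = list(ThresholdFilterPipe(raw, threshold=3))
new_output = build_samples(raw, threshold=3)
```

Comparing `old_output` and `new_output` on the same input gives a simple regression check while each datapipe is migrated.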
Pros of 1 (fresh repo):
-- Nice to start with a fresh repo
-- Easier to refactor, don't need to worry about breaking tests
-- Could get one entire pipeline working first, e.g. PVnet
Pros of 2 (refactor this repo):
-- No code duplication
-- Can continue developing
What I would like to keep
Other things to do