Open S-aiueo32 opened 2 years ago
Cool feature :)
I mostly agree with this proposal.
It may violate the design philosophy of gokart and luigi to call dump() only once at the end of run().
I think this is not a problem. With the current implementation, we can dump multiple times with named targets, as follows:
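class Task(GokartTask):
    def output(self):
        # Named targets are declared in output(); requires() is for upstream tasks.
        return dict(
            a=self.make_target("a.pkl"),
            b=self.make_target("b.pkl"),
        )

    def run(self):
        # dump() selects the target by its key in the output() dict.
        self.dump(1, "a")
        self.dump(2, "b")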
Another discussion is whether lmdb should be implemented in SingleFileTarget or create another LmdbTarget :)
Thank you for your reply.
For now, I have implemented lmdb support inside SingleFileTarget, and the tests pass locally.
https://github.com/S-aiueo32/gokart/tree/4d65e5f359a44d4113945dd82fc2c3171fa973da
In LocalTargetTest, there was an error around lmdb locking, so I made sure not to lock when opening the lmdb.Environment (probably not a problem, since locking is handled in other parts of gokart).
Another discussion is whether lmdb should be implemented in SingleFileTarget or create another LmdbTarget :)
I see. It is indeed awkward to have a conditional branch depending on the processor type in SingleFileTarget. It would be better to create a new LmdbTarget that inherits from SingleFileTarget, and rewrite load and dump.
If we create a new LmdbTarget, do you have an opinion on whether it is preferable to branch in make_target or create a new function like make_lmdb_target?
@S-aiueo32 Thank you for proposing this great feature!
It would be better to create a new LmdbTarget that inherits from SingleFileTarget
I think LmdbTarget should inherit from TargetOnKart, just like SingleFileTarget and ModelTarget, to avoid overcomplicating the inheritance hierarchy.
do you have an opinion on whether it is preferable to branch in make_target or create a new function like make_lmdb_target
I think creating a new function make_lmdb_target is better, just like the existing make_model_target.
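To make the suggestion above concrete, here is a minimal sketch of what a separate LmdbTarget with its own factory might look like. It is not gokart's code: the TargetOnKart class below is a simplified stand-in (the real base class has a different, larger interface), and LmdbTarget, its map_size parameter, and the free function make_lmdb_target are hypothetical names for illustration. The lmdb calls used (lmdb.open, Environment.begin, Transaction.put, Transaction.cursor) come from the py-lmdb package, and lock=False mirrors the no-lock workaround mentioned earlier in this thread.

```python
from abc import ABC, abstractmethod
from typing import Dict


class TargetOnKart(ABC):
    """Simplified stand-in for gokart's target base class (not the real interface)."""

    @abstractmethod
    def load(self) -> Dict[bytes, bytes]:
        ...

    @abstractmethod
    def dump(self, obj: Dict[bytes, bytes]) -> None:
        ...


class LmdbTarget(TargetOnKart):
    """Hypothetical target that stores a dict of bytes in an LMDB directory."""

    def __init__(self, path: str, map_size: int = 2 ** 30) -> None:
        self.path = path
        self.map_size = map_size  # maximum database size in bytes (assumed default)

    def _open(self, readonly: bool):
        # Imported lazily so that lmdb stays an optional dependency.
        import lmdb

        return lmdb.open(self.path, map_size=self.map_size, readonly=readonly, lock=False)

    def load(self) -> Dict[bytes, bytes]:
        env = self._open(readonly=True)
        try:
            with env.begin() as txn:
                # Iterating a cursor yields (key, value) pairs.
                return {bytes(key): bytes(value) for key, value in txn.cursor()}
        finally:
            env.close()

    def dump(self, obj: Dict[bytes, bytes]) -> None:
        env = self._open(readonly=False)
        try:
            # The write transaction commits when the with-block exits cleanly.
            with env.begin(write=True) as txn:
                for key, value in obj.items():
                    txn.put(key, value)
        finally:
            env.close()


def make_lmdb_target(path: str) -> LmdbTarget:
    """Hypothetical factory, analogous in spirit to make_model_target."""
    return LmdbTarget(path)
```

Deriving directly from the base class keeps load and dump free of processor-type branches, which is the concern raised about putting this logic into SingleFileTarget.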
@S-aiueo32 Hi, this is just a friendly reminder (not urgent). What's the status of this issue?
I would like to add .lmdb to the file formats supported by TaskOnKart.make_target().
.lmdb is the format used by several popular datasets, and is actually suitable for handling large datasets (especially images).
For those who are not familiar with .lmdb, here is a brief explanation of its usage.
.lmdb is a Key-Value store whose values can be retrieved via the lmdb.Environment object:
The following changes are necessary to support .lmdb:
Add LmdbFileProcessor:
...
def make_file_processor(file_path: str, store_index_in_feather: bool) -> FileProcessor:
    extension2processor = {
        ...
        '.lmdb': LmdbFileProcessor(file_path)
    }
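The extension-based dispatch proposed above can be sketched in isolation as follows. This is an illustration under assumptions, not gokart's implementation: FileProcessor and PickleFileProcessor are simplified stand-ins, make_file_processor drops the store_index_in_feather argument for brevity, and the body of LmdbFileProcessor is a guess at one possible shape, since the issue elides it. The processor keeps file_path because LMDB opens a path rather than a file stream.

```python
import os
from typing import Dict


class FileProcessor:
    """Simplified stand-in for gokart's file processor base class."""

    def format(self) -> str:
        raise NotImplementedError


class PickleFileProcessor(FileProcessor):
    """Stand-in for an existing processor, included only to show the dispatch."""

    def format(self) -> str:
        return "pkl"


class LmdbFileProcessor(FileProcessor):
    """Hypothetical processor for .lmdb paths."""

    def __init__(self, file_path: str) -> None:
        self.file_path = file_path

    def format(self) -> str:
        return "lmdb"

    def load(self):
        # Lazy import: lmdb is only needed when an .lmdb target is actually read.
        import lmdb

        # Returns a read-only lmdb.Environment; lock=False follows the thread's workaround.
        return lmdb.open(self.file_path, readonly=True, lock=False)


def make_file_processor(file_path: str) -> FileProcessor:
    """Pick a processor from the file extension (simplified signature)."""
    extension2processor: Dict[str, FileProcessor] = {
        ".pkl": PickleFileProcessor(),
        ".lmdb": LmdbFileProcessor(file_path),
    }
    extension = os.path.splitext(file_path)[1]
    if extension not in extension2processor:
        raise ValueError(f"Unsupported extension: {extension!r}")
    return extension2processor[extension]
```

With py-lmdb installed, a caller could then read a single value from the returned environment by opening a transaction with env.begin() and calling txn.get(key) on it.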
Write tasks!
class LoadImages(GokartTask):
    def requires(self):
        return DownloadImages()