Hafplo opened this issue 4 years ago (Open)
Hi and thanks for this issue. However, I'm not sure I understand the problem exactly. What's wrong with your current "workaround"? Note that you are free to save anything you like inside your custom dataset. For example, we already do this for the GEDDataset, see here.
It works fine. But I believe calling `super().__init__()` after defining some properties is considered bad practice.
It also comes down to the `process` method being called implicitly by `__init__`. The `__init__` method is the same for all custom datasets, but the `process` method is different for each one (and can be very complicated and intensive). Maybe separating them would make this clearer?
Mh, I personally think that the implicit downloading and processing is very convenient. If you want to avoid the "bad practice", you can also set your attributes in the `process` method and save them to disk.
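The suggestion above can be sketched as follows. Note this uses a minimal stand-in `Dataset` base class to stay self-contained; it is not the real `torch_geometric.data.Dataset`, only a mock of its behavior of calling `process()` from `__init__`, and the attribute names (`meta`, `meta.json`) are illustrative:

```python
import json
import os
import tempfile


class Dataset:
    """Minimal stand-in for torch_geometric.data.Dataset (assumption:
    the real class also triggers process() implicitly from __init__)."""

    def __init__(self, root):
        self.root = root
        self.process()

    def process(self):
        raise NotImplementedError


class MyDataset(Dataset):
    def process(self):
        # Set custom attributes here, inside process(), instead of
        # before super().__init__(), and persist them next to the
        # processed data so they survive reloads.
        self.meta = {"source": "raw_graphs_v1", "created_by": "process()"}
        with open(os.path.join(self.root, "meta.json"), "w") as f:
            json.dump(self.meta, f)


root = tempfile.mkdtemp()
ds = MyDataset(root)
print(ds.meta["source"])  # attributes are available after construction
```

With this pattern the attributes travel with the processed files on disk, so a later run can restore them without re-processing.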
🚀 Feature
I would like to have the ability to add custom attributes to a custom dataset. My use case: paths and tables pointing to the original files and ground truth (GT). Other use cases: flags, metadata, sources.
Motivation
After processing the Dataset, we are left with Data / Batch objects that carry minimal context (numeric tensors) and no knowledge of how they were created, when, by which process, or from what source. It would be beneficial to add this information to the Dataset object in order to save all parameters and inputs used in the `process` method (similar to saving hyper-parameters for each model-training experiment). Since downloading and processing the Dataset takes time, and since the methods and sources may change (especially when working on non-benchmark datasets or when experimenting with new types of graphs), it is essential to keep track of our input dataset during research and experimentation.
Additional context
My current workaround:
Note: I want to use these properties while the `process` method runs, so they are defined before `super().__init__()` is called.
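The workaround described (the original snippet was not preserved in this thread) can be sketched like this. The `Dataset` base class here is a minimal stand-in that mimics the implicit `process()` call, not the real `torch_geometric.data.Dataset`, and the attribute names (`raw_paths_table`, `gt_table`) are hypothetical:

```python
import tempfile


class Dataset:
    """Minimal stand-in for torch_geometric.data.Dataset (assumption:
    the real __init__ also calls process() implicitly)."""

    def __init__(self, root):
        self.root = root
        self.process()

    def process(self):
        raise NotImplementedError


class MyDataset(Dataset):
    def __init__(self, root, raw_paths_table, gt_table):
        # Workaround: assign the custom attributes *before* calling
        # super().__init__(), because __init__ immediately invokes
        # process(), which needs them.
        self.raw_paths_table = raw_paths_table
        self.gt_table = gt_table
        super().__init__(root)

    def process(self):
        # process() can now read the attributes assigned above.
        self.num_sources = len(self.raw_paths_table)


ds = MyDataset(tempfile.mkdtemp(), ["a.csv", "b.csv"], ["a_gt.csv"])
print(ds.num_sources)  # → 2
```

This is the ordering the thread calls "bad practice": the subclass mutates `self` before the base-class constructor runs, which works but inverts the usual initialization order.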