m3dev / gokart

Gokart solves reproducibility, task dependencies, constraints of good code, and ease of use for Machine Learning Pipeline.
https://gokart.readthedocs.io/en/latest/
MIT License

Question: how to save folder based data? #238

Closed miyamonz closed 3 years ago

miyamonz commented 3 years ago

Models in huggingface/transformers have save_pretrained and from_pretrained methods. These methods save and load config.json and pytorch_model.bin in a specified directory path.

model.save_pretrained('./dirname') # ./dirname directory created.
SomeModel.from_pretrained('./dirname')

I want to dump and load such a directory as target. Is there a way to do this?

Hi-king commented 3 years ago

Since huggingface/transformers is designed to save only into a local directory ( https://github.com/huggingface/transformers/blob/39084ca663d2c8d49fd22f0eae00e98d5d44bac3/src/transformers/feature_extraction_utils.py#L285 ), it might be hard to use this directly with TaskOnKart.dump.

The only way I have come up with is to save it into a local directory and archive it.

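import os, shutil, tempfile

def run(self):
    with tempfile.TemporaryDirectory() as tmpdirname:
        model_dir = os.path.join(tmpdirname, 'model')
        model.save_pretrained(model_dir)
        # make_archive appends '.zip' itself and returns the path of the archive
        archive_path = shutil.make_archive(os.path.join(tmpdirname, 'archive'), 'zip', model_dir)
        with open(archive_path, 'rb') as f:  # a zip file must be read as bytes
            self.dump(f.read())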

Loading works the reverse way: unpack the archive into a temporary directory and call from_pretrained on it.
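To make both directions concrete, here is a minimal sketch of the archive round trip using only the standard library. The helper names dir_to_zip_bytes and zip_bytes_to_dir are hypothetical (they are not part of gokart or transformers), and the sketch assumes the task's output target can store a bytes object.

```python
import io
import zipfile
from pathlib import Path


def dir_to_zip_bytes(dir_path):
    """Pack every file under dir_path into an in-memory zip and return its bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, 'w', zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(Path(dir_path).rglob('*')):
            if path.is_file():
                # Store paths relative to dir_path so the archive can be
                # unpacked into any destination directory.
                zf.write(path, path.relative_to(dir_path).as_posix())
    return buffer.getvalue()


def zip_bytes_to_dir(data, dir_path):
    """Unpack zip bytes produced by dir_to_zip_bytes into dir_path."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        zf.extractall(dir_path)
```

In the saving task this would be used as self.dump(dir_to_zip_bytes(tmpdirname)) after model.save_pretrained(tmpdirname). In the downstream task, zip_bytes_to_dir(self.load(), tmpdirname) followed by SomeModel.from_pretrained(tmpdirname) would restore the model. Building the zip in memory avoids leaving a temp.zip file behind, but it does hold the whole archive in RAM, which may matter for large models.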

hirosassa commented 3 years ago

@miyamonz Thank you for writing in! Could we close this issue?