instructlab / instructlab-bot

GitHub bot to assist with the taxonomy contribution workflow
Apache License 2.0

implementing storage cleaning system with options #311

Open Gregory-Pereira opened 5 months ago

Gregory-Pereira commented 5 months ago

Addresses: #305 /cc @vishnoianil @nerdalert Please take a look.

I am still struggling with the event-based strategy. I built a channel that watches for disk pressure, but I don't know how to pass along the working directory of the job that pushed usage past the limit and triggered the channel event.
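One possible shape for this, purely as a sketch: have the event itself carry the working directory, so the watcher never has to look it up. Everything named here (DiskPressureEvent, reportUsage, watchDiskPressure, the byte-count parameters) is hypothetical and not taken from the PR.

```go
package main

// DiskPressureEvent is a hypothetical event payload. It carries the
// working directory of the job that crossed the limit, so the receiver
// knows which path to clean.
type DiskPressureEvent struct {
	WorkDir    string
	UsedBytes  uint64
	LimitBytes uint64
}

// reportUsage is a hypothetical hook a job would call after writing its
// output. It sends an event only when usage exceeds the limit and
// reports whether an event was sent.
func reportUsage(events chan<- DiskPressureEvent, workDir string, used, limit uint64) bool {
	if used <= limit {
		return false
	}
	events <- DiskPressureEvent{WorkDir: workDir, UsedBytes: used, LimitBytes: limit}
	return true
}

// watchDiskPressure drains the channel until it is closed and hands each
// event's working directory to the cleanup callback. It returns any
// errors the callback produced.
func watchDiskPressure(events <-chan DiskPressureEvent, cleanup func(workDir string) error) []error {
	var errs []error
	for ev := range events {
		if err := cleanup(ev.WorkDir); err != nil {
			errs = append(errs, err)
		}
	}
	return errs
}
```

In a real worker, watchDiskPressure would presumably run in its own goroutine, with something like os.RemoveAll as the cleanup callback.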

The .github/workflows/test-at.yaml workflow file will be dropped once I can verify that the at binary ships with Ubuntu 22.04.

I am also looking for thoughts on the enum I tried to set up with CleanupStrategyType; I didn't see or know of any comparable enum examples in our codebase.
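For comparison, the common Go idiom for an enum is a named integer type with iota constants plus a String method. The sketch below assumes that pattern; only the CleanupStrategyType name comes from this PR, while the member names, string values, and ParseCleanupStrategy are hypothetical.

```go
package main

import "fmt"

// CleanupStrategyType enumerates cleanup strategies. The members below
// are illustrative and may not match the ones in the PR.
type CleanupStrategyType int

const (
	StrategyLazy CleanupStrategyType = iota
	StrategyEventBased
	StrategyTimeBased
)

// strategyNames maps each member to its config/CLI spelling.
var strategyNames = map[CleanupStrategyType]string{
	StrategyLazy:       "lazy",
	StrategyEventBased: "event-based",
	StrategyTimeBased:  "time-based",
}

// String implements fmt.Stringer so values print readably in logs.
func (s CleanupStrategyType) String() string {
	if name, ok := strategyNames[s]; ok {
		return name
	}
	return fmt.Sprintf("CleanupStrategyType(%d)", int(s))
}

// ParseCleanupStrategy converts a config/CLI string into a strategy and
// returns an error for unknown names.
func ParseCleanupStrategy(name string) (CleanupStrategyType, error) {
	for s, n := range strategyNames {
		if n == name {
			return s, nil
		}
	}
	return 0, fmt.Errorf("unknown cleanup strategy %q", name)
}
```

Parsing at the edge (flag or config) and passing the typed value around keeps invalid strategy names from reaching the cleanup code.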

Gregory-Pereira commented 5 months ago

Confirmed: at does not ship with Ubuntu 22.04 out of the box, so I will need to find another way. I am considering cron, but I don't want a separate job per filepath firing every 3 days or so; that could result in hundreds of cron jobs. Another potential solution is a separate, time-based queue for cleanup jobs ...

vishnoianil commented 5 months ago

@Gregory-Pereira

I think at this point we can just implement a simple Lazy policy and see how well it works. We can implement the rest of the policies in the future if needed.

Gregory-Pereira commented 5 months ago

@vishnoianil I am not sure whether this cleanup will operate on a static directory. I was under the assumption that both of these cleaning operations would run after pre-check / generate, in which case wouldn't the path point to the data specific to that call? It seems odd to use a lazy strategy for data belonging to a specific run. Please let me know if I am misunderstanding something (I almost certainly am).