pangeo-forge / pangeo-forge-cloud-federation

Infrastructure for running pangeo-forge across multiple bakeries
Apache License 2.0

Save Jobs History on Flink #6

Closed: ranchodeluxe closed this 8 months ago

ranchodeluxe commented 11 months ago

Mount EFS to Job Managers so they can archive jobs for historical status lookups

Addresses: https://github.com/pangeo-forge/pangeo-forge-runner/issues/122

Related PR: https://github.com/pangeo-forge/pangeo-forge-runner/pull/131
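
A minimal sketch of what the mount could look like, assuming an EFS-backed `ReadWriteMany` PersistentVolumeClaim (here called `flink-job-history`) and the operator's `FlinkDeployment` CRD. The deployment name, claim name, and mount path are illustrative, not this repo's final configuration; `jobmanager.archive.fs.dir` is the standard Flink option for archiving finished jobs:

```yaml
# Hypothetical FlinkDeployment excerpt: the job manager archives finished jobs
# to a directory backed by a shared EFS PVC, so the history survives the pod
# and (per the discussion below) can be read back by other job managers.
apiVersion: flink.apache.org/v1beta1
kind: FlinkDeployment
metadata:
  name: example-recipe-run                  # illustrative name
spec:
  flinkConfiguration:
    jobmanager.archive.fs.dir: file:///opt/flink/history   # archive target on the shared mount
  podTemplate:
    spec:
      containers:
        - name: flink-main-container
          volumeMounts:
            - name: job-history
              mountPath: /opt/flink/history
      volumes:
        - name: job-history
          persistentVolumeClaim:
            claimName: flink-job-history    # assumed EFS-backed, ReadWriteMany PVC
```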

ranchodeluxe commented 11 months ago

> So, EFS is NFS. And NFS is one of those 'you have a problem, you think you will use NFS, and now you have two problems' situations. It plays poorly with a lot of data formats that use any kind of file locking (see https://www.sqlite.org/howtocorrupt.html#_filesystems_with_broken_or_missing_lock_implementations), and the file corruption only shows up at the worst possible times. So I think the primary, and perhaps the only, time to use NFS (and hence EFS) is when providing home directories.
>
> Given we already have the EBS provisioner set up and use it for prometheus, can we not use EBS here too? It does mean that only one pod can write to an EBS volume at a time, but relying on NFS for multiple-replica high availability eventually only leads to tears, pain, blood, stale file handle crashes and death.
>
> Left some inline comments about the kubernetes provider.

Thanks for giving me the deep deets on why EFS/NFS is bad. I was going to use EBS, but while playing with multiple job managers I realized something that made me switch back to EFS:

  1. There's no reason we need to start the HistoryServer as the docs recommend. The job manager REST API appears to serve the history API already (that's basically how the job manager UI works).

  2. More importantly, even if a job manager DID NOT RUN a job, it can still find the archived job in the EFS mount and return information about it. This matters because any of the existing job manager REST APIs can then report the full history, even after the job manager that actually ran a job has been killed (hence the need for multiple pods to share the EFS mount). In the future we will probably need some type of kind: Job || CronJob reaper that cleans up kind: FlinkDeployment resources on a regular basis, and if we do that we can't expect job-manager pods to stick around anyway.

Does any of that assuage your fears and persuade you one way or the other @yuvipanda?

ranchodeluxe commented 11 months ago

doh, the terraform kubernetes provider is so poor here: https://github.com/hashicorp/terraform-provider-kubernetes/issues/1775#issuecomment-1193859982

maybe I just write a helm config since that works

yuvipanda commented 11 months ago

> maybe I just write a helm config since that works

YESSS, I always prefer this over raw manifests :)
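
For what it's worth, one hedged sketch of the helm route, assuming the intent is a small local chart that Terraform's helm provider can install at apply time (avoiding the plan-time CRD limitation in the linked kubernetes-provider issue). The chart name, the values, and the example PVC are placeholders for whatever manifests were actually being applied:

```yaml
# charts/flink-history/Chart.yaml  (hypothetical local chart)
apiVersion: v2
name: flink-history
version: 0.1.0
---
# charts/flink-history/templates/pvc.yaml -- example resource only; in practice
# this template would carry whatever manifests were previously raw.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: {{ .Values.historyClaimName | default "flink-job-history" }}
spec:
  accessModes: ["ReadWriteMany"]                               # EFS supports many concurrent mounts
  storageClassName: {{ .Values.storageClassName | default "efs-sc" }}
  resources:
    requests:
      storage: 10Gi                                            # nominal for EFS-backed classes
```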

yuvipanda commented 11 months ago

Thanks for engaging with me on the EFS issues :) My goal here is not to say 'no EFS ever', but just to make sure we are only using it after we have completely determined that EBS is not an option.

So if I understand this correctly, the reasons for EFS over EBS are:

  1. Multiple pods may be writing to this filesystem.
     a. QUESTION: Will these be _concurrently_ writing to the same filesystem, or non-concurrently? What is the 'level' of concurrency - one writer per job, or multiple writers per job?
     b. QUESTION: Will these multiple writers be writing to the _same_ files, or different files? And concurrently, or serially?
  2. Will this reaper process require direct read and write access to the files dropped there by the flink servers? I don't think I fully understand the relationship between the reaper and EFS.

I think answers to these questions will help me a lot :)

ranchodeluxe commented 11 months ago

> 1. Multiple pods may be writing to this filesystem.
>    a. QUESTION: Will these be _concurrently_ writing to the same filesystem, or non-concurrently? What is the 'level' of concurrency - one writer per job, or multiple writers per job?
>    b. QUESTION: Will these multiple writers be writing to the _same_ files, or different files? And concurrently, or serially?
> 2. Will this reaper process require direct read and write access to the files dropped there by the flink servers? I don't think I fully understand the relationship between the reaper and EFS.

No, the reaper process doesn't need to access the EFS mount. It only checks kind: FlinkDeployment resources and their ages, then runs kubectl delete on any that are past some age expiry.
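
A rough sketch of what such a reaper could look like: a CronJob whose service account is allowed to list and delete FlinkDeployments in its namespace. The name, schedule, image, and six-hour expiry below are placeholders, not something this PR ships:

```yaml
# Hypothetical reaper: periodically deletes FlinkDeployments older than a cutoff.
# The service account needs RBAC to get/list/delete flinkdeployments in the
# namespace the CronJob runs in.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: flinkdeployment-reaper
spec:
  schedule: "0 * * * *"                    # hourly
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: flinkdeployment-reaper
          restartPolicy: Never
          containers:
            - name: reaper
              image: bitnami/kubectl:latest
              command:
                - /bin/sh
                - -c
                - |
                  # Delete any FlinkDeployment created more than six hours ago.
                  cutoff=$(date -u -d '-6 hours' +%s)
                  kubectl get flinkdeployments \
                    -o jsonpath='{range .items[*]}{.metadata.name} {.metadata.creationTimestamp}{"\n"}{end}' \
                  | while read name created; do
                      if [ "$(date -u -d "$created" +%s)" -lt "$cutoff" ]; then
                        kubectl delete flinkdeployment "$name"
                      fi
                    done
```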

ranchodeluxe commented 11 months ago

These clowns removed the 1.5.0 operator: https://downloads.apache.org/flink/flink-kubernetes-operator-1.5.0

ranchodeluxe commented 11 months ago

> These clowns removed the 1.5.0 operator: https://downloads.apache.org/flink/flink-kubernetes-operator-1.5.0

Got confirmation from one of the devs that only the latest two operator versions are supported, and one was just released. He's not sure whether this documentation applies to the operator as well, but it pretty much aligns:

https://flink.apache.org/downloads/#update-policy-for-old-releases

specific to the operator: https://cwiki.apache.org/confluence/display/FLINK/Release+Schedule+and+Planning

ranchodeluxe commented 11 months ago

> Thanks for working with me on this, @ranchodeluxe. I think using EFS is alright here! I've left some other minor comments, but overall lgtm

Sorry @yuvipanda, I thought I had muted this by turning it back into a draft so it wouldn't ping you. I'll do that now (it still needs a bit of work) and I'll incorporate your feedback before requesting another review. Here are some answers to your previous questions:

> 1. Multiple pods may be writing to this filesystem.
>    a. QUESTION: Will these be _concurrently_ writing to the same filesystem, or non-concurrently? What is the 'level' of concurrency - one writer per job, or multiple writers per job?

The JobIDs returned are statistically unique, and history is written to the NFS mount by a single process/thread.

ranchodeluxe commented 10 months ago

@yuvipanda gentle nudge with some 🧁 for dessert 😄

ranchodeluxe commented 9 months ago

alrighty then @yuvipanda, back at this with recent changes so @thodson-usgs can use EFS

thodson-usgs commented 8 months ago

Looks good to me