world-federation-of-advertisers / cross-media-measurement

Apache License 2.0
35 stars 11 forks source link

Add support to run Panel exchange task on EMR #1602

Open sandip80 opened 5 months ago

sandip80 commented 5 months ago

Changes

Testing

wfa-reviewable commented 5 months ago

This change is Reviewable

sandip80 commented 4 months ago

MODULE.bazel line 129 at r2 (raw file):

bazel_dep(
    name = "common-jvm",
    version = "0.82.2",

Version upgrade required after common-jvm PR is merged

sandip80 commented 4 months ago

MODULE.bazel line 136 at r2 (raw file):

    commit = "1cdb3696bce54b0906396657029cdbea4b76a793",
    remote = "https://github.com/world-federation-of-advertisers/common-jvm",
)

To be removed

Code quote:

git_override(
    module_name = "common-jvm",
    commit = "1cdb3696bce54b0906396657029cdbea4b76a793",
    remote = "https://github.com/world-federation-of-advertisers/common-jvm",
)
sandip80 commented 1 month ago

imports/java/org/bouncycastle/bcpkix/BUILD.bazel line 1 at r3 (raw file):

Previously, stevenwarejones (Steven Ware Jones) wrote…
I suggest restricting visibility

Done.

sandip80 commented 1 month ago

src/main/k8s/panelmatch/dev/example_edp_daemon_aws.cue line 21 at r3 (raw file):

Previously, SanjayVas (Sanjay Vasandani) wrote…
The storage bucket should come from a Make var, since they must be globally unique. K8s configs for any environment must be generatable from Bazel without any code changes.

Removed the _defaultAwsConfig and moved the dynamic variables to Make var. Storage bucket name isn't dynamic so we should be fine to have it here (same as Google cloud).

sandip80 commented 1 month ago

src/main/k8s/panelmatch/dev/example_edp_daemon_aws.cue line 30 at r3 (raw file):

Previously, SanjayVas (Sanjay Vasandani) wrote…
Where does this sequence relative to @Marco-Premier's changes to refactor the PX Terraform? (Terraform should not be used to manage K8s object configs).

Removed the default config in favor of Make var

sandip80 commented 1 month ago

src/main/kotlin/org/wfanet/panelmatch/client/deploy/ExchangeWorkflowDaemonFromFlags.kt line 100 at r3 (raw file):

Previously, stevenwarejones (Steven Ware Jones) wrote…
I think this can stay private. see my other comment.

Done.

sandip80 commented 1 month ago

src/main/kotlin/org/wfanet/panelmatch/client/deploy/ExchangeWorkflowDaemonFromFlags.kt line 139 at r3 (raw file):

Previously, stevenwarejones (Steven Ware Jones) wrote…
ditto on staying private.

Done.

sandip80 commented 1 month ago

src/main/kotlin/org/wfanet/panelmatch/client/deploy/ProductionExchangeTaskMapper.kt line 185 at r3 (raw file):

Previously, stevenwarejones (Steven Ware Jones) wrote…
is there a use case where this would be set but emrBeamTaskExeuctor would be false? If not, you could just drop this variable and rely on if emrService is set or not.

Removed the extra boolean

sandip80 commented 1 month ago

src/main/kotlin/org/wfanet/panelmatch/client/deploy/example/aws/AwsExampleDaemon.kt line 95 at r3 (raw file):

Previously, stevenwarejones (Steven Ware Jones) wrote…
I don't think you need to override the exchangeTaskMapper. Instead, what I'd suggest is adding ``` protected abstract emrService... ``` to ExchangeWorkflowDaemonFromFlags.kt and passing it in there. Similar to how makePipelineOptions is declared and consumed.

Removed the override from here and added makeRemoteTaskOrchestrator similar to makePipelineOptions

sandip80 commented 1 month ago

src/main/kotlin/org/wfanet/panelmatch/client/exchangetasks/EmrExchangeTask.kt line 27 at r3 (raw file):

Previously, stevenwarejones (Steven Ware Jones) wrote…
see my other comment. I suggest just passing in the stepCase here instead of the exchangeStepName (which is not really an exchangeStepName but a jobRunId)

Changed the var name to orchestratorName

sandip80 commented 1 month ago

src/main/kotlin/org/wfanet/panelmatch/client/exchangetasks/EmrExchangeTask.kt line 36 at r3 (raw file):

Previously, stevenwarejones (Steven Ware Jones) wrote…
this is a jobRunId not an exchangeStepName, right? Can you just calculate it within this task ``` val jobRunId = "${stepCase.name.lowercase()}-$exchangeId-${exchangeDate.format(DateTimeFormatter.ISO_LOCAL_DATE)}" ```

Moved into the EMR orchestrator task

sandip80 commented 1 month ago

src/main/kotlin/org/wfanet/panelmatch/client/exchangetasks/emr/EmrExchangeTaskService.kt line 29 at r3 (raw file):

Previously, stevenwarejones (Steven Ware Jones) wrote…
see my other commend about kdoc and making this not open

Updated

sandip80 commented 1 month ago

src/main/kotlin/org/wfanet/panelmatch/client/exchangetasks/emr/EmrExchangeTaskService.kt line 39 at r3 (raw file):

Previously, stevenwarejones (Steven Ware Jones) wrote…
this could just be a lazy variable ``` private val appId: String by lazy { val exchangeTaskAppIdBlob = runBlocking { storageClient.getBlob(exchangeTaskAppIdPath) } if (exchangeTaskAppIdBlob != null) { runBlocking { exchangeTaskAppIdBlob.toStringUtf8() } } else { val applicationId = emrServerlessClientService.createApplication(EMR_EXCHANGE_TASK_APP_NAME) runBlocking { storageClient.writeBlob(exchangeTaskAppIdPath, applicationId.toByteStringUtf8()) } applicationId } } ```

Done

sandip80 commented 1 month ago

src/main/kotlin/org/wfanet/panelmatch/client/exchangetasks/emr/EmrExchangeTaskService.kt line 62 at r3 (raw file):

Previously, SanjayVas (Sanjay Vasandani) wrote…
nit: If you're going to use RuntimeException, just use Exception instead. Kotlin doesn't use caught exceptions, so there's not a useful distinction between Exception and RuntimeException.

Done

sandip80 commented 1 month ago

src/main/kotlin/org/wfanet/panelmatch/client/exchangetasks/emr/EmrServerlessClientService.kt line 113 at r3 (raw file):

Previously, stevenwarejones (Steven Ware Jones) wrote…
do these always need to be set? what happens if these last two aren't set?

This was added since default java runtime on older EMR image was set to 1.8 but I see that this has changed to 17 as per this doc: https://docs.aws.amazon.com/emr/latest/ReleaseGuide/configuring-java8.html . Will remove these 2 settings.

sandip80 commented 1 month ago

src/main/kotlin/org/wfanet/panelmatch/client/deploy/ProductionExchangeTaskMapper.kt line 193 at r3 (raw file):

Previously, stevenwarejones (Steven Ware Jones) wrote…
it.stepCase == step.StepCase

Done.

sandip80 commented 1 month ago

src/main/kotlin/org/wfanet/panelmatch/common/certificates/BUILD.bazel line 12 at r3 (raw file):

Previously, stevenwarejones (Steven Ware Jones) wrote…
I think we should have a separate target for GenerateCsr and have a more narrow exposure. Its really only the AWS CA that needs to consume it.

Done.

sandip80 commented 1 month ago

src/main/kotlin/org/wfanet/panelmatch/client/exchangetasks/emr/EmrServerlessClientService.kt line 43 at r3 (raw file):

Previously, SanjayVas (Sanjay Vasandani) wrote…
+1 to interface rather than open. A class should only be open if it's intended to be subclassed (and not just for mocking).

Created a separate interface for unit test

sandip80 commented 1 month ago

src/main/kotlin/org/wfanet/panelmatch/client/exchangetasks/emr/EmrServerlessClientService.kt line 43 at r3 (raw file):

Previously, SanjayVas (Sanjay Vasandani) wrote…
"Service" for an application that depends on gRPC generally means a gRPC service impl. This looks like it's just a client (which happens to be implemented as a wrapper around `EmrServerlessClient`).

Done.

sandip80 commented 1 month ago

src/main/kotlin/org/wfanet/panelmatch/client/exchangetasks/emr/EmrServerlessClientService.kt line 61 at r3 (raw file):

Previously, stevenwarejones (Steven Ware Jones) wrote…
I also thought this. I'd prefer separate start and stop. The example AWS code I saw I all had separate start and stops

Created separate start and stop functions

sandip80 commented 1 month ago

src/main/kotlin/org/wfanet/panelmatch/client/exchangetasks/emr/EmrServerlessClientService.kt line 154 at r3 (raw file):

Previously, SanjayVas (Sanjay Vasandani) wrote…
This should live in the companion object instead.

Done.

sandip80 commented 4 weeks ago

src/main/kotlin/org/wfanet/panelmatch/client/exchangetasks/emr/EmrServerlessClientService.kt line 79 at r3 (raw file):

Previously, SanjayVas (Sanjay Vasandani) wrote…
It'd be better class design IMO to separate out the polling wait. e.g. have a non-suspending `start` method annotated with `@Blocking` (assuming the client calls are blocking), and a separate suspending `waitForApplicationState(applicationId: String, state: ApplicationState, pollingInterval: Duration = DEFAULT_POLLING_INTERVAL)`. Also rather than a number of attempts, the caller can just wrap in `withTimeout` if desired. Obviously if you really want, you could still wrap those two calls into `startAndWait` like you have now.

Changed logic to use timeout based polling instead of max count.

sandip80 commented 4 weeks ago

src/main/kotlin/org/wfanet/panelmatch/client/exchangetasks/emr/EmrServerlessClientService.kt line 156 at r3 (raw file):

Previously, SanjayVas (Sanjay Vasandani) wrote…
CONST_CASE and move to companion object

Done.

sandip80 commented 4 weeks ago

src/main/kotlin/org/wfanet/panelmatch/client/exchangetasks/emr/EmrServerlessClientService.kt line 65 at r3 (raw file):

Previously, SanjayVas (Sanjay Vasandani) wrote…
Are these client methods blocking? If so, they should be wrapped in `withContext` when called from coroutine code. Pass in a CoroutineContext to the constructor annotated with `@BlockingExecutor` for this purpose.

Added withContext(Dispatchers.IO

sandip80 commented 4 weeks ago

MODULE.bazel line 129 at r2 (raw file):

Previously, SanjayVas (Sanjay Vasandani) wrote…
I went ahead and cut a v0.90.0 release for you: https://github.com/world-federation-of-advertisers/common-jvm/releases/tag/v0.90.0 You just need a PR to https://github.com/world-federation-of-advertisers/bazel-registry so you can use it here.

Updated

sandip80 commented 4 weeks ago

MODULE.bazel line 231 at r4 (raw file):

Previously, SanjayVas (Sanjay Vasandani) wrote…
nit: Split out an AWS_SDK_VERSION constant here like we do in common-jvm, with a comment indicating that it matches the common-jvm version. #1790 will simplify this, but I don't know which PR will land first

Added AWS_SDK_VERSION

sandip80 commented 4 weeks ago

src/main/kotlin/org/wfanet/panelmatch/client/exchangetasks/remote/aws/EmrServerlessClientImpl.kt line 122 at r5 (raw file):

Previously, stevenwarejones (Steven Ware Jones) wrote…
can you add a TODO to move this to a config?

Let me move them to AWS Daemon flags

sandip80 commented 4 weeks ago

src/main/kotlin/org/wfanet/panelmatch/client/exchangetasks/emr/EmrServerlessClientService.kt line 108 at r3 (raw file):

Previously, SanjayVas (Sanjay Vasandani) wrote…
nit: these should be configurable via command-line options

Will do

sandip80 commented 4 weeks ago
Previously, SanjayVas (Sanjay Vasandani) wrote…
Can you rebase this on head? There's relevant base changes that make this harder to review.

Done.

sandip80 commented 4 weeks ago

src/main/kotlin/org/wfanet/panelmatch/client/exchangetasks/emr/EmrExchangeTaskService.kt line 29 at r3 (raw file):

Previously, sandip80 (Sandip Samantaray) wrote…
Added 1 to the interface

Oh wait, wrong class, will add one here.