chelseybeck closed this 1 month ago
sql.diff
r+wc, is Merino going to dedupe the `scheduled_corpus_item_id` downstream from here?
no, the last select groups by this column, so duplicates aren't expected :)
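The point about the last select can be illustrated with a minimal sketch. It uses Python's built-in `sqlite3` as a stand-in for the warehouse. The table `engagement` and the columns `impressions` and `clicks` are hypothetical, since the query in sql.diff isn't reproduced here; only `scheduled_corpus_item_id` comes from the thread. Grouping by the id collapses repeated raw rows into one aggregated row per item within a single run of the query.

```python
import sqlite3

# Hypothetical stand-in for the extract's source rows: `engagement`,
# `impressions` and `clicks` are illustrative names, not the real schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE engagement ("
    "scheduled_corpus_item_id TEXT, impressions INTEGER, clicks INTEGER)"
)
conn.executemany(
    "INSERT INTO engagement VALUES (?, ?, ?)",
    [
        ("item-a", 100, 5),
        ("item-a", 50, 2),  # a second raw row for the same item
        ("item-b", 80, 1),
    ],
)

# Grouping by scheduled_corpus_item_id yields exactly one row per id
# in this single extract, with its metrics summed.
rows = conn.execute(
    """
    SELECT scheduled_corpus_item_id, SUM(impressions), SUM(clicks)
    FROM engagement
    GROUP BY scheduled_corpus_item_id
    ORDER BY scheduled_corpus_item_id
    """
).fetchall()
conn.close()

print(rows)  # [('item-a', 150, 7), ('item-b', 80, 1)]
```

This only guarantees uniqueness inside one extract; it says nothing about the same id appearing again in the next scheduled upload.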
Yeah, but if you upload this file every 20 minutes, then whatever reads from that file will see dupes (each 20-minute period will be uploaded 72 times).
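The figure of 72 follows if each upload covers a trailing 24-hour window. That window length is an assumption inferred from the number, not something stated in the thread: 24 h × 60 min / 20 min = 72. A small Python sketch checks the arithmetic and simulates 20-minute uploads to count how many files contain one given 20-minute period; `WINDOW`, `CADENCE` and `appearances` are illustrative names.

```python
from datetime import datetime, timedelta

# Assumption: every upload contains the trailing 24 hours of data. The thread
# only gives the 20-minute cadence and the resulting count of 72.
WINDOW = timedelta(hours=24)
CADENCE = timedelta(minutes=20)

# Closed-form count: timedelta // timedelta gives an int (1440 min / 20 min).
expected = WINDOW // CADENCE


def appearances(period_start: datetime, upload_times: list[datetime]) -> int:
    """Count uploads whose window [t - WINDOW, t] fully contains the period."""
    period_end = period_start + CADENCE
    return sum(
        1
        for t in upload_times
        if t - WINDOW <= period_start and period_end <= t
    )


# Simulate three days of uploads every 20 minutes and check one period.
start = datetime(2024, 1, 1)
uploads = [start + i * CADENCE for i in range(3 * 72)]
period = datetime(2024, 1, 2, 8, 0)  # an arbitrary 20-minute period on day two
observed = appearances(period, uploads)

print(expected, observed)  # prints "72 72"
```

So even though each file has one row per `scheduled_corpus_item_id`, a consumer that accumulates every upload would count each period's engagement 72 times unless it reads only the latest file or deduplicates on its own; the thread leaves open which of these Merino does.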
Ah, I see... Merino must handle that. We'll check once ML is able to test on their end, but I assume it does because it re-ranks the recommendations based on engagement.
Creating the table for Merino extracts to GCS
Checklist for reviewer:
`<username>:<branch>` of the fork as parameter. The parameter will also show up in the logs of the `manual-trigger-required-for-fork` CI task together with more detailed instructions.
For modifications to schemas in restricted namespaces (see `CODEOWNERS`):
┆Issue is synchronized with this Jira Task