nasa-gcn / gcn.nasa.gov

General Coordinates Network (GCN) web site
https://gcn.nasa.gov

Backfill eventID in Circulars #2647

Open jracusin opened 1 month ago

jracusin commented 1 month ago
In addition to this change, I am aware that a backfill is needed to get things to work. First, I need to populate the eventId in circulars that existed before I added the eventId on circular creation. Second, I need to populate the synonyms with the existing eventIds. I have combined both backfills into one for efficiency's sake. This is my current backfill script:
import { tables } from '@architect/functions'
import { ReturnValue } from '@aws-sdk/client-dynamodb'
import type { DynamoDBDocument } from '@aws-sdk/lib-dynamodb'
import { paginateScan, UpdateCommand } from '@aws-sdk/lib-dynamodb'

import {
  type Circular,
  parseEventFromSubject,
} from '~/routes/circulars/circulars.lib'
import { createSynonyms, synonymExists } from '~/routes/synonyms/synonyms.server'

export async function backfill() {
  console.log('Starting backfill...')
  const db = await tables()
  const client = db._doc as unknown as DynamoDBDocument
  const TableName = db.name('circulars')
  const pages = paginateScan(
    { client },
    {
      TableName,
    }
  )
  for await (const page of pages) {
    for (const record of page.Items || []) {
      const circular = record as unknown as Circular
      const validEvent =
        circular.eventId || parseEventFromSubject(circular.subject)
      const promises: Promise<any>[] = []
      if (!circular.eventId && validEvent) {
        const updateParams = {
          TableName,
          Key: {
            circularId: circular.circularId,
          },
          UpdateExpression: 'SET eventId = :eventId',
          ExpressionAttributeValues: {
            ':eventId': validEvent,
          },
          ReturnValues: ReturnValue.UPDATED_NEW,
        }

        promises.push(client.send(new UpdateCommand(updateParams)))
      }
      // Only create a synonym group when none exists yet for this event.
      if (validEvent && !(await synonymExists({ eventId: validEvent }))) {
        promises.push(createSynonyms([validEvent]))
      }
      if (promises.length > 0) {
        try {
          const results = await Promise.all(promises)
          // Log updated attributes when present, otherwise the raw result.
          for (const result of results) {
            if (result) console.log(result.Attributes ?? result)
          }
          console.log('--------')
        } catch (error) {
          console.error(
            `ERROR updating Circular ${circular.circularId}: `,
            error
          )
          throw error
        }
      }
    }
  }
  console.log('... End backfill')
}
await backfill()

Originally posted by @Courey in https://github.com/nasa-gcn/gcn.nasa.gov/issues/2642#issuecomment-2429931625

Courey commented 1 week ago

https://gist.github.com/Courey/f7cd6e6b92b7fdd05daa013a094f40b0

This is the gist with the code to run the backfill on circular eventIds. It uses batch writes as requested, and each run can be limited to n records, depending on how many we want to update at once.
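
For readers without the gist open, the batching idea described above could be sketched roughly as follows. This is not the gist's code. It assumes a hypothetical CircularRecord shape, a stand-in parser (parseEventStub) in place of the repository's parseEventFromSubject, and helper names (selectUpdates, chunk, buildBatches) invented for illustration. It relies on two DynamoDB facts: BatchWriteItem accepts at most 25 requests per call, and it supports only put and delete requests, so each circular has to be rewritten as a whole item with eventId added.

```typescript
// Hypothetical record shape: only the fields this sketch needs.
interface CircularRecord {
  circularId: number
  subject: string
  eventId?: string
}

// DynamoDB's BatchWriteItem accepts at most 25 put/delete requests per call.
const MAX_BATCH_SIZE = 25

// Stand-in for the repository's parseEventFromSubject (an assumption);
// it only recognizes names such as "GRB 240101A".
function parseEventStub(subject: string): string | undefined {
  return subject.match(/GRB ?\d{6}[A-Z]?/)?.[0]
}

// Pick up to `limit` circulars that lack an eventId but have a parseable one,
// returning copies with eventId filled in.
function selectUpdates(
  circulars: CircularRecord[],
  limit: number
): CircularRecord[] {
  const updates: CircularRecord[] = []
  for (const circular of circulars) {
    if (updates.length >= limit) break
    if (circular.eventId) continue
    const eventId = parseEventStub(circular.subject)
    if (eventId) updates.push({ ...circular, eventId })
  }
  return updates
}

// Split a list into consecutive groups of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  const chunks: T[][] = []
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size))
  }
  return chunks
}

// Build one BatchWrite-style payload per group of up to 25 items.
// Each item is written whole because batch writes cannot do partial updates.
function buildBatches(tableName: string, updates: CircularRecord[]) {
  return chunk(updates, MAX_BATCH_SIZE).map((group) => ({
    RequestItems: {
      [tableName]: group.map((Item) => ({ PutRequest: { Item } })),
    },
  }))
}
```

Each payload returned by buildBatches has the shape that a batch write request expects. A real run would also need to resend any unprocessed items that DynamoDB reports back, which this sketch leaves out.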

lpsinger commented 4 days ago

I'm sorry, @Courey, it looks like gists don't support line comments. Here's an idea: you could open a PR and add this script to the root directory of the repository. Then we can make line comments on the PR. Of course we won't merge the PR to the main branch but that's fine.