ossf / scorecard

OpenSSF Scorecard - Security health metrics for Open Source
https://scorecard.dev
Apache License 2.0

Cron stopped importing into BQ #295

Closed dlorenc closed 3 years ago

dlorenc commented 3 years ago

The cron stopped importing into BigQuery: image

There's a way to debug this, but I forget how right now; you need to use gcloud with the job IDs to search for the errors. Opening this bug to track that so I don't forget.

naveensrinivasan commented 3 years ago

Wow! Is the BigQuery data open for everyone? We should fix this and add it to the README.

dlorenc commented 3 years ago

Yup! It should be publicly readable. It's just an automated import from the GCS bucket.

naveensrinivasan commented 3 years ago

@dlorenc I need your help with this. I don't have permission. I am trying to update BigQuery with the latest JSON structure.

image

inferno-chromium commented 3 years ago

@naveensrinivasan - made you bigquery admin, go ahead!

naveensrinivasan commented 3 years ago

Thanks.

oliverchang commented 3 years ago

This probably stopped working due to https://github.com/ossf/scorecard/commit/0eaa4ff3d0dd36cd1d200662e5fc9803cc2fd7be. The BQ import expects newline-delimited JSON objects (one object per line), rather than a single well-formed JSON file.
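
To illustrate the difference, here is a minimal Python sketch, not the project's actual code. The function name `to_ndjson` and the sample records are hypothetical. It turns a well-formed JSON array into newline-delimited JSON, with one compact object per line and no enclosing brackets.

```python
import json


def to_ndjson(json_text: str) -> str:
    """Convert a JSON array of objects into newline-delimited JSON."""
    records = json.loads(json_text)
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array")
    # One compact object per line; no surrounding brackets, no commas between lines.
    return "\n".join(json.dumps(r, separators=(",", ":")) for r in records) + "\n"


# Hypothetical sample data, not taken from the real cron output.
sample = (
    '[{"Repo": "github.com/example/a", "Date": "2021-04-21"},'
    ' {"Repo": "github.com/example/b", "Date": "2021-04-21"}]'
)
ndjson = to_ndjson(sample)
```

Each line of `ndjson` parses as JSON on its own, which is what lets a line-oriented loader read records one at a time.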

dlorenc commented 3 years ago

This is still failing. Anything I can do to help fix it?

naveensrinivasan commented 3 years ago

AFAIK the JSON dump has to be changed.

azeemshaikh38 commented 3 years ago

Dan, see https://github.com/ossf/scorecard/issues/336. I'm working on it right now, but if the failures are blocking work in some way, I can submit a hacky quick-fix solution to unblock this. Let me know.

oliverchang commented 3 years ago

Unfortunately, this is still failing because the Date format in latest.json is wrong:

Invalid date: '21046-04-21' Field: Date; Value: 21046-04-21

I have a fix in #353
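
For reference, a check like the following could catch this kind of value before import. This is a rough Python sketch, not the fix in #353; it assumes the Date field is expected in `YYYY-MM-DD` form, and `is_iso_date` is a hypothetical helper name.

```python
import re
from datetime import datetime

# Exactly four-digit year, two-digit month, two-digit day.
DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")


def is_iso_date(value: str) -> bool:
    """Return True only for a real calendar date written as YYYY-MM-DD."""
    if not DATE_RE.fullmatch(value):
        return False
    try:
        # Rejects impossible dates such as 2021-02-30.
        datetime.strptime(value, "%Y-%m-%d")
    except ValueError:
        return False
    return True


bad = is_iso_date("21046-04-21")   # the value from the error above
good = is_iso_date("2021-04-21")
```

The five-digit year fails the pattern check, so `bad` is `False` while `good` is `True`.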

oliverchang commented 3 years ago

The last two runs have been green.

naveensrinivasan commented 3 years ago

The cron still isn't importing correctly. For example, the data isn't populated in this table: SELECT * FROM `openssf.scorecardcron.scorecard_latest` LIMIT 1000

The old table isn't populated either: SELECT distinct(date) FROM `openssf.scorecardcron.scorecard` LIMIT 100. The last import in the old table is 04-07.

azeemshaikh38 commented 3 years ago

So the results are being imported - if you "ORDER BY Date DESC", you'll see the rows. However, turns out that since the output has "CheckResults" instead of "Checks", the checks in BQ are empty.
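
A rough Python sketch of why a renamed key can leave a column empty. It assumes the import keeps only fields named in the table schema and ignores the rest; that behavior, the schema subset, the helper names, and the sample row are all assumptions for illustration, not the actual cron code or the upcoming fix.

```python
# Assumed subset of the table schema, for illustration only.
SCHEMA_FIELDS = ("Repo", "Date", "Checks")


def project_onto_schema(record: dict) -> dict:
    """Keep only schema fields; anything missing from the record becomes None."""
    return {field: record.get(field) for field in SCHEMA_FIELDS}


def rename_check_results(record: dict) -> dict:
    """Return a copy with 'CheckResults' renamed to 'Checks' (hypothetical workaround)."""
    fixed = dict(record)
    if "CheckResults" in fixed and "Checks" not in fixed:
        fixed["Checks"] = fixed.pop("CheckResults")
    return fixed


# Hypothetical row; the inner check fields are made up.
row = {
    "Repo": "github.com/example/a",
    "Date": "2021-04-21",
    "CheckResults": [{"Name": "example-check"}],
}

before = project_onto_schema(row)
after = project_onto_schema(rename_check_results(row))
```

Under these assumptions, `before["Checks"]` is empty even though the row was imported, while `after["Checks"]` carries the check data.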

naveensrinivasan commented 3 years ago

OK, thanks! So we will need to address the "Checks"/"CheckResults" mismatch.

azeemshaikh38 commented 3 years ago

Yes. Working on a PR for that, will send out shortly.

oliverchang commented 3 years ago

Nice catch!

FYI, I manually fixed up the latest.json from the last cron job and recreated the BQ table (and added Date partitioning).

azeemshaikh38 commented 3 years ago

Awesome, thanks Oliver!