connected-bsamadi closed this issue 5 years ago
The command above should be submitting a Dataflow job which does the processing. So you should check that the Dataflow job actually completed successfully.
Can you check your Dataflow job in the UI? Does it show a WriteToBigQuery step if you expand it? Did the job actually complete?
Here's what I see in the GCP Cloud Console
I just checked it. I see three failed jobs. One of them lasted for a day and 13 hours and I was charged $162 for it!
@connected-bsamadi sorry to hear that. There was a bug in the Dataflow job that prevented it from running efficiently; it was fixed by #302. The job should now run in an elapsed time of about 20 minutes and use ~110 CPU hours.
Thanks @jlewi. We are talking to Google's Billing Support to see if they can help us. It was a bit weird that every step of the job had failed, yet it continued for 91 CPU days.
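For scale, here is a rough back-of-the-envelope comparison of the two figures quoted in this thread. It is only a sketch: the variable names are illustrative, and it assumes a CPU day is simply 24 CPU hours.

```python
# Figures quoted in the comments above.
failed_cpu_days = 91        # CPU time consumed by the failed job
expected_cpu_hours = 110    # approximate CPU time expected after the fix in #302

# Assumption: one CPU day equals 24 CPU hours.
failed_cpu_hours = failed_cpu_days * 24

# How many times more CPU time the failed job used than a fixed run should.
ratio = failed_cpu_hours / expected_cpu_hours

print(f"Failed job: {failed_cpu_hours} CPU hours")
print(f"Roughly {ratio:.1f}x the expected usage")
```

By this estimate the failed job consumed on the order of twenty times the CPU time a successful run is expected to need.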
The code_search.dataflow.cli.preprocess_github_dataset command finishes successfully but it doesn't create a dataset on BigQuery. This is the output of the command: