mitodl / edx2bigquery

Tool to convert & load data from edX platform into BigQuery
GNU General Public License v2.0

Unable to parse daily tracklog #74

Open jleong-openedx opened 6 years ago

jleong-openedx commented 6 years ago

Hi,

I have run into a problem with edx2bigquery: it fails to process the tracklog for one particular day. The error shown in the BigQuery interface is:

gs://tracklog-2017-11-17.json.gz: Error while reading data, error message: JSON parsing error in row starting at position 12345: No such field: event_struct.duration. (error code: invalid)

The portion of the file that triggers this is:

"event_struct": {"duration": 123.45

This only occurs for tracklog files whose records include the event_struct.duration field.
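To check which tracklog files are affected, something like the following could be run against a local copy of each file. This is only a minimal sketch: it assumes the tracklog is a gzip file with one JSON object per line, and the function name find_rows_with_field is hypothetical, not part of edx2bigquery.

```python
import gzip
import json
import sys


def find_rows_with_field(path, parent="event_struct", field="duration"):
    """Yield (line_number, value) for each record where record[parent][field] exists.

    Assumes one JSON object per line; blank or unparseable lines are skipped.
    """
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for lineno, line in enumerate(fh, start=1):
            line = line.strip()
            if not line:
                continue
            try:
                record = json.loads(line)
            except ValueError:
                # Not valid JSON; ignore for the purpose of this check.
                continue
            if not isinstance(record, dict):
                continue
            parent_obj = record.get(parent)
            if isinstance(parent_obj, dict) and field in parent_obj:
                yield lineno, parent_obj[field]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_tracklog.py <local-tracklog.json.gz>
    for lineno, value in find_rows_with_field(sys.argv[1]):
        print(f"line {lineno}: event_struct.duration = {value!r}")
```

A file that produces no output has no records with that field, so it should not trigger this particular error.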

As a result, the load job fails and no data from this tracklog file is uploaded to BigQuery. edx2bigquery itself fails silently.

Could someone please look into this? I did some searching, and this looks like a similar situation, possibly with a solution: https://stackoverflow.com/questions/25279116/cannot-insert-new-value-to-bigquery-table-after-updating-with-new-column-using-s

Thank you!

jleong-openedx commented 6 years ago

Hi @ichuang, have you by chance run into this before, and would you please be able to take a look at it?

xcompass commented 6 years ago

@ichuang, do you need more info, or do you have a solution? Thanks.