leap-stc / data-management

Collection of code to manually populate the persistent cloud bucket with data
https://catalog.leap.columbia.edu/
Apache License 2.0

Trying to debug daily metaflux recipe #23

Closed jbusecke closed 3 months ago

jbusecke commented 1 year ago

The daily metaflux recipe has been running for 3!!! days (cc @cisaacstern). Testing whether the problem is tied to only some of the files here.

jbusecke commented 1 year ago

Everything up to 2012 went through really fast.

jbusecke commented 1 year ago

2018-2022 also went through in a reasonable time... Maybe there is something wrong with one of the files from 2012-2018?

jbusecke commented 1 year ago

Or it is an issue when there are too many files in the recipe. Testing 2012-2018 now to figure that out.

jbusecke commented 1 year ago

I am leaning towards the explanation that having too many files in the recipe is what causes this. I believe I have tested all files in different chunks at this point, and in smaller batches they all went through.

jbusecke commented 1 year ago

2012-end worked well, so I am now moving the start year back in increments.

jbusecke commented 1 year ago

2006-end is taking forever again...
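
For reference, the approach in the last few comments (keep the end of the record fixed, move the start year back in increments, and watch the runtime) can be sketched roughly as below. This is only a minimal illustration: `run_subset` is a hypothetical stand-in for whatever actually executes the recipe on a subset of files, and the 2001-2022 year range and the step size are assumptions, not values taken from the recipe.

```python
import time
from typing import Callable, List, Sequence, Tuple


def time_growing_windows(
    years: Sequence[int],
    run_subset: Callable[[Sequence[int]], None],
    step: int = 2,
) -> List[Tuple[int, int, float]]:
    """Time `run_subset` on windows that always end at the last year.

    The window start moves back by `step` years per iteration until the
    full range is covered. Returns (start_year, n_years, seconds) tuples.
    """
    if not years:
        return []
    results = []
    n = len(years)
    size = min(step, n)
    while True:
        window = years[n - size:]
        t0 = time.perf_counter()
        run_subset(window)  # hypothetical: run the recipe on these years only
        elapsed = time.perf_counter() - t0
        results.append((window[0], len(window), elapsed))
        if size == n:
            break
        size = min(size + step, n)
    return results


def fake_run(window: Sequence[int]) -> None:
    # Placeholder workload (assumption): sleeps in proportion to window size.
    time.sleep(0.001 * len(window))


if __name__ == "__main__":
    all_years = list(range(2001, 2023))  # assumed range, for illustration only
    for start, count, seconds in time_growing_windows(all_years, fake_run, step=6):
        print(f"{start}-end: {count} years, {seconds:.3f}s")
```

If the runtime grows much faster than the number of years in the window, that would point towards the "too many files" explanation rather than a single bad file.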

cisaacstern commented 1 year ago

@jbusecke anything I can do to help here?

jbusecke commented 3 months ago

I am not quite sure where this left off, but all work on this dataset has been moved to https://github.com/leap-stc/metaflux_feedstock