terraref / reference-data

Coordination of Data Products and Standards for TERRA reference data
https://terraref.org
BSD 3-Clause "New" or "Revised" License

Systematic gaps in MAC weather station in gantry cache #245

Closed NewcombMaria closed 5 years ago

NewcombMaria commented 5 years ago

@robkooper @jdmaloney and @dlebauer there are gaps in the data from the MAC weather station in the corner of the field that appear to go back to Feb 2017. Most often it looks like data for 1 full day and 1 partial day are missing every week. From what we can tell, the data output from the environmental sensors on the weather station is continuous. The problem may be in the system that puts the data into daily folder directories.
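One way to locate gaps like these is to count files per calendar day in the cache and flag the days that come up short. The sketch below is only an illustration: it assumes the `WeatherStation_SecData_YYYY_MM_DD_HHMM.dat` naming that appears later in this thread, and the `expected_per_day=24` default is an assumption based on the hourly collection described further down. The function names are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta


def files_per_day(filenames):
    """Count files per calendar day, using the timestamp embedded in each name."""
    counts = Counter()
    for name in filenames:
        stem = name.rsplit(".", 1)[0]
        # The last four underscore-separated fields are YYYY, MM, DD, HHMM.
        stamp = "_".join(stem.split("_")[-4:])
        counts[datetime.strptime(stamp, "%Y_%m_%d_%H%M").date()] += 1
    return counts


def days_with_gaps(filenames, expected_per_day=24):
    """Return {date: file_count} for every day between the first and last
    file that has fewer files than expected (assumed: one file per hour)."""
    counts = files_per_day(filenames)
    if not counts:
        return {}
    day, last = min(counts), max(counts)
    gaps = {}
    while day <= last:
        if counts[day] < expected_per_day:
            gaps[day] = counts[day]  # a Counter returns 0 for absent days
        day += timedelta(days=1)
    return gaps
```

The first and last days in the listing will often be partial for ordinary reasons, so they may need to be ignored when reading the result.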

The missing data was first mentioned in closed issue #240

robkooper commented 5 years ago

Assigned to @jdmaloney to make sure all data is on the NAS and is transferred correctly to the cache server.

jdmaloney commented 5 years ago

Spent a bit of time on this over lunch. Based on file stat output, I can see when these files are getting dumped onto the NAS. It confirms our initial suspicion: the dumping of the data to the NAS appears to take a break on the weekend. For example, a normal file is dumped to the NAS roughly an hour or so after capture:

    [/share/Public/WeatherStation/WeatherStation] # stat WeatherStation_SecData_2018_07_12_0734.dat
      File: "WeatherStation_SecData_2018_07_12_0734.dat"
      Size: 282377 Blocks: 552 IO Block: 4096 regular file
    Device: fd00h/64768d Inode: 95958508 Links: 1
    Access: (0666/-rw-rw-rw-) Uid: ( 501/ jheun) Gid: ( 100/everyone)
    Access: 2018-07-12 08:36:30.000000000
    Modify: 2018-07-12 08:36:46.000000000
    Change: 2018-07-12 09:31:32.000000000

But for the weekend ones, it looks like:

    [/share/Public/WeatherStation/WeatherStation] # stat WeatherStation_SecData_2018_06_29_2056.dat
      File: "WeatherStation_SecData_2018_06_29_2056.dat"
      Size: 257794 Blocks: 504 IO Block: 4096 regular file
    Device: fd00h/64768d Inode: 95958219 Links: 1
    Access: (0666/-rw-rw-rw-) Uid: ( 501/ jheun) Gid: ( 100/everyone)
    Access: 2018-06-29 21:58:30.000000000
    Modify: 2018-06-29 21:58:41.000000000
    Change: 2018-07-02 09:33:38.000000000

This throws off our script, because we only delay the sync by approximately 24-36 hours. I've increased the delay so that we now look ~72-75 hours back to get information. This means we won't get weather data up here at NCSA as promptly, but we will get all of it :)
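To make the arithmetic concrete, here is a minimal sketch that estimates how long each of the two files above took to reach the NAS. It assumes the timestamp in the file name is the capture time and that the `Change:` value from `stat` marks arrival on the NAS; both are interpretations of the output above, and the function names are hypothetical rather than part of the actual sync script.

```python
from datetime import datetime, timedelta


def capture_time(filename):
    """Parse the YYYY_MM_DD_HHMM stamp at the end of the file name."""
    stem = filename.rsplit(".", 1)[0]
    stamp = "_".join(stem.split("_")[-4:])
    return datetime.strptime(stamp, "%Y_%m_%d_%H%M")


def nas_lag(filename, change_time):
    """Time between capture (from the name) and the stat 'Change' time."""
    arrived = datetime.strptime(change_time, "%Y-%m-%d %H:%M:%S")
    return arrived - capture_time(filename)


weekday_lag = nas_lag("WeatherStation_SecData_2018_07_12_0734.dat",
                      "2018-07-12 09:31:32")
weekend_lag = nas_lag("WeatherStation_SecData_2018_06_29_2056.dat",
                      "2018-07-02 09:33:38")

print(weekday_lag)  # 1:57:32
print(weekend_lag)  # 2 days, 12:37:38

# A 36-hour look-back misses the weekend file; a 72-hour look-back covers it.
print(weekend_lag > timedelta(hours=36), weekend_lag < timedelta(hours=72))
```

Under these assumptions the Friday-evening file reached the NAS about 60 hours after capture, which is outside a 24-36 hour window and inside a ~72-75 hour one.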

If it is possible to figure out why there is a delay in dumping data, then I can adjust the script back; otherwise I can leave it where it is now. I'm currently working to figure out which files we've missed historically and get those pushed up here to NCSA.

NewcombMaria commented 5 years ago

From @jtheun by email: (Thanks John!)

That's perfect diagnostic feedback info. Weather station files are collected hourly by LoggerNet running on an office PC. They are collected from the weather station and saved on my hard drive. I have Karen's Replicator software set to run every day at 9:30 am, except for Saturday and Sunday, to copy new files from my hard drive to the NAS. Monday morning is when the weekend files are added to the NAS.

Karen's is set to run every 24 hours, except for the weekend. I can change that to whatever makes sense. It will be fairly consistent, but network connections, power outages, and infrequent computer updates/restarts can interrupt that normal cycle and still throw hammers at JD's script from time to time.

I just changed Karen's schedule to run every 8 hours (4:30am, 12:30pm, 8:30pm), and perhaps JD can return the script to a 24 hour cycle, or maybe something more frequent, depending on what he thinks is best. I'll make the change, and let's see if we get better performance. -John
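The effect of the schedule change can be sketched by computing the next copy run after a file lands on the office PC. This is only a model of the schedules described in the email, not Karen's Replicator itself; the `next_run` helper is hypothetical, and it assumes the new 8-hour schedule also runs on weekends, which the email does not state explicitly.

```python
from datetime import datetime, time, timedelta


def next_run(t, run_times, skip_weekend):
    """Return the first scheduled run at or after datetime t.

    run_times    -- times of day at which the copy job runs
    skip_weekend -- if True, no runs happen on Saturday or Sunday
    """
    day = t.date()
    while True:
        # weekday() is 5 for Saturday and 6 for Sunday
        if not (skip_weekend and day.weekday() >= 5):
            for rt in sorted(run_times):
                candidate = datetime.combine(day, rt)
                if candidate >= t:
                    return candidate
        day += timedelta(days=1)


# Modify time of the Friday-evening file from the stat output earlier in the thread.
saved = datetime(2018, 6, 29, 21, 58, 41)

old = next_run(saved, [time(9, 30)], skip_weekend=True)
new = next_run(saved, [time(4, 30), time(12, 30), time(20, 30)],
               skip_weekend=False)

print(old, old - saved)  # 2018-07-02 09:30:00 2 days, 11:31:19
print(new, new - saved)  # 2018-06-30 04:30:00 6:31:19
```

With the old weekday-only schedule the model lands on Monday 9:30, which agrees with the `Change:` time seen on the NAS; with three runs a day, the wait before a copy is at most 8 hours in this model, barring the outages and restarts mentioned above.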

robkooper commented 5 years ago

Would it make sense to have the software copy the data both to the NAS and directly to the cache server? That way we no longer need the script from JD, and once the data is copied it is immediately picked up by the cache server and thus by the Globus transfer.

jtheun commented 5 years ago

Rob, I think the answer is no, but I'm just not sure. Looking back at the emails bounced around between our campus IT and our on-site man (retired) there are firewall issues. Not in my line of work so I don't understand the lingo. I think a direct way to the cache server was looked for initially and this current solution was the easiest way forward.

jdmaloney commented 5 years ago

I've moved us back to a 2-day check, down from the 4-day check I put us to. That will leave some grace for unexpected outage issues, etc. I'm still working to get time to put the files we missed in place on the cache server (I copied all the NAS contents over, and just need to diff with what we have and complete the dataset). Once I do that, I'll close this issue.
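The diff step described here could look something like the sketch below, which compares file names in a copy of the NAS contents against what the cache already holds. It is not the actual procedure used; the function name and both directory paths are hypothetical, and it compares names only, so a file that exists in both places but is truncated would not be caught.

```python
from pathlib import Path


def missing_files(source_dir, dest_dir, pattern="*.dat"):
    """Return sorted names of files matching pattern that are present in
    source_dir but absent from dest_dir (comparison is by name only)."""
    source = {p.name for p in Path(source_dir).glob(pattern)}
    dest = {p.name for p in Path(dest_dir).glob(pattern)}
    return sorted(source - dest)


if __name__ == "__main__":
    # Hypothetical paths: a full copy of the NAS share and the existing cache.
    for name in missing_files("/tmp/nas_copy/WeatherStation",
                              "/tmp/cache/WeatherStation"):
        print(name)
```

Because the file names embed the date, the sorted output also groups the missing files by day.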

jdmaloney commented 5 years ago

I've taken care of the diff, and everything has been sent to NCSA. This should be all good to go.