Open · ZongyangLi opened this issue 6 years ago
@ZongyangLi there are a few options:
The fastest way to process these data would be to use Globus to transfer them to Comet and run the pipeline as a batch job there.
But this also brings up the question - how will these outputs be different from what is in the beta release? Is there anything about the data in the beta release that makes it invalid? If the code is ready it may also be an option to deploy the new version and reprocess all of the data.
Ideally - but perhaps not necessarily - we should use the publicly available data in our publications.
@dlebauer I didn't realize that these outputs would be added into our database. To my understanding, the data are an initial dataset intended for further investigation. So what I expected is a file system like ROGER, where we can easily access all the data and save outputs.
If we would like to deploy the new version, it should be saved as a different 'method' compared with 'canopy_height'. That's an option. My question is:
I can paste the details of @rmgarnett 's proposal here if that's necessary, they were already in an email CC'ed to a lot of people.
Indeed this is an initial investigation into a methods paper led by @JeffWhiteAZ and @NewcombMaria. It's not yet clear what the conclusion will be.
We can include the data into the database as well. One big question at the moment that is not yet resolved is ground subtraction, and how it should be handled.
@rmgarnett should we figure out the ground subtraction before doing a full season of reprocessing?
@ZongyangLi To answer 1) data should be available now on workbench. Can you work out the method there? For processing the full season the data should be available on Globus by Friday for transfer to Comet. @craig-willis can answer 2 and 3.
When we reprocess the data, we would first purge the old canopy height and replace it with the new data. We could either create a new method or update the old one.
We have a few scenarios in mind: 1) use workbench for development, 2) use nebula for real-time processing of the data stream as well as re-processing if it can handle the load, and 3) move data to a cluster for reprocessing. Now that ROGER has reached end of life we do not have a cluster mounted to our file system.
@dlebauer If we're processing the season anyway there is no additional cost to computing the requested ground statistics.
Late update to this ticket. As discussed during the team meeting last week, we're still working on how to support this type of processing with the loss of ROGER. In the short term, the most reliable approach would be to transfer the desired data to an XSEDE resource via Globus. David has a startup allocation on XSEDE Comet that can be used. You'll just need to create an XSEDE account (https://portal.xsede.org/) and send David the username to add to the allocation. Comet uses the SLURM scheduler (as opposed to ROGER's PBS), but they're quite similar.
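For reference, a minimal SLURM submission script for Comet might look like the sketch below. The partition name, resource sizes, and paths are assumptions, not tested values; check Comet's user guide before submitting.

```bash
#!/bin/bash
#SBATCH --job-name=canopy-height       # job name shown in squeue
#SBATCH --partition=compute            # Comet's standard partition (assumption)
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=24
#SBATCH --time=08:00:00                # wall-clock limit
#SBATCH --output=canopy-height.%j.out  # %j expands to the job ID

# Placeholder invocation -- substitute the actual pipeline entry point
# and the path where the season data was transferred.
python run_pipeline.py --input /oasis/scratch/comet/$USER/temp_project/data
```

Submit with `sbatch canopy-height.sh` and monitor with `squeue -u $USER`; these are standard SLURM commands, roughly equivalent to PBS's `qsub`/`qstat`.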
David helped me set up a startup allocation. I am now transferring data to the 'SDSC Data Oasis' endpoint on Globus; I found a directory named after my username there and was able to create directories under 'temp_project'. After the transfer, I will start running a job on Comet.
@craig-willis Do you have any idea how we can install or find all the Python dependencies on Comet, such as Pillow, OpenCV, lmfit, utm, and so on?
@ZongyangLi two options:
You can run `pip install --user <package>` to install modules locally, or use `pip install virtualenv` and create a virtual environment for the pipeline. Does this get you what you need?
@ZongyangLi could you also add Craig Willis as a user on your Comet allocation?
Sure @craig-willis Could you send me your username on XSEDE?
@dlebauer @craig-willis The transfer from the TERRA REF Globus endpoint to Comet scratch stopped when it reached 1 TB. That's the same as my Data Oasis limit.
Let me try it again. I started two transfers and they both stopped after transferring about 500 GB of data. Maybe there's a 500 GB per-transfer limit.
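If the limit really is per-transfer, one workaround is to split the season into several smaller transfers, or simply re-submit the same transfer with a sync level set so that files already copied are skipped. A sketch with the Globus CLI, where the endpoint IDs and paths are placeholders for the actual TERRA REF and Data Oasis endpoints:

```bash
# Re-submit the same transfer; --sync-level mtime tells Globus to skip
# destination files that already exist with matching timestamps, so a
# stalled transfer effectively resumes where it stopped.
globus transfer "$SRC_ENDPOINT:/terraref/season_data" \
                "$DST_ENDPOINT:/oasis/scratch/comet/$USER/temp_project" \
                --recursive --sync-level mtime --label "season-resume"
```

Each re-submission only moves what is still missing, so repeated runs converge even under a per-transfer cap.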
Perhaps try opening a ticket with Comet support?
@JeffWhiteAZ and @rmgarnett are leading an initial TERRA REF height estimation paper. I am trying to provide reference data. They are:
@rmgarnett will use these data for a better estimation on plant height.
Since ROGER is closed to us, I need another approach for this kind of one-off analysis. @dlebauer @craig-willis