ocean-transport / floater

For working with lagrangian float data
http://floater.readthedocs.io
15 stars 17 forks source link

csv_to_netcdf #30

Closed geosciz closed 7 years ago

geosciz commented 7 years ago

Hi @rabernat and @nathanieltarshish,

Please take a look at this pull request for converting MITgcm output CSV files to NetCDF files. Feel free to give me feedbacks on it.

Best regards,

@geosciz

codecov-io commented 7 years ago

Codecov Report

Merging #30 into master will increase coverage by 3.38%. The diff coverage is 88.23%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #30      +/-   ##
==========================================
+ Coverage   56.05%   59.44%   +3.38%     
==========================================
  Files           3        3              
  Lines         289      323      +34     
  Branches       46       52       +6     
==========================================
+ Hits          162      192      +30     
- Misses        111      115       +4     
  Partials       16       16
Impacted Files Coverage Δ
floater/utils.py 33.11% <88.23%> (+16.01%) :arrow_up:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 67c9697...698e10b. Read the comment docs.

rabernat commented 7 years ago

I repeat my overall comment here:

This is a good start. But there is a ways to go before this is ready to merge. You should

For the tests, you will probably want to add a new gzipped file with some sample .csv data. (It should be a very small subset, just a few KB of data will do.)

rabernat commented 7 years ago

@geosciz: do you know how to run the test suite?

rabernat commented 7 years ago

@geosciz do you understand how to run the test suite locally?

py.test -v floater
geosciz commented 7 years ago

Hi @rabernat,

I don't quite understand how to run the test suite locally. Previously, I just push new commits, and run it on Travis CI to test the code.

nathanieltarshish commented 7 years ago

Is there any easy way to migrate this csv_to_netcdf branch from @geosciz's fork to the main @rabernat fork (without having to restart this review from scratch)? Having it on @rabernat's fork would make it easy for me to test it and make commits.

rabernat commented 7 years ago

I don't quite understand how to run the test suite locally. Previously, I just push new commits, and run it on Travis CI to test the code.

Did you try the command I typed above? Run py.test -v floater from the command line in the floater top directory.

rabernat commented 7 years ago

Is there any easy way to migrate this csv_to_netcdf branch from @geosciz's fork to the main @rabernat fork (without having to restart this review from scratch)? Having it on @rabernat's fork would make it easy for me to test it and make commits.

That's not necessary. You can check out any pull request locally following these instructions: https://help.github.com/articles/checking-out-pull-requests-locally/

geosciz commented 7 years ago

Hi @rabernat, I just tried it. A test suite is running.

rabernat commented 7 years ago

One of the tests is really slow because it has to write a huge file.

I recommend selecting just your test with the -k flag while you are developing it. Please use the pytest docs to understand how to do this.

geosciz commented 7 years ago

@rabernat Sure. Thanks for your help!

rabernat commented 7 years ago

FYI, @geosciz, the most recent bug is exactly why we need tests for the code. Tests would catch that error.

Please move forward with adding tests to this PR. Otherwise we will continue to waste time on bugs.

rabernat commented 7 years ago

And the tests passed! :)

rabernat commented 7 years ago

@geosciz, can you explain what happens when floats disappear? We know that floats die over the course of the simulation, so that by day 90, not all of the nparts are there. This is fine for each individual file. But what happens when we re-open those files using open_mfsdataset? Does it automatically fill in the missing nparts with NaN?

An alternative could be to "reindex" the datasets with the full npart range before calling to_netcdf.

geosciz commented 7 years ago

Hi @rabernat, when I look at the npart on day 90, it seems that the length of npart is still 37136797. I use np.isnan(npart) to check the array, returning no NaN in it.

rabernat commented 7 years ago

Ok, I guess this is ready to go! Great work @geosciz!