CLI controls for oboe-main.jl

karaign commented 3 years ago

Implements #12

FIPS codes are separated by spaces.
Note that the output filename format is slightly different from what is described in the issue: the aggregation level is separated from the node number by an underscore. This behaviour is coded in Oboe.writeMe and I didn't want to risk breaking something by changing it, so please let me know if that function is OK to change.

karaign commented 3 years ago

Thanks for the detailed review, I'll start working on these issues!

karaign commented 3 years ago

Relocate the @add_arg_table definition and compliance tests for arguments (lines 58–76) to a separate file

Should this new file become the new entry point for the CLI? e.g. you would type julia oboe-cli.jl --fips NW --agg tra --name NW where oboe-cli.jl is the new file. Or should oboe-main remain the entry point, with the new file just containing the validation code?

The first option seems better to me as it would neatly separate the UI code from the processing logic, but I wanted to doublecheck this.

yvs314 commented 3 years ago

Good point, go with the first option: oboe-cli.jl as CLI entry point, with oboe-main.jl to house the main processing logic.

karaign commented 3 years ago

(Still working on major issues 2, 4 and most of the minor issues)

I've noticed something weird when comparing my output to the existing files in by-tract: the output files NWste_2-trav.dat and NWcty_75-trav.dat are different from the old files with similar names, but all the values are only off by a tiny fraction on the order of 10^-17 or even smaller, like here:

I am not sure why this could happen or if it matters.

yvs314 commented 3 years ago

(Still working on major issues 2, 4 and most of the minor issues)

Good, thx for the update.

As for the values that are about 1e-17 different, I expect it's just the difference in floating-point operations on different processors, or whatever else may influence them. Incidentally, this ~1e-17 difference is about the machine precision of 64-bit IEEE 754 floating point number.

Model-wise, these differences can be ignored. These matrices are nominated in persons-per-day, and considered an a time scale of about a year.

The “right” way would be to round them to e.g. the nearest 1e-6, which can be done via a keyword arg in the psg function, with the default value 1e-6.

It is OK to just ignore this matter for now.

karaign commented 3 years ago

@yvs314 Please take a look. I think I've hit all the major points, though there are probably some room for improvement. In particular I've had a hard time deciding what file certain aspects of the program (such as checking if a file already exists) should go to since the separation of concerns is currently pretty loose. I think ideally the I/O code should be decoupled from the math, which is something that also ties into #13 since Oboe.jl currently has both mixed together.

yvs314 commented 3 years ago

@karaesque Thanks, I will. Expect a complete review in a day or two.

I agree with decoupling the I/O from the actual processing as a guiding principle. In fact, I'd decouple writing from processOboe and make it work the latter's output after the processing is done, all in oboe-cli.jl. processOboe should return the commute and air travel matrices (might later adopt a sparse form, e.g. list o' pairs or adjacency list), and the nodes/initial values information.

karaign commented 3 years ago

A.1 What is the rationale for adding using Base: func_for_method_checked? I can't see any references to it.

This is just an accident, I think what happened is that I accidentally made my code editor add this import statement when I was trying to type function and never noticed it.

A.3, rdTidyWfsByFIPS, l.550, why wfs3[1]? Doesn't it change the type from e.g. DataFrame to DataFrameRow?

Calling reduce on a collection returns the type that the collection stores, rather the type of the collection itself, so the way it was previously written actually resulted in two different possibilities for the type of wfs4, which caused a crash in certain cases as code that comes after presumes that wfs4 is a DataFrame. On second thought, I'm not sure if keeping the ternary expression was even necessary since reduce on a single-element collection should logically just return the one element.

As to A.2, B.2 and B.3, got it. Should I create a new branch and PR for those fixes?

yvs314 commented 3 years ago

Great, thanks for explaining the matter. I agree that reduce ought to work with one element, but it should be re-checked.

As to A.2, B.2 and B.3, got it. Should I create a new branch and PR for those fixes?

These are too small for a separate pull request, just integrate them when working on the nearest issue.

yvs314 / epi-net-m

CLI controls for oboe-main.jl #15