nasaharvest / dora

Domain-agnostic Outlier Ranking Algorithms (DORA) - SMD cross-divisional use case demonstration of AI/ML
MIT License

Add unit tests #14

Open hannah-rae opened 3 years ago

hannah-rae commented 3 years ago

We can add unit tests for each use case and its corresponding input data type as they become ready.

wkiri commented 3 years ago

I have two sets of reference results for tests now.

Currently the functional test takes about 20 seconds to run on analysis. To get there, I was able to shorten the LRX runtime by adjusting its input parameters. Currently the negative sampling algorithm is taking most of the time. I think the only way to shorten this would be to expose some of the internal parameters like the number of cross-validation folds or the RF parameters that the grid search operates on. I welcome thoughts on this.
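One possible shape for exposing those knobs is a config-merging helper. This is only a sketch; the option names and defaults below (`n_cv_folds`, `rf_n_estimators`, `rf_max_depths`) are assumptions for illustration, not the actual DORA config schema:

```python
# Illustrative defaults chosen for test speed, not the pipeline's real values.
NEG_SAMPLING_DEFAULTS = {
    "n_cv_folds": 3,            # fewer folds -> faster cross-validation
    "rf_n_estimators": [50],    # smaller grid -> fewer RF fits
    "rf_max_depths": [5, None],
}

def resolve_neg_sampling_params(user_config=None):
    """Merge user-supplied config values over the fast defaults.

    Rejecting unknown keys makes typos in the config file fail loudly
    instead of being silently ignored.
    """
    user_config = user_config or {}
    unknown = set(user_config) - set(NEG_SAMPLING_DEFAULTS)
    if unknown:
        raise KeyError(f"Unknown negative sampling option(s): {sorted(unknown)}")
    return {**NEG_SAMPLING_DEFAULTS, **user_config}
```

With this shape, the functional test could pass a small grid and few folds while production runs keep the defaults.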

To run the tests (from top-level DORA repo directory):

$ pytest

You will see some warnings that are coming from code inside the pvl and planetaryimage libraries. We can probably safely ignore these, as I do not think we are using collections or fromstring().
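If the warnings get noisy, one option would be to filter them at the pytest level. A sketch of a `pytest.ini` at the repo root, assuming the warnings are `DeprecationWarning`s raised from the `pvl` and `planetaryimage` modules (the module patterns here are guesses and should be checked against the actual warning output):

```ini
[pytest]
filterwarnings =
    ignore::DeprecationWarning:pvl.*
    ignore::DeprecationWarning:planetaryimage.*
```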

wkiri commented 3 years ago

@stevenlujpl We discussed that you might add these negative sampling parameters to the config file options (low priority).

bdubayah commented 3 years ago

I added pytest to GitHub Actions. Just an FYI: any commit to master will show an error if no tests are found in the repo.

hannah-rae commented 3 years ago

I went ahead and merged issue14-unittests into the main branch because it sounds like there won't be enough time in this effort to add tests for the DES case. I left the branch open in case @urebbapr has time to add them later.

@bdubayah we should be able to have pytest run as an action now, since there are tests for it to run?

bdubayah commented 3 years ago

I just did a run through of all the tests. It looks like the planetary test failed for the negative sampling case. I'm attaching the log file too; the failed case starts on line 782 of failed_log.txt.

bdubayah commented 3 years ago

On my machine this fails on the first planetary case (demud). I'm wondering if this might still be related to https://github.com/nasaharvest/dora/issues/44. In the most recent PR I didn't add anything to sort the images after they are loaded from the directory, so they are probably still being loaded in a different order across machines. If the order of the training data affects the results, then we might want to sort the images.
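Making the load order deterministic could be as simple as sorting the directory listing before loading. A minimal sketch, where the function name and extension list are illustrative (not the pipeline's actual loader):

```python
import os

def list_images_sorted(image_dir, extensions=(".img", ".png")):
    """Return image paths in a deterministic (lexicographic) order.

    os.listdir returns entries in a filesystem-dependent order, which is
    why the training order (and hence results) can differ across machines.
    """
    names = [n for n in os.listdir(image_dir)
             if n.lower().endswith(extensions)]
    return [os.path.join(image_dir, n) for n in sorted(names)]
```

The loader would then iterate over this list instead of the raw `os.listdir` output.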

wkiri commented 3 years ago

Hmm, I can't run the tests at all due to lack of tensorflow in our dora-venv at JPL:

_____________________ ERROR collecting test/test_earth_time_series.py ______________________
ImportError while importing test module '/home/wkiri/Research/DORA/git/test/test_earth_time_series.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/usr/lib64/python3.6/importlib/__init__.py:126: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
test/test_earth_time_series.py:14: in <module>
    from dora_exp_pipeline.dora_exp import start
dora_exp_pipeline/dora_exp.py:26: in <module>
    from dora_exp_pipeline.pae_outlier_detection import PAEOutlierDetection
dora_exp_pipeline/pae_outlier_detection.py:21: in <module>
    import tensorflow as tf
E   ModuleNotFoundError: No module named 'tensorflow'

That's weird because I can run the dora_exp.py script, with PAE experiments, just fine. I'll keep looking into it.

wkiri commented 3 years ago

I think the problem was that I was using my (user-installed) pytest, and it doesn't know about tensorflow. I think the solution is to install pytest in the DORA venv. @stevenlujpl could you install this package please?
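For reference, one way to install pytest so it resolves the venv's packages (including tensorflow) rather than a user install; the venv path is a placeholder, not the actual JPL path:

```shell
# Activate the shared venv, then install and invoke pytest through the
# venv's own interpreter so the right site-packages are used.
source /path/to/dora-venv/bin/activate
python -m pip install pytest
python -m pytest
```

Running `python -m pytest` (rather than a bare `pytest` on the PATH) guarantees the venv's interpreter and its installed packages are used.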

wkiri commented 3 years ago

@hannah-rae FYI, I'd like to troubleshoot this further, but cannot proceed until pytest is installed in the DORA venv (@stevenlujpl could you do this?), and also I'm no longer allowed to charge to the JPL DORA account so progress may be slow. :( However, I want to capture here that my first guess is that the neg sampling test is failing because I may not have updated the "correct output" after Steven resolved the random seed setting. That's where I would look first.
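If stale reference output is the cause, a quick way to check is a byte-level comparison between the current seeded run and the stored reference. A sketch (the helper names are illustrative, not part of the DORA test suite):

```python
import hashlib
import pathlib

def file_digest(path):
    """Stable SHA-256 digest of a results file, for reference comparisons."""
    return hashlib.sha256(pathlib.Path(path).read_bytes()).hexdigest()

def reference_matches(result_path, reference_path):
    """True when the current run reproduces the stored reference exactly."""
    return file_digest(result_path) == file_digest(reference_path)
```

If the digests differ after the seed fix, regenerating the stored "correct output" from a fresh seeded run would bring the test back in sync.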

hannah-rae commented 3 years ago

I was thinking that was probably the issue too. We should be able to look into this on the UMD side and are not blocked by the JPL pytest installation.

stevenlujpl commented 3 years ago

@wkiri Sorry about the delayed response. I've installed pytest-6.2.5 in the shared virtual environment on MLIA machines.