onnela-lab / forest

Forest is a library for analyzing smartphone-based high-throughput digital phenotyping data
https://forest.beiwe.org
BSD 3-Clause "New" or "Revised" License
28 stars 16 forks source link

Frequency argument in forest.jasmine.traj2stats.gps_stats_main not working as in documentation #100

Closed clementzach closed 2 years ago

clementzach commented 2 years ago

Commit hash (the first line of output from git show): b04435222c4109e3a06db997e4feeac5c542fe02

Describe the bug Entering strings, such as "both" or "hourly" to the frequency argument does not change the output of jasmine. This does not match what is in the documentation, where strings are passed to the function.

To reproduce

import sys
sys.path.insert(0, '/path/to/keyring')

import mano
import mano.sync as msync
import keyring_studies
kr = mano.keyring(None)
download_folder = "min_repro_gps"
study_id = "tMQjJjZgdNdrYThLhK3TKnVY"
usr = "gq36edfi"
data_streams = ["gps"]

zf = msync.download(kr, study_id, usr, data_streams, time_start = "2022-02-01", time_end = "2022-04-01")
zf.extractall(download_folder)

import forest.jasmine.traj2stats
output_dir = "gps_output"
tz_str = "America/New_York"
option = "hourly"
save_traj = True 

forest.jasmine.traj2stats.gps_stats_main(download_folder, output_dir, tz_str, option, save_traj)

Observed behavior No errors are returned, but only daily output is returned, whether option is set to "hourly" or "both"

Expected behavior I expect the function to write csv files that are not daily summary statistics

Additional context I am able to get jasmine to work by using:

forest.jasmine.traj2stats.gps_stats_main(
    download_folder, output_dir, tz_str, frequency = forest.jasmine.traj2stats.Frequency.BOTH, save_traj = False
)

I believe that this could be fixed by changing conditional statements in forest.jasmine.traj2stats.gps_stats_main such as:

    if frequency == Frequency.HOURLY:

to access the value as in:

    if frequency == Frequency.HOURLY.value:

However, I'm hesitant to do a pull request because I do not know what the plans were for introducing the Frequency class.

@hackdna if my solution is acceptable, let me know and I can implement it.

clementzach commented 2 years ago

(Also tagging @martakarass the current maintainer, and @GeorgeEfstathiadis, who seems to have made the change). If it seems like my strategy for fixing this is feasible, I can go ahead and implement it.

hackdna commented 2 years ago

You can simply do:

from forest.jasmine.traj2stats import gps_stats_main, Frequency
gps_stats_main(..., Frequency.HOURLY, ...)

The docs should be updated to reflect this.

Alternatively, gps_stats_main() can be changed to accept a string (and then convert it to an appropriate Frequency object for internal use) if that would provide a more user friendly API.

clementzach commented 2 years ago

@hackdna sorry I'm just replying to this now, but yes that does seem like a workable solution. I'm happy to update the documentation—should I do it now or when we merge develop with main?

And, I'm just curious: is there a reason why using Frequency class is preferable to taking a string argument?

hackdna commented 2 years ago

The top-level README.md on the develop branch already corresponds to the updated code. The main branch docs have the old version of the docs that correspond to the old version of the code. The wiki should be updated to match the code in the develop branch. Being able to have a separate version of the docs for each branch is another reason to move the docs from the GH wiki to Sphinx/Read the Docs.

The reason to use an enum is for maintainability and robustness. If the frequency option ever changes (e.g., from lower to upper case) there is only one place to update it. The error checking is also easier since every function that requires a frequency is protected from bugs caused by accidental misspellings.

Having said that it might make sense to update just gps_stats_main() to use a string for a better user experience. Also, it might be useful to rename it to just gps_stats().