kaizen-ai / kaizenflow

KaizenFlow is a framework for Bayesian reasoning and AI/ML stream computing
GNU General Public License v3.0
112 stars 76 forks source link

Add smoke test for lib_tasks_run_model_experiment_notebooks #768

Closed sonaalKant closed 6 months ago

sonaalKant commented 7 months ago

Ref : https://github.com/cryptokaizen/cmamp/issues/7782

The goal is to mock running the actual notebooks and publishing the results this test should just confirm that the interfaces match, e.g. prepare some example config file and make sure the code can run successfully up until the mocked part.

We want to add unit test for invoke run_notebooks which is : https://github.com/sorrentum/sorrentum/blob/f57939221815b3c5ad183f885996181c8394ec93/dev_scripts/lib_tasks_run_model_experiment_notebooks.py#L174

FYI @jsmerix

Jd8111997 commented 6 months ago

@sonaalKant the link in the reference is broken. Could you please fix that ?

gpsaggese commented 6 months ago

@sonaalKant the link is on cmamp. Are we onboarding @Jd8111997 on the other side? If not, we need to copy/paste the issue to Sorrentum

sonaalKant commented 6 months ago

@Jd8111997 Reference link is for us to track the progress from cmamp. You can ignore that.

gpsaggese commented 6 months ago

The specs are clear but I'm ofc biased since I have background.

@Jd8111997 feel free to post a plan or do a draft PR (e.g., only a sketch/skeleton, we won't expect any convention to be followed, you just have to mark it as draft and ask for a architectural review). The worst case scenario is misinterpreting the specs and come back with something correct but different than was it requested.

Jd8111997 commented 6 months ago

@sonaalKant, do we have any sample config files? It would be a great help to me in better understanding the run_notebook function.

gpsaggese commented 6 months ago

@samarth9008 this is going to be difficult for @Jd8111997 unless he has access to cmamp to test with real data. WDYT? An alternative is to switch to one of the issues filed by @DanilYachmenev

samarth9008 commented 6 months ago

I had the same thought but was not sure if Sonaal has a different perspective.

@sonaalKant if you think it can't done on sorrentum feel free to assign issues filed by Dan

sonaalKant commented 6 months ago

Some context : run_notebooks takes log dir of experiment run and from that log dir it fetches config files to extract relevant information to build and run notebooks. sometimes we make changes in config interface which messes up with the interface here and not having unit test makes it difficult for us to debug and resolve.

So Your task here is to verify the interface is correct by writing unit test. You don't need the config file to write unit test just go through the code look what all keys we expect from config and write your own config file. For eg:

# Extract bar duration in seconds from a loaded system config.
bar_duration_in_secs = system_config["dag_runner_config"][
    "bar_duration_in_secs"
]

Now from this code snippet you know

system_config = {"dag_runner_config" : { "bar_duration_in_secs" : 10 }
                             }

So, you can build the config like this and easily write the unit tests. We don't want to publish or run notebook so just mock them to return None. Hope that makes sense.

sonaalKant commented 6 months ago

@samarth9008 this is going to be difficult for @Jd8111997 unless he has access to cmamp to test with real data.

I dont think real data is necessary in this test. We just want to have unit test to verify the correctness of interface, system config and paths can easily be inferred from code.

Jd8111997 commented 6 months ago

Thanks for clarification @sonaalKant .

Jd8111997 commented 6 months ago

@sonaalKant @samarth9008, I am trying to create a new branch for this issue by following instructions mentioned in https://github.com/kaizen-ai/kaizenflow/blob/f57939221815b3c5ad183f885996181c8394ec93/docs/work_tools/all.git.how_to_guide.md.

First, I am trying to get the branch name for this issue by running this command : i gh_issue_title -i 768.

However, I am encountering a new error that I have never seen before:

(amp.client_venv) (base) jd81197@jd81197-Latitude-E7250:~/src/sorrentum1$ i gh_issue_title -i 768
One and only one set-up config should be true:
is_cmamp_prod=False
is_dev4=False
is_dev_ck=False
is_ig_prod=False
is_inside_ci=False
is_mac=False
INFO: > cmd='/home/jd81197/src/venv/amp.client_venv/bin/invoke gh_issue_title -i 768'
## gh_issue_title: issue_id='768', repo_short_name='current'
## gh_login: 
12:18:00 - INFO  lib_tasks_gh.py gh_login:51                            account='sorrentum'
12:18:00 - WARN  lib_tasks_gh.py gh_login:59                            Can't find file '/home/jd81197/.ssh/id_rsa.sorrentum.github'
12:18:00 - WARN  lib_tasks_gh.py gh_login:70                            Can't find file '/home/jd81197/.ssh/github_pat.sorrentum.txt'
Traceback (most recent call last):
  File "/home/jd81197/src/venv/amp.client_venv/bin/invoke", line 8, in <module>
    sys.exit(program.run())
  File "/home/jd81197/src/venv/amp.client_venv/lib/python3.10/site-packages/invoke/program.py", line 398, in run
    self.execute()
  File "/home/jd81197/src/venv/amp.client_venv/lib/python3.10/site-packages/invoke/program.py", line 583, in execute
    executor.execute(*self.tasks)
  File "/home/jd81197/src/venv/amp.client_venv/lib/python3.10/site-packages/invoke/executor.py", line 140, in execute
    result = call.task(*args, **call.kwargs)
  File "/home/jd81197/src/venv/amp.client_venv/lib/python3.10/site-packages/invoke/tasks.py", line 138, in __call__
    result = self.body(*args, **kwargs)
  File "/home/jd81197/src/sorrentum1/helpers/lib_tasks_gh.py", line 374, in gh_issue_title
    title, url = _get_gh_issue_title(issue_id, repo_short_name)
  File "/home/jd81197/src/sorrentum1/helpers/lib_tasks_gh.py", line 318, in _get_gh_issue_title
    repo_full_name_with_host, repo_short_name = _get_repo_full_name_from_cmd(
  File "/home/jd81197/src/sorrentum1/helpers/lib_tasks_gh.py", line 290, in _get_repo_full_name_from_cmd
    ret_repo_short_name = hgit.get_repo_name(
  File "/home/jd81197/src/sorrentum1/helpers/hgit.py", line 631, in get_repo_name
    hdbg.dassert_in(
  File "/home/jd81197/src/sorrentum1/helpers/hdbg.py", line 295, in dassert_in
    _dfatal(txt, msg, *args, only_warning=only_warning)
  File "/home/jd81197/src/sorrentum1/helpers/hdbg.py", line 142, in _dfatal
    dfatal(dfatal_txt)
  File "/home/jd81197/src/sorrentum1/helpers/hdbg.py", line 71, in dfatal
    raise assertion_type(ret)
AssertionError: 
################################################################################
* Failed assertion *
'sorrentum/sorrentum' in '{'alphamatic/amp': 'amp', 'kaizen-ai/dev_tools': 'dev_tools', 'kaizen-ai/kaizenflow': 'kaizen'}'
Invalid name='sorrentum/sorrentum' for in_mode='full_name'
################################################################################

Let me know if you guys know any solution to resolve this.

samarth9008 commented 6 months ago

Try cloning in the new folder with new name as repo name is changed.

Jd8111997 commented 6 months ago

Try cloning in the new folder with new name as repo name is changed.

Thanks, it worked.