kbaseapps / GenomeFileUtil

MIT License
0 stars 16 forks source link

RE2022-272: Update GenomeFileUtil to run in batch & parallelization 1 #202

Closed Xiangs18 closed 6 months ago

Xiangs18 commented 7 months ago

first PR on GenomeFileUtil parallelization:

  1. Get catalog params in Impl file
  2. Made some comments below
codecov[bot] commented 7 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Comparison is base (a79a233) 79.14% compared to head (f24b26a) 79.25%.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## master #202 +/- ## ========================================== + Coverage 79.14% 79.25% +0.11% ========================================== Files 11 11 Lines 2886 2902 +16 ========================================== + Hits 2284 2300 +16 Misses 602 602 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

Xiangs18 commented 7 months ago

Some comments:

  1. It takes a while to run all tests in GHA. Maybe for dev we only need to run test.problematic_tests.save_genome_test.
  2. Unlike AssemblyUtil, there is no mass function for save_one_genome function. And I am still thinking should we make a mass function first and then parallelize it or parallelizing on save_one_genome directly
  3. I will ignore the failed tests for now since it's not my code that fails the tests
  4. Similar to AssemblyUtil, I made an issue for this repo regarding to GHA https://github.com/kbaseapps/GenomeFileUtil/issues/203
  5. I will squash all git commits into one for this PR
Xiangs18 commented 7 months ago

@Tianhao-Gu @MrCreosote Any thoughts on 1 and 2 from my comments?

note to myself: 1.

  1. create a bulk method and then parallelize on top it
MrCreosote commented 7 months ago

It takes a while to run all tests in GHA. Maybe for dev we only need to run test.problematic_tests.save_genome_test.

That's ok locally, but for PRs we should run the entire suite.

Unlike AssemblyUtil, there is no mass function for save_one_genome function. And I am still thinking should we make a mass function first and then parallelize it or parallelizing on save_one_genome directly

I'd make the mass function and then parallelize it, like we discussed earlier in slack

I will ignore the failed tests for now since it's not my code that fails the tests

If you start working on a codebase with failing tests, the first thing that needs to be done is fixing the tests. Sometimes there may be a reason not to do so but that reason needs to be explicitly stated and very good. As it stands you could break functionality and not know it as the test that's testing said functionality isn't working. It's annoying but we all have had to fix tests we didn't break.

Similar to AssemblyUtil, I made an issue for this repo regarding to GHA https://github.com/kbaseapps/GenomeFileUtil/issues/203

Ugh. We might want to spend cycles figuring this out and fixing it if it affects multiple apps that a lot of other apps depend on

Xiangs18 commented 7 months ago

Now two type checking functions is in one.

Xiangs18 commented 6 months ago

added test_invalid_catalog_param_type in save_one_genome.py (one of problematic_tests). save_genome_test.py failed in GHA because of missing test_token2. Need to fix in Assembly&GenomFileUtil GHA PR. Ran pass this test locally.

Screen Shot 2023-12-21 at 9 06 59 AM
Xiangs18 commented 6 months ago

I ran pass the new unit test here: https://github.com/kbaseapps/GenomeFileUtil/actions/runs/7414796618/job/20176629625#step:6:996

MrCreosote commented 6 months ago

It looks like some existing tests are failing due to the changes in this PR, for example: https://github.com/kbaseapps/GenomeFileUtil/actions/runs/7414864018/job/20176831946?pr=202#step:6:5752

Xiangs18 commented 6 months ago

ran pass impl_test.py: https://github.com/kbaseapps/GenomeFileUtil/actions/runs/7427037544/job/20211964841#step:6:996

Xiangs18 commented 6 months ago

https://github.com/kbaseapps/GenomeFileUtil/actions/runs/7454444016/job/20281771440#step:6:995

MrCreosote commented 6 months ago

https://github.com/kbaseapps/GenomeFileUtil/actions/runs/7454444016/job/20281771440#step:6:995

I'm not sure what I'm supposed to understand from this?

Xiangs18 commented 6 months ago

https://github.com/kbaseapps/GenomeFileUtil/actions/runs/7454444016/job/20281771440#step:6:995

I'm not sure what I'm supposed to understand from this?

It shows all 4 unit tests in impl_test.py are passing.

MrCreosote commented 6 months ago

It shows all 4 unit tests in impl_test.py are passing.

They weren't passing before?

Xiangs18 commented 6 months ago

It shows all 4 unit tests in impl_test.py are passing.

They weren't passing before?

There were 3 tests in impl_test.py. All of them passed. I updated impl_test.py based on your latest PR comments and would like to show all tests are passing after update.

MrCreosote commented 6 months ago

Note to self: All issues above this comment have been addressed

MrCreosote commented 6 months ago

@Tianhao-Gu you might want to take a look

Xiangs18 commented 6 months ago

I expect 551481e to fail. Why it is passing though ... Probably because I pushed f24b26a right after?