kbaseapps / GenomeFileUtil

MIT License

RE2022-272: Add a bulk version of genbank_to_genome in GFU #208

Closed: Xiangs18 closed this 1 month ago

codecov[bot] commented 5 months ago

Codecov Report

Attention: Patch coverage is 99.42197% with 2 lines in your changes missing coverage. Please review.

Project coverage is 80.61%. Comparing base (d48a690) to head (5097ef0).

| File | Patch % | Lines |
|------|---------|-------|
| lib/GenomeFileUtil/GenomeFileUtilImpl.py | 90.47% | 2 Missing :warning: |
Additional details and impacted files

```diff
@@            Coverage Diff             @@
##           master     #208      +/-   ##
==========================================
+ Coverage   79.25%   80.61%   +1.35%
==========================================
  Files          11       11
  Lines        2902     3007     +105
==========================================
+ Hits         2300     2424     +124
+ Misses        602      583      -19
```
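As a sanity check on the project figures, coverage is hits divided by tracked lines, and misses are lines minus hits. The sketch below assumes Codecov's default reporting settings (two decimal places, rounding down) and that the delta is taken from the unrounded ratios before rounding; under those assumptions it reproduces the reported 79.25%, 80.61%, and +1.35%. The helper names are illustrative, not part of any Codecov API.

```python
import math
from fractions import Fraction


def coverage_pct(hits: int, lines: int, precision: int = 2) -> Fraction:
    """Return hits/lines as a percentage, rounded down to `precision` decimals.

    Uses Fraction so floating-point error cannot affect the rounding step.
    """
    scale = 10 ** precision
    return Fraction(math.floor(Fraction(hits, lines) * 100 * scale), scale)


def delta_pct(base: tuple, head: tuple, precision: int = 2) -> Fraction:
    """Coverage change between two (hits, lines) pairs, computed exactly, then rounded down."""
    scale = 10 ** precision
    exact = (Fraction(*head) - Fraction(*base)) * 100
    return Fraction(math.floor(exact * scale), scale)


base, head = (2300, 2902), (2424, 3007)  # (hits, lines) from the diff above
print(float(coverage_pct(*base)))  # 79.25
print(float(coverage_pct(*head)))  # 80.61
print(float(delta_pct(base, head)))  # 1.35
print(head[1] - head[0], base[1] - base[0])  # misses: 583 602
```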


Xiangs18 commented 5 months ago

Moving forward, there should be at least 3 PRs:

  1. parse genbank files before any upload
  2. mass save_genomes
  3. parallelize genbanks_to_genomes, including batching logic
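The three steps can be sketched together as follows. This is a minimal illustration, not GFU code: `parse_then_save`, `parse`, and `save_all` are hypothetical names, with `parse` standing in for per-file GenBank parsing and `save_all` for a bulk `save_genomes`-style call. Every file is parsed (in parallel) before anything is saved, so one bad file fails the whole request without leaving partial uploads. A thread pool keeps the sketch simple; for CPU-bound parsing a `ProcessPoolExecutor` may be the better fit, at the cost of requiring picklable callables.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence


def parse_then_save(
    paths: Sequence[str],
    parse: Callable[[str], dict],
    save_all: Callable[[List[dict]], List[str]],
    max_workers: int = 4,
) -> List[str]:
    """Parse every input first; call the bulk save only if all parses succeed."""
    errors: Dict[str, Exception] = {}
    parsed: List[dict] = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Step 3: submit all parses up front so they run concurrently
        futures = [pool.submit(parse, p) for p in paths]
        for path, fut in zip(paths, futures):  # preserves input order
            try:
                parsed.append(fut.result())
            except Exception as exc:  # collect every failure, not just the first
                errors[path] = exc
    if errors:
        # Step 1: nothing has been uploaded yet, so failing here leaves no partial state
        details = "; ".join(f"{p}: {e}" for p, e in errors.items())
        raise ValueError(f"{len(errors)} file(s) failed to parse: {details}")
    # Step 2: one bulk save for all parsed genomes
    return save_all(parsed)


saved = parse_then_save(
    ["a.gbff", "b.gbff"],
    parse=lambda path: {"name": path},  # stand-in parser
    save_all=lambda genomes: [g["name"] for g in genomes],  # stand-in bulk save
)
print(saved)  # ['a.gbff', 'b.gbff']
```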
MrCreosote commented 5 months ago

> Moving forward, there should be at least 3 PRs

I think there's another one: batch ws (workspace) calls into chunks of < 10k each, unless that's part of the parallelization PR.
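A minimal batching sketch, assuming "< 10k each" means at most 9,999 items per workspace call; the exact cap, `batches`, `save_in_batches`, and the `save_objects` callable are hypothetical stand-ins, not the real Workspace client signature. Slicing a list into fixed-size chunks keeps each call under the limit while preserving order, so results can be concatenated in input order.

```python
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")

# "< 10k each" from the discussion: at most 9,999 items per call (assumed cap)
MAX_PER_CALL = 10_000 - 1


def batches(items: Sequence[T], max_size: int = MAX_PER_CALL) -> Iterator[List[T]]:
    """Yield consecutive slices of `items`, each holding at most `max_size` elements."""
    if max_size < 1:
        raise ValueError("max_size must be at least 1")
    for start in range(0, len(items), max_size):
        yield list(items[start:start + max_size])


def save_in_batches(
    objects: Sequence[T], save_objects: Callable[[List[T]], List[str]]
) -> List[str]:
    """Issue one (hypothetical) save call per batch and concatenate the results."""
    results: List[str] = []
    for batch in batches(objects):
        results.extend(save_objects(batch))
    return results


sizes: List[int] = []


def fake_save(batch: List[int]) -> List[str]:
    sizes.append(len(batch))  # record the size of each simulated ws call
    return [f"obj{x}" for x in batch]


infos = save_in_batches(list(range(25_000)), fake_save)
print(sizes, len(infos))  # [9999, 9999, 5002] 25000
```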

Xiangs18 commented 5 months ago

It is part of the parallelization PR. I updated the comment above.

MrCreosote commented 5 months ago

Note to self: all comments above this point are addressed.

Xiangs18 commented 4 months ago

Hi @MrCreosote, the remaining 3 missing annotations also depend heavily on domain knowledge.

I cannot find GenBank files that cover all 3.

However, through trial and error, I found that 2 of the 3 can be covered if we use the Cyanidioschyzon_merolae_one_locus.gbff file and specify generate_missing_genes.

Not quite sure what the best way is to test this in addition to the _check_result_object_info_fields_and_provenance function.

MrCreosote commented 4 months ago

What case can't you cover?

> Not quite sure what the best way is to test this in addition to the _check_result_object_info_fields_and_provenance function.

Not sure what you're asking here; those are orthogonal tests.

Xiangs18 commented 3 months ago

@MrCreosote Future PRs:

MrCreosote commented 3 months ago

Most of those sound OK, but

> - check the data for genome
> - check data/prov/info for the new assembly

test the core functionality added here, and so IMO need tests here.
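For the info part of those checks, a minimal sketch of what an assertion on the new assembly's object info could look like. It assumes the standard 11-element KBase Workspace `object_info` tuple and that the assembly is saved as a `KBaseGenomeAnnotations.Assembly` object; `check_assembly_info`, the expected version of 1, and the fake tuple are hypothetical illustrations, and the data and provenance checks would be separate assertions.

```python
from typing import Any, Sequence

# Field positions in a KBase Workspace object_info tuple
(OBJID, NAME, TYPE, SAVE_DATE, VERSION, SAVED_BY,
 WSID, WORKSPACE, CHSUM, SIZE, META) = range(11)


def check_assembly_info(info: Sequence[Any], name: str, wsid: int, version: int = 1) -> None:
    """Hypothetical helper: basic assertions on a newly saved assembly's info."""
    assert len(info) == 11, f"expected 11 info fields, got {len(info)}"
    assert info[NAME] == name, info[NAME]
    assert info[WSID] == wsid, info[WSID]
    # Type strings carry a version suffix, so only the prefix is checked
    assert info[TYPE].startswith("KBaseGenomeAnnotations.Assembly-"), info[TYPE]
    # Assumes no earlier object was saved under this name, so version starts at 1
    assert info[VERSION] == version, info[VERSION]


# Fake info tuple for illustration only (the type version suffix is illustrative);
# real values come from the workspace
fake_info = [7, "my_genome_assembly", "KBaseGenomeAnnotations.Assembly-6.3",
             "2024-03-18T00:00:00+0000", 1, "someuser", 42, "my_ws", "abc123", 1024, {}]
check_assembly_info(fake_info, "my_genome_assembly", 42)
```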

Xiangs18 commented 2 months ago

@MrCreosote Have we discussed export_genome_features_protein_to_fasta? It doesn't ring a bell. Is this function no longer in use?

MrCreosote commented 2 months ago

https://app.slack.com/client/T026VDM4X/C4E7KUGTD

Xiangs18 commented 2 months ago

https://github.com/kbaseapps/GenomeFileUtil/pull/208#issuecomment-2113617937

@MrCreosote This link only takes me to the kbase_coders channel.

MrCreosote commented 2 months ago

ok, try https://kbase.slack.com/archives/C4E7KUGTD/p1710806243115819