microbiomedata / nmdc-aggregator

Scripts that periodically aggregate data related to KEGG search

`berkeley`: update generate_metap_agg.py #10

Closed aclum closed 1 month ago

aclum commented 4 months ago

cc @SamuelPurvine @eecavanna

mslarae13 commented 2 months ago

@picowatt when do you think you could start this? Estimated start and end date?

mslarae13 commented 2 months ago

Do the actual results change with berkeley roll out?

Will the generate_metap_agg script break with berkeley?

Or is the ask here related to the de-bloat and the post-Berkeley rollout?

@aclum @mbthornton-lbl @SamuelPurvine @picowatt

picowatt commented 2 months ago

@mslarae13 I can return to this on Thursday. I have changes to the scripts that are not yet committed, but should be testable then.

The results shouldn't change.

aclum commented 2 months ago

This is not related to the de-bloat. We need to change the script so that it populates the aggregation table correctly: workflow results now all share one collection, so the collection name is different and you have to filter on type to get just the records of interest.
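A minimal sketch of the type filter described above, not the actual change to `generate_metap_agg.py`. The collection name `workflow_execution_set` and the type value `nmdc:MetaproteomicsAnalysis` are assumptions (the thread does not state them), and `select_by_type` is a hypothetical helper. It runs on in-memory dicts so it can be tried without a database.

```python
from typing import Iterable, Iterator

# Assumed names for illustration only; the thread does not give them.
SHARED_COLLECTION = "workflow_execution_set"
METAP_TYPE = "nmdc:MetaproteomicsAnalysis"


def select_by_type(records: Iterable[dict], wanted_type: str = METAP_TYPE) -> Iterator[dict]:
    """Yield only the records whose `type` field equals `wanted_type`."""
    for record in records:
        # Records without a `type` field are skipped rather than raising.
        if record.get("type") == wanted_type:
            yield record


# With pymongo, the same filter would be expressed as a query document, e.g.
#   db[SHARED_COLLECTION].find({"type": METAP_TYPE})

# Made-up sample records standing in for the shared collection.
sample = [
    {"id": "wf-1", "type": "nmdc:MetaproteomicsAnalysis"},
    {"id": "wf-2", "type": "nmdc:MetagenomeAnnotation"},
    {"id": "wf-3", "type": "nmdc:MetaproteomicsAnalysis"},
]
metap_ids = [r["id"] for r in select_by_type(sample)]
```

Before the change, a dedicated collection would have needed no such filter; with one shared collection, leaving the filter out would presumably mix other workflow types into the aggregation.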

eecavanna commented 2 months ago

@picowatt got in touch with me about deploying this to the Berkeley environment on Spin for testing. So far, the Berkeley environment has not had an aggregator running in it. So, deploying this to the Berkeley environment will include getting (any version of) an aggregator running in the Berkeley environment—and then getting this particular version running (so it can be tested).

I am currently working on getting any version running.

eecavanna commented 2 months ago

I deployed this branch to the Berkeley environment on Spin. Details are in this Slack message.

ssarrafan commented 1 month ago

Looks like the PR hasn't been merged. Moving this to one more sprint. @aclum @picowatt

eecavanna commented 1 month ago

This may be the issue (or one of the issues) where the absence of certain data in the Berkeley database is blocking @picowatt from being able to test something. I'll be updating the Berkeley database today with a Berkeley-migrated version of the latest production dump. I'll message @picowatt on Slack when done (it may be after hours).

aclum commented 1 month ago

This can’t be merged until the Berkeley rollout mid-Oct

In reply to eecavanna's comment above, sent Fri, Sep 20, 2024 at 4:24 PM.

ssarrafan commented 1 month ago

> This can’t be merged until the Berkeley rollout mid-Oct

Ok I'm moving this to the next sprint