openmainframeproject / software-discovery-tool

Software Discovery Tool
Apache License 2.0

Need to Re-Create all data files regularly! #147

Open Princee215 opened 1 year ago

Princee215 commented 1 year ago

While working on the IBM-Validated-Software List, I noticed that over time some packages in a file expire or get modified. That's the case for every file we have in our supported_distro.py. We might be serving the wrong packages to our users through our platform, and that's something we can't ignore and need to correct. Right now we don't update a file until it has expired or until it fails to be created successfully.

I've attached a screenshot for your reference. This is the file for IBM_Validated_OSS_List_Ubuntu_2004.json. As you can see, packages change over time.

[Screenshot: Changes]

Therefore, we might need to drop our data files and recreate them regularly. We could also automate this so that we don't have to recreate each and every file manually.

@pleia2 and @arshPratap Please take a look into this.

arshPratap commented 1 year ago

@Princee215 Yup, this sounds like a good addition, especially for the backend feature that is currently in development and also for the automation task.

pleia2 commented 1 year ago

As a policy, I try to update the production data on sdt.openmainframeproject.org on a monthly basis. In practice, this is a manual process and I don't actually do it that often.

The good news is that there shouldn't be any changes required to bin/package_build.py since we're already overwriting all the content when the script runs, rather than trying to append, or do any complicated replacements.

Now, to move forward, there are two components to this.

  1. Automate the submitting a regular (weekly? monthly?) PR of files that changed for everything we manage in our -data repository via package_build.py: https://github.com/openmainframeproject/software-discovery-tool-data

Regardless of what we decide for the next step, we should definitely do this one!
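For step 1, a minimal sketch of what that automation could look like, assuming a cron-driven script on a machine with push access to the -data repository (the script name, checkout path, branch name, and schedule are all placeholders, and a GitHub Actions workflow would work equally well):

```shell
#!/bin/sh
# refresh-data.sh -- hypothetical sketch: regenerate all data files and
# open a PR against the -data repository if anything changed.
set -e

cd /opt/software-discovery-tool-data   # placeholder path to the -data checkout
git checkout main
git pull --ff-only

# Regenerate every data file; package_build.py already overwrites all
# content when it runs, so no special replacement logic is needed.
python3 bin/package_build.py

# Only open a PR when something actually changed.
if ! git diff --quiet; then
    branch="auto-refresh-$(date +%Y%m%d)"
    git checkout -b "$branch"
    git commit -am "Automated data refresh $(date +%Y-%m-%d)"
    git push origin "$branch"
    gh pr create --fill
fi

# Example crontab entry for a monthly run (1st of the month, 03:00):
#   0 3 1 * * /usr/local/bin/refresh-data.sh
```

This is a deploy/cron sketch, not a tested implementation; the `gh pr create` step assumes the GitHub CLI is installed and authenticated on the runner.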

  2. Automate updates to production.

We have two options here. Today, I get the raw data for our production instance from our -data repository, I don't run package_build.py on the production server itself. So we could:

2a. Add instructions to our -deploy repository to automate this by pulling from the -data repository, maybe suggest a cron job that calls a script that grabs everything from -data that's in our production supported_distro.py?

OR...

2b. We can also rethink this entirely and maybe we DO run bin/package_build.py in production. In this case we'd add documentation similar to the above, maybe with a cron job that runs periodically and emails an admin (me) about any failures.
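For 2a, the pull-from-`-data` approach could be as small as the following sketch. All paths are hypothetical, and a fuller version would copy only the files referenced by the production supported_distro.py rather than everything; for 2b-style failure notifications, a `MAILTO=` line in the crontab would email the admin any output from a failed run.

```shell
#!/bin/sh
# sync-data.sh -- hypothetical helper: refresh the production data
# directory from a local checkout of the -data repository.
sync_data_files() {
    data_dir="$1"   # local clone of software-discovery-tool-data
    prod_dir="$2"   # directory the production instance reads from

    # Update the checkout first, if it is a git repository.
    if [ -d "$data_dir/.git" ]; then
        git -C "$data_dir" pull --ff-only
    fi

    # Copy every JSON data file into production. A fuller version would
    # restrict this to the files listed in supported_distro.py.
    mkdir -p "$prod_dir"
    cp "$data_dir"/*.json "$prod_dir"/
}

# Run directly with explicit paths, e.g. from a cron entry such as:
#   MAILTO=admin@example.org
#   0 4 1 * * /usr/local/bin/sync-data.sh /opt/sdt-data /opt/sdt/data
if [ "$#" -eq 2 ]; then
    sync_data_files "$1" "$2"
fi
```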

Once we decide what to do, we can split this off into two issues in the -data and -deploy repositories.

Rohitrky2021 commented 3 months ago

Is this still open? @pleia2 @rachejazz, can you assign it to me?

pleia2 commented 3 months ago

> is it still open ? , @pleia2 @rachejazz can u assign it to me?

@Rohitrky2021 This one is a bit complicated because it has multiple parts and there isn't a clear answer as to which way we want to go. So before a PR is created, we need to have a discussion in this issue about the best way forward. Did you have some thoughts?