internetee / registry

TLD Management Software

Implement validator for company existence #2601

Open OlegPhenomenon opened 12 months ago

OlegPhenomenon commented 12 months ago

bundle exec rake company_status:check_all -- --open_data_file_path=lib/tasks/data/ettevotja_rekvisiidid__lihtandmed.csv --missing_companies_output_path=lib/tasks/data/missing_companies_in_business_registry.csv --deleted_companies_output_path=lib/tasks/data/deleted_companies_from_business_registry.csv --download_path=https://avaandmed.ariregister.rik.ee/sites/default/files/avaandmed/ettevotja_rekvisiidid__lihtandmed.csv.zip

This rake task performs the following actions:

Therefore, the attributes look like this:

All of these parameters have default values, so none of them has to be passed; they exist only for greater flexibility. The task can therefore be run as: bundle exec rake company_status:check_all

The output files will then be available in the directory lib/tasks/data
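As an illustration of how optional flags with defaults can be handled, here is a minimal sketch using Ruby's standard OptionParser. It is not the task's actual implementation: the constant DEFAULT_OPTIONS and the method parse_company_status_options are hypothetical names, and the default values are assumed to match the paths shown in the full command above.

```ruby
require "optparse"

# Hypothetical defaults, assumed to match the paths in the full command above.
DEFAULT_OPTIONS = {
  open_data_file_path: "lib/tasks/data/ettevotja_rekvisiidid__lihtandmed.csv",
  missing_companies_output_path: "lib/tasks/data/missing_companies_in_business_registry.csv",
  deleted_companies_output_path: "lib/tasks/data/deleted_companies_from_business_registry.csv",
  download_path: "https://avaandmed.ariregister.rik.ee/sites/default/files/avaandmed/ettevotja_rekvisiidid__lihtandmed.csv.zip"
}.freeze

# Returns the defaults overridden by any --key=value flags found in argv.
# argv is expected to hold only the flags that follow the `--` separator.
def parse_company_status_options(argv)
  options = DEFAULT_OPTIONS.dup
  parser = OptionParser.new do |opts|
    DEFAULT_OPTIONS.each_key do |key|
      opts.on("--#{key}=VALUE", String) { |value| options[key] = value }
    end
  end
  parser.parse(argv) # non-destructive: argv itself is left untouched
  options
end

# No flags: every option keeps its default.
parse_company_status_options([])

# One flag: only that option is overridden.
parse_company_status_options(["--open_data_file_path=tmp/custom.csv"])
```

Keeping the defaults in one hash means the short form of the command and the fully parameterized form go through the same code path.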

The job: CompanyRegisterStatusJob.perform_later(days_interval = 14, spam_time_delay = 0.2, batch_size = 100, download_open_data_file_url='https://avaandmed.ariregister.rik.ee/sites/default/files/avaandmed/ettevotja_rekvisiidid__lihtandmed.csv.zip')

This job accepts the following parameters:

  1. days_interval - selects domains that were last checked more than {days_interval} days ago.
  2. spam_time_delay - the pause between consecutive queries to the business registry, so that the registry is not flooded with requests.
  3. batch_size - the number of records processed per batch. Batching is a performance optimization.
  4. download_open_data_file_url - the URL from which to download the business registry data.

As indicated above, all of these parameters have default values, so they only need to be passed when a different value is required.
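To show how the first three parameters might interact, here is a minimal plain-Ruby sketch without Rails. It is an assumption about the general shape of such a job, not the code in this PR: the Company struct and the check_companies method are hypothetical, keyword arguments are used for readability, and the registry lookup is left to the caller's block.

```ruby
require "date"

# Hypothetical record type standing in for a row that needs checking.
Company = Struct.new(:reg_no, :checked_at)

# Selects records last checked more than days_interval days ago (or never
# checked), then processes them batch by batch, pausing after each lookup.
# Returns the number of records that were due.
def check_companies(companies, today:, days_interval: 14, spam_time_delay: 0.2, batch_size: 100)
  cutoff = today - days_interval
  due = companies.select { |c| c.checked_at.nil? || c.checked_at < cutoff }

  due.each_slice(batch_size) do |batch|
    batch.each do |company|
      yield company              # the caller performs the registry lookup here
      company.checked_at = today # record the check so the record is skipped next time
      sleep(spam_time_delay)     # throttle requests to the registry
    end
  end

  due.size
end

# Usage: only the stale and the never-checked records are visited.
companies = [
  Company.new("10000001", Date.new(2024, 1, 1)),
  Company.new("10000002", Date.new(2024, 1, 25)),
  Company.new("10000003", nil)
]
check_companies(companies, today: Date.new(2024, 1, 31), spam_time_delay: 0) do |company|
  puts "checking #{company.reg_no}"
end
```

Because each processed record receives the same check date, records handled in one run become due again together, which is the situation described under POTENTIAL PROBLEM.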

What the job does:

POTENTIAL PROBLEM: If a large set of companies is checked on a single day and the next check is scheduled, say, a year later, all of those companies become due at the same time, so the job may have to process the whole list in one run exactly one year later. This should be kept in mind.

This PR is related to https://github.com/internetee/company_register/pull/6

Related tickets: https://github.com/internetee/company_register/issues/4 https://github.com/internetee/company_register/issues/5

viezly[bot] commented 12 months ago

This pull request is split into 5 parts for easier review. 👀 Review pull request on Viezly

Changed files are located in these folders: