marco-c / autowebcompat

Automatically detect web compatibility issues
Mozilla Public License 2.0
35 stars 41 forks source link

Migrate files according to new convention #252

Closed sagarvijaygupta closed 6 years ago

sagarvijaygupta commented 6 years ago

Migrate data image files, text files, labelled files in label_persons according to the new conventions incorporated in #88

Fixes #251

codecov-io commented 6 years ago

Codecov Report

Merging #252 into master will decrease coverage by 0.64%. The diff coverage is 0%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #252      +/-   ##
==========================================
- Coverage   16.48%   15.83%   -0.65%     
==========================================
  Files          12       13       +1     
  Lines        1820     1894      +74     
  Branches      311      328      +17     
==========================================
  Hits          300      300              
- Misses       1518     1592      +74     
  Partials        2        2
Impacted Files Coverage Δ
migrate_files.py 0% <0%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update f5efc2d...4c30e16. Read the comment docs.

marco-c commented 6 years ago

Let's put the code in a function migrate_v1_to_v2, as we might have other migrations in the future. We should put a VERSION file in the data/ repository, so we know which migration should be applied.

sagarvijaygupta commented 6 years ago

What should we write in VERSION file?

marco-c commented 6 years ago

What should we write in VERSION file?

Right now, just 1. Then the migrate script should check the VERSION file, if it's 1 it should call migrate_v1_to_v2 (which changes it to 2).

marco-c commented 6 years ago

(The VERSION file should be added to the autowebcompat-data repo on GitLab)

sagarvijaygupta commented 6 years ago

Then the migrate script should check the VERSION file, if it's 1 it should call migrate_v1_to_v2 (which changes it to 2)

You mean that after checking VERSION I should call migrate_v1_to_v2 by default? We expect future migrations of the form migrate_v2_to_v3, migrate_3_to_v4 or migrate_v1_to_v3, migrate_v1_to_v4?

sagarvijaygupta commented 6 years ago

@marco-c should I make the to_version as argparse input?

marco-c commented 6 years ago

You mean that after checking VERSION I should call migrate_v1_to_v2 by default? We expect future migrations of the form migrate_v2_to_v3, migrate_3_to_v4 or migrate_v1_to_v3, migrate_v1_to_v4?

migrate_v1_to_v2 shouldn't always be called, but only if VERSION is == 1. Future migrations will be migrate_v2_to_v3, migrate_v3_to_v4 and so on, this way we can support all possible migrations.

marco-c commented 6 years ago

@marco-c should I make the to_version as argparse input?

The to_version should always be the latest (in this case 2).

sagarvijaygupta commented 6 years ago

Okay. So should I recursively call all of them and update the VERSION file in the end or we will just call it once and if wanted change VERSION file by ourselves and again call?

marco-c commented 6 years ago

Okay. So should I recursively call all of them and update the VERSION file in the end or we will just call it once and if wanted change VERSION file by ourselves and again call?

Not recursively, but one after another. Each function should also update the VERSION file.

In pseudocode, something like this:

for i in range(CURRENT_VERSION, LATEST_VERSION - 1):
    migrate_v(i)_to_v(i+1)
propr[bot] commented 6 years ago

Please provide your feedback on this pull request here.

Privacy statement: We don't store any personal information such as your email address or name. We ask for GitHub authentication as an anonymous identifier to account for duplicate feedback entries and to see people specific preferences.

marco-c commented 6 years ago

I'm adding the VERSION file to the data repository.

sagarvijaygupta commented 6 years ago

@marco-c Can you see the VERSION file now? I added it previously also. I realized I had to commit in the submodule also before adding.

marco-c commented 6 years ago

You can't update the submodule without actually updating it on GitLab, otherwise it's not available to anyone else except you (infact, Travis fails to clone the submodule now, as the hash you provide is local to your machine).

Don't worry about adding the VERSION file, as I'm doing it now.

For the future, when you want to change something in data or tools, you actually have to send a pull request on GitLab, so that it can be merged and can be available to everybody.

sagarvijaygupta commented 6 years ago

Okay! I will try it next time. I actually wanted to update some new full_page_screenshots previously but was not able to at that time!

sagarvijaygupta commented 6 years ago

Should we keep a file along with VERSION file which corresponds to the screenshots present in that version. Since we won't be able to push images if generated from present crawler as after that migrate function will try to migrate them also. (or simply the time stamp?)

marco-c commented 6 years ago

Should we keep a file along with VERSION file which corresponds to the screenshots present in that version. Since we won't be able to push images if generated from present crawler as after that migrate function will try to migrate them also. (or simply the time stamp?)

I guess it's better if we just do the migration before adding new images.

propr[bot] commented 6 years ago

Please provide your feedback on this pull request here.

Privacy statement: We don't store any personal information such as your email address or name. We ask for GitHub authentication as an anonymous identifier to account for duplicate feedback entries and to see people specific preferences.