Closed brockfanning closed 4 years ago
@brockfanning does anything need to be changed in data config file for this?
@LucyGwilliamAdmin Not that I know of.
Do I need rows here for each language? https://github.com/LucyGwilliamAdmin/open-sdg-data-starter-1/blob/develop/open_sdg_config_sdmx.yml#L33
I'm mainly asking because I'm trying to find out whether the English xlsx meta input files can have English in the first column (the field name column) and the Russian xlsx meta input files can have Russian there, as long as all of the values are in the metadata-mapping file. I don't seem to be having much luck. Do you know whether this should be possible? If not, no probs, just trying to get an idea.
I may be misunderstanding and/or misremembering how this works, but I believe that the non-default languages should be in subfolders. So for example if you are importing CSV metadata for a platform with Spanish as the default language, and English as a second language, you would put your Spanish metadata files in the folder specified with `path_pattern`, and then you would put an `en` subfolder in that folder. So if `path_pattern` is `meta-csv` then the Spanish files would go in `meta-csv` and the English files would go in `meta-csv/en`. Hopefully I'm not remembering that wrong.
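As a rough illustration of the folder layout described above (not sdg-build's actual implementation), the sketch below groups metadata files by language. The function name `group_meta_files_by_language` and its parameters are hypothetical, and it assumes the default-language files sit directly in the metadata folder with one subfolder per additional language.

```python
from pathlib import Path


def group_meta_files_by_language(meta_folder, default_language, other_languages):
    """Return {language: sorted file names} for a multilingual metadata folder.

    Hypothetical helper: assumes default-language files live directly in
    meta_folder and each other language has its own subfolder (e.g. "en").
    """
    root = Path(meta_folder)
    # Files directly in the folder belong to the default language.
    files = {default_language: sorted(p.name for p in root.iterdir() if p.is_file())}
    for lang in other_languages:
        subfolder = root / lang
        # A missing subfolder simply means no metadata for that language.
        if subfolder.is_dir():
            files[lang] = sorted(p.name for p in subfolder.iterdir() if p.is_file())
        else:
            files[lang] = []
    return files
```

With `meta-csv/1-1-1.csv` and `meta-csv/en/1-1-1.csv` on disk, calling `group_meta_files_by_language("meta-csv", "es", ["en"])` would list the same file name once under each language.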
Also here is the part of the new code related to the metadata mapping: https://github.com/open-sdg/sdg-build/pull/170/files#diff-34796376f53b09765921dffaa1834531R113-R136
@brockfanning thanks, I think that's the setup I've got. For the Russian (default language) meta Excel input I have one column containing human-readable field names in Russian and one column containing field values in Russian. In the English (2nd language) meta Excel input I have one column containing human-readable field names in English and one column containing field values in English. I then have a metadata mapping CSV file whose first column contains all of the human-readable field names (both Russian and English) and whose second column contains the machine-readable field names (each appearing twice), so it's like a 2 to 1 mapping I guess.
Should this work, or do all human readable names need to be in one language?
For example in meta I have 1-1-1.xlsx:
In meta/en I have 1-1-1.xlsx:
Then I have metadata-mapping.csv:
Yep I think that should work (at least that's the intention).
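To make the 2 to 1 mapping concrete, here is a minimal sketch, not the code from the PR. It assumes a two-column mapping file with no header row; the helper names `load_metadata_mapping` and `apply_mapping` are hypothetical, and the sample rows (including the Russian label) are illustrative only.

```python
import csv
import io


def load_metadata_mapping(csv_text):
    """Read rows of (human-readable name, machine-readable key) into a dict.

    Assumption: two columns, no header row.
    """
    reader = csv.reader(io.StringIO(csv_text))
    return {row[0].strip(): row[1].strip() for row in reader if len(row) >= 2}


def apply_mapping(human_meta, mapping):
    """Rename human-readable field names to machine keys; unknown names are kept."""
    return {mapping.get(name, name): value for name, value in human_meta.items()}


# Illustrative mapping: two human-readable names point at one machine key.
MAPPING_CSV = """Graph title,graph_title
Заголовок графика,graph_title
"""

mapping = load_metadata_mapping(MAPPING_CSV)
english = apply_mapping({"Graph title": "Poverty rate"}, mapping)
russian = apply_mapping({"Заголовок графика": "Уровень бедности"}, mapping)
```

Both `english` and `russian` end up keyed by `graph_title`, which is the behaviour the thread describes as the intention.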
@brockfanning is the human_key supposed to be the index when the mapping is read in?
This is an attempt to allow, for example, a CSV meta input from one folder, and a YAML meta input from another folder.
This side-steps the meta.py and git.py include files, and in theory those could be deleted.
This needs plenty of testing. In particular:
- Does the git stuff still work? (metadata "last update date" fields getting populated automatically)
- Does the multilingual subfolder approach still work for metadata?
- Does the metadata mapping work?
@brockfanning I've just looked at latest changes and:
Re. multilingual subfolder approach - I'm not sure what's happening. It seems to be fine for Excel (all terms are translating), but with YAML input the fields aren't getting translated, e.g. Graph title, Units of measurement, National geographical coverage
https://lucygwilliamadmin.github.io/open-sdg-site-starter-1/en/1-1-1/
Looking at the Data/Metadata last updated fields on the Indicator information tab, something seems to be up with them: the metadata date isn't updating, and the data date isn't showing at all
@LucyGwilliamAdmin I added a commit to hopefully help with that translation issue.
About the last update dates, let's see if that commit also helps there. It probably won't though. One thorny problem: the last-updated-date for the data comes from looking at the last Git commit for the data file. But how can the code know whether that data file should be a .csv file or a .xml file? It may be tricky to support that last-update-date when the data is not always in one type of file.
As a possible way to address the issue mentioned above, my latest commit looks for a metadata field called `data_filename`, which can have the filename of the data for that indicator. So for example if the data for 1.1.1 is in `1-1-1.csv`, the 1.1.1 metadata could include:
data_filename: 1-1-1.csv
And if the data for 1.2.1 is in `1-2-1.xml`, the 1.2.1 metadata could include:
data_filename: 1-2-1.xml
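The idea behind `data_filename` can be sketched roughly as follows. This is an assumption-laden illustration rather than the commit's code: the helper names `data_filename_for` and `last_commit_date` are hypothetical, and the fallback to a `.csv` file named after the indicator is a guess at a sensible default.

```python
import subprocess


def data_filename_for(indicator_id, meta, default_extension="csv"):
    """Prefer an explicit data_filename field, else assume <indicator_id>.<ext>."""
    return meta.get("data_filename") or f"{indicator_id}.{default_extension}"


def last_commit_date(path, repo_dir="."):
    """Return the date (YYYY-MM-DD) of the last Git commit touching path, or None."""
    result = subprocess.run(
        ["git", "log", "-1", "--format=%cd", "--date=short", "--", path],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=False,
    )
    # Empty output means Git found no commit for that path.
    return result.stdout.strip() or None
```

For example, `data_filename_for("1-2-1", {"data_filename": "1-2-1.xml"})` picks the XML file, and the resulting name could then be passed (with its folder) to `last_commit_date`.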
@brockfanning ok - just tried this:
https://lucygwilliamadmin.github.io/open-sdg-site-starter-1/en/1-1-1/
@LucyGwilliamAdmin My suspicion is that the Excel date is showing only because it's the last to run (see the order here).
I've added a check for `meta_filename` as well, which might help with this. However the order may still have an effect. I think the key will be to specify the `meta_filename` in the metadata file that you don't want to affect the last updated date.
For example, if you are using both `1-1-1.csv` and `1-1-1.xlsx` for metadata, and you don't want the `1-1-1.csv` to affect the last updated date, then you would need to add `meta_filename: 1-1-1.xlsx` in the CSV file. My intention is that this will ensure that the CSV file doesn't use itself for the last updated date (though I haven't tested it).
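The ordering effect mentioned above can be pictured with a small sketch. It assumes that each metadata input yields a dictionary and that later inputs overwrite earlier ones field by field; the function `merge_meta_inputs` and the field name `metadata_last_updated` are hypothetical and are not taken from sdg-build.

```python
def merge_meta_inputs(inputs):
    """Merge metadata dicts in order; a later input overwrites an earlier one."""
    merged = {}
    for meta in inputs:
        merged.update(meta)
    return merged


# Hypothetical per-input results for indicator 1-1-1.
csv_meta = {"graph_title": "Poverty rate", "metadata_last_updated": "2020-09-01"}
xlsx_meta = {"metadata_last_updated": "2020-09-20"}

# The Excel input runs last, so its date is the one that survives.
merged = merge_meta_inputs([csv_meta, xlsx_meta])
```

Under these assumptions, swapping the order of the two inputs would make the CSV date win instead, which is why relying on input order alone can be enough.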
> @LucyGwilliamAdmin My suspicion is that the Excel date is showing only because it's the last to run (see the order here).
> I've added a check for `meta_filename` as well, which might help with this. However the order may still have an effect. I think the key will be to specify the `meta_filename` in the metadata file that you don't want to affect the last updated date.
Ah that's not so bad then - I think in this case at least, I would always want the Excel metadata date to show.
> For example, if you are using both `1-1-1.csv` and `1-1-1.xlsx` for metadata, and you don't want the `1-1-1.csv` to affect the last updated date, then you would need to add `meta_filename: 1-1-1.xlsx` in the CSV file. My intention is that this will ensure that the CSV file doesn't use itself for the last updated date (though I haven't tested it).
I have just tested this new field though and I'm getting an error: https://github.com/LucyGwilliamAdmin/open-sdg-data-starter-1/runs/1154682637?check_suite_focus=true
@LucyGwilliamAdmin Ah, I see that my latest code can't handle if the other file is in a different folder.
We could try adding yet another field, `meta_filefolder`. It's starting to get a bit complex though - I wonder how you would feel about removing that `meta_filename` code and just relying on the input ordering for that last-update-date issue.
I think we should remove the field and rely on ordering. For now, the only need I foresee for files being in separate folders is to have the 'settings' in one and the actual metadata in another.
Sounds good - I've just reverted that last change.
@brockfanning ok - I'm happy with everything I've tested so far - do you think there's anything else that needs to be tested?
I think that covers it. One thing though: meta.py and git.py can - in theory - be deleted now. Do you think we should go ahead and delete them in this PR? Or follow up in a separate PR for that?
I think they could be deleted in this PR
@LucyGwilliamAdmin Ok, I've deleted those. For good measure do you think you could make sure that your data build still works?
Update: I tried it locally and the build completed without errors.