okfn / ibp-explorer

[ARCHIVED] Data Explorer for the Open Budget Survey, built in collaboration with the International Budget Partnership.
http://survey.internationalbudget.org

Google Drive uploading workflow #94

Closed by dumyan 7 years ago

dumyan commented 7 years ago

Since we are manually tailoring the documents from the API with the uploads on Google Drive, we need to know the workflow of the uploading in order to properly match the uploads.

As a side question, are we allowed to rename files in order to have a more structured file tree?

pwalsh commented 7 years ago

@dumyan about the renaming: yes. Currently, they are renaming files so that country names match exactly. What did you have in mind? Let me know here; perhaps we can get it all done.

pwalsh commented 7 years ago

@dumyan now for the first question:

"Since we are manually tailoring the documents from the API with the uploads on Google Drive, we need to know the workflow of the uploading in order to properly match the uploads."

The workflow is that the IBP team add documents. They will follow a naming convention. Does this solve the issue?

Presuming yes, I suggest you present a naming convention that is best for us, so we can ensure it gets done and followed. Currently, they are focussed on mapping country names accurately, but I guess we have some other requirements - years, names of documents, etc. Please advise.

pwalsh commented 7 years ago

@dumyan you also wrote in #91:

"Linking GDrive filenames with the API data is not bulletproof. There were some discrepancies such as 'In-Year Report' vs 'In-Year Reports' (plural vs singular) when matching, extra whitespace in some filenames and so on. This is already handled in the code, but now I have found that some country names are written differently in GDrive than in the tracker and there are possibly other discrepancies that should be found and corrected."

@dumyan please also cover this when you respond to my request above.
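
To make the discrepancies quoted from #91 concrete, here is a minimal sketch of the kind of filename normalisation involved (extra whitespace, singular vs plural). It is not the project's actual code: `CANONICAL_TYPES`, `normalize` and `canonical_type` are hypothetical names, and the list of document types is only illustrative.

```python
import re

# Hypothetical list of canonical document types; in practice this would
# come from the tracker/API rather than being hard-coded.
CANONICAL_TYPES = ["In-Year Report", "Pre-Budget Statement", "Audit Report"]


def normalize(name):
    """Lowercase, trim, collapse whitespace runs, and drop one trailing 's'."""
    collapsed = re.sub(r"\s+", " ", name).strip().lower()
    # Crude singularisation: only removes a single trailing "s".
    return collapsed[:-1] if collapsed.endswith("s") else collapsed


# Map each normalised form back to its canonical spelling.
_LOOKUP = {normalize(t): t for t in CANONICAL_TYPES}


def canonical_type(raw):
    """Return the canonical document type for a raw GDrive label, or None."""
    return _LOOKUP.get(normalize(raw))
```

With this, `canonical_type("  In-Year   Reports ")` resolves to `"In-Year Report"`, while an unrecognised label returns `None` so it can be reported instead of silently mismatched. The trailing-"s" rule is deliberately simple and would need care for names that legitimately end in "s".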

dumyan commented 7 years ago

I would first need to make a full list of the errors in the current naming and work from that. What I have observed so far:

I will need to rework parts of the current mapping, as we are now matching only by filename. Also checking country and year could resolve the issue of different documents sharing the same name.

I will get back here when I have the full picture of what needs to be done, and will propose a naming convention based on that.
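
For illustration, a rough sketch of matching on country, year and document type instead of filename alone. Everything here is an assumption made for the example: the field names (`country`, `year`, `doc_type`, `type`, `file_id`) and the helpers `make_key`, `index_uploads` and `find_upload` are hypothetical and do not reflect the real API or Drive metadata.

```python
from typing import NamedTuple


class DocKey(NamedTuple):
    country: str
    year: int
    doc_type: str


def _clean(text):
    # Collapse whitespace and ignore case so minor spelling noise still matches.
    return " ".join(text.split()).casefold()


def make_key(country, year, doc_type):
    """Build a composite key from country, year and document type."""
    return DocKey(_clean(country), int(year), _clean(doc_type))


def index_uploads(uploads):
    """Group hypothetical upload records by composite key."""
    index = {}
    for upload in uploads:
        key = make_key(upload["country"], upload["year"], upload["doc_type"])
        index.setdefault(key, []).append(upload["file_id"])
    return index


def find_upload(index, api_doc):
    """Return the single matching file id, or None if missing or ambiguous."""
    key = make_key(api_doc["country"], api_doc["year"], api_doc["type"])
    matches = index.get(key, [])
    return matches[0] if len(matches) == 1 else None
```

Two uploads both named "Audit Report" no longer collide as long as they differ in country or year, and anything that is still missing or ambiguous comes back as `None` so it can be added to the list of errors.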

dumyan commented 7 years ago

This issue is discussed in #76, closing.