Closed: ascott closed this issue 4 years ago
for reference here is the csv i originally created: https://docs.google.com/spreadsheets/d/1VtyjbjfBYpDiI9v9SSYAEjCWejYx-wzTlL_8JgnwNk4/edit#gid=893083033 and scott's feedback notes: https://docs.google.com/document/d/1JzlhPvDiBU9l3kn5tqh5bJxDBaJf8aQXzCGDhUL85e0/edit#
@scottofletcher @jesicarson @plscully here is the latest csv file with incorporated changes from scott's notes and our meeting: https://docs.google.com/spreadsheets/d/1MZDvZRWCWJDkZqEKW0NFgcG4CNNNZq4-M0ubWgTScDQ/edit?usp=sharing (this is just placeholder data for now from my local database while we get the format nailed down)
changes:
converted the is_component_of and primary_organizer fields into 3 new fields each, with id, title and url
has_components and specific_methods_tools_techniques: i added new fields with the count for each of these fields. since they can contain many items with id, type and title, it's difficult to share them in the csv file because they are lists of nested data. i thought counts for these fields might be useful.
@ascott This looks great! Thank you! I downloaded the file as Excel. Here's a screenshot of the msg I saw when I tried to open it: https://www.dropbox.com/s/6ml1trq5g8ugfem/Screenshot%202019-07-18%2009.32.20.png?dl=0 ... Once the download was complete, I saw this msg: https://www.dropbox.com/s/0cpei0f7lszkuf8/Screenshot%202019-07-18%2009.28.49.png?dl=0 . At first glance, the converted Excel file looks OK, but you and @scottofletcher will be able to judge that.
Here's the Excel file
participedia-data-cases-(local placeholder data)-july17.xlsx
couple things:
@plscully thanks for the excel screenshots. could you send me the log file it links to in the last screenshot you sent?
@scottofletcher has_components and specific_methods_tools_techniques can have multiple items, so they are not like is_component_of, which represents a single article and can be converted to 3 fields with id, url and title. for each item in has_components and specific_methods_tools_techniques we would need to create the three fields and label them with numbers as well. could we limit these to the first 3 items for each? that would mean adding the following columns: has_components_1_id, has_components_1_title, has_components_1_url, has_components_2_id, has_components_2_title, has_components_2_url, has_components_3_id, has_components_3_title, has_components_3_url (and the same for specific_methods_tools_techniques). what do you think?
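A minimal Python sketch of the numbered-column layout proposed above. It assumes each case stores has_components as a list of dicts with id, title and url keys; that data shape, the flatten_numbered helper and the example.org URLs are hypothetical and not taken from the Participedia codebase.

```python
from typing import Any


def flatten_numbered(entry: dict[str, Any], field: str, limit: int = 3) -> dict[str, Any]:
    """Expand a list-valued field into numbered id/title/url columns.

    Assumes entry[field] is a list of dicts with "id", "title" and "url" keys.
    Items beyond `limit` are dropped; unused slots become empty strings so
    every row ends up with the same set of columns.
    """
    items = entry.get(field) or []
    row: dict[str, Any] = {}
    for n in range(1, limit + 1):
        item = items[n - 1] if n <= len(items) else {}
        for key in ("id", "title", "url"):
            row[f"{field}_{n}_{key}"] = item.get(key, "")
    return row


# Hypothetical sample data for illustration only.
case = {
    "has_components": [
        {"id": 101, "title": "Youth Assembly", "url": "https://example.org/case/101"},
        {"id": 102, "title": "Online Survey", "url": "https://example.org/case/102"},
    ]
}
result = flatten_numbered(case, "has_components")
print(result)
```

Because the number of columns is fixed up front, any items past `limit` are silently dropped, which is the limitation raised in the next reply.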
yes, that's what I thought we'd have to engineer, but we can't limit it to 3 b/c the whole point of those fields is to encourage users to link to as many (relevant) cases and methods/tools as possible (thereby creating more robust, interlinking datasets). I'm not sure what to do about that - perhaps Matt and Kate will be able to weigh in. the old .net csv simply listed their titles separated by commas (like we currently do for other fields like gen issues, spec topics, etc.). is it possible to do that for now?
@scottofletcher yes we could list their titles or the urls separated by commas. would title or url be more useful?
@ascott I'm not sure if this is what you are looking for, but I downloaded the file again and then opened the link when the same msg appeared. I then pasted what appeared in my browser into this Word doc
case_Excel data download log 18 July 2019.docx
@ascott probably just title at this stage. the nice thing about methods/tools/techniques is that their titles are pretty short and to-the-point (unlike some of our cases....). components are obviously a bit more lengthy, but I think title is still better than url (at least you can get a sense of what kinds of components the case has). I'll flag these fields as something to bring up with Matt & Kate unless you, @dethe and/or @jesicarson have any ideas?
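A sketch of the titles-only, comma-separated cell discussed here, assuming the same list-of-dicts shape as above (join_titles is a hypothetical helper). Python's csv module quotes any cell that contains the delimiter, so the commas inside the cell don't break the CSV columns.

```python
import csv
import io


def join_titles(items: list[dict], sep: str = ", ") -> str:
    """Collapse a list of linked entries into one cell of titles.

    Assumes each item is a dict with a "title" key; missing or empty
    titles are skipped.
    """
    return sep.join(item["title"] for item in items if item.get("title"))


row = {
    "id": 42,
    "specific_methods_tools_techniques": join_titles([
        {"id": 1, "title": "Citizens' Jury"},
        {"id": 2, "title": "World Café"},
    ]),
}

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["id", "specific_methods_tools_techniques"])
writer.writeheader()
writer.writerow(row)  # the joined cell is wrapped in double quotes automatically
print(buf.getvalue())
```

One caveat with a comma separator: a title that itself contains a comma can't be told apart from two titles once they are joined, so a separator such as ";" or " | " may be worth considering if that turns out to matter.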
@plscully thanks, i think that error has something to do with a cell having too much text in it. i will look into it.
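If the error is indeed caused by an oversized cell, one way to confirm it is to scan the export for values longer than Excel's documented per-cell maximum of 32,767 characters. Which field overflows (long case bodies seem a plausible candidate) is only a guess, and the helpers below are hypothetical; they illustrate the check and one possible truncation policy.

```python
# Excel's documented maximum number of characters in a single cell.
EXCEL_MAX_CELL_CHARS = 32_767


def find_oversized_cells(
    rows: list[dict[str, str]], limit: int = EXCEL_MAX_CELL_CHARS
) -> list[tuple[int, str, int]]:
    """Return (row_index, column, length) for every value longer than `limit`.

    Note: len() counts Unicode code points; Excel may count some characters
    (e.g. emoji) as two, so leaving a small margin may be prudent.
    """
    hits = []
    for i, row in enumerate(rows):
        for column, value in row.items():
            text = "" if value is None else str(value)
            if len(text) > limit:
                hits.append((i, column, len(text)))
    return hits


def truncate_for_excel(value: str, limit: int = EXCEL_MAX_CELL_CHARS, marker: str = "…") -> str:
    """Cut a value so it fits in one cell, ending with a visible marker."""
    if len(value) <= limit:
        return value
    return value[: limit - len(marker)] + marker


rows = [{"id": "1", "body": "x" * 40_000}, {"id": "2", "body": "short"}]
print(find_oversized_cells(rows))  # [(0, 'body', 40000)]
```

Truncating silently loses text, so an alternative is to keep the full value in the CSV and only truncate in a separate Excel-oriented export.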
@scottofletcher i'll make that change, and then we can adjust as needed after matt and kate have had a chance to review.
it might be that we have to implement what Dethe first suggested: a system for 'hard-core quants' to request a 'hard-core' dataset with all the bells and whistles. I think this works fine for the average folk :)
Go team 👏👏
This is great!! Thank you!!
-- Patrick L. Scully, Ph.D. President, Clearview Consulting, LLC T 860.561.1866 www.clearviewconsultingllc.com
@scottofletcher these are ready to test with production data now. you can download the csv's with these urls:
https://participedia.net/?selectedCategory=case&returns=csv
https://participedia.net/?selectedCategory=method&returns=csv
https://participedia.net/?selectedCategory=organizations&returns=csv
Awesome!!! I'll try it out first thing tomorrow morning!
Changes needed:
CASES
[ ] columns after original language (column O) should be in the following order:
general issues
specific topics
location (8 columns: address1, address2, city, province, country, lat, long)
scope of influence
is component of (3 columns: id, title, url)
has components title
start date
end date
ongoing
time limited
purpose
approach
public spectrum
number of participants
open or limited
recruitment method
targeted participants
method types
tool/technique types
specific methods/tool/techniques titles
legality
facilitators
facilitator training
face-to-face/online
participant interaction
learning resources
decision methods
if voting
primary organizer (3 columns: id, title, url)
organizer type
funder
funder types
staff
volunteers
evidence of impact
types of change
implementers of change
formal evaluation
body
photos count
files count
videos count
audio count
evaluation report count
evaluation links count
[ ] time_limited values are currently 'repeated' or 'a'. please change 'a' to 'limited'
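A sketch of how a fixed column order and the time_limited relabelling could both be applied when writing the cases CSV with Python's csv.DictWriter. The snake_case keys below are hypothetical guesses at a few of the columns listed above, not the export's actual key names; the 'a' to 'limited' mapping comes from the checklist item.

```python
import csv
import io
from typing import TextIO

# Hypothetical snake_case keys for a few of the columns listed above;
# the real export's key names and full order may differ.
CASE_COLUMNS = [
    "id",
    "title",
    "original_language",
    "general_issues",
    "specific_topics",
    "time_limited",
]

# Relabel raw time_limited values; anything not listed passes through unchanged.
TIME_LIMITED_LABELS = {"a": "limited"}


def write_cases(cases: list[dict], out: TextIO) -> None:
    """Write cases as CSV with a fixed column order.

    Keys not in CASE_COLUMNS are ignored, and missing keys become empty cells.
    """
    writer = csv.DictWriter(out, fieldnames=CASE_COLUMNS, extrasaction="ignore", restval="")
    writer.writeheader()
    for case in cases:
        row = dict(case)
        raw = row.get("time_limited")
        row["time_limited"] = TIME_LIMITED_LABELS.get(raw, raw)
        writer.writerow(row)


buf = io.StringIO()
write_cases(
    [
        {"id": 1, "title": "Example case", "time_limited": "a", "unlisted_field": "x"},
        {"id": 2, "time_limited": "repeated"},
    ],
    buf,
)
print(buf.getvalue())
```

Keeping the order in one list means reordering columns later is a one-line change rather than a change to the query or serializer.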
METHODS
ORGANIZATIONS
QUESTION: the field 'General Types of Methods' is reported differently in the spreadsheets (method_types for cases and methods, type_method for orgs). can we confirm that this is the same field across all entry types? (i.e. when we have all our sidebar data hyperlinked, if I click on a general type of method in, say, a case, will it return all cases, methods, and orgs that have that type entered in the field relevant to that entry type?)
@scottofletcher yes, method_types and type_method are the same across all article types but just have different names
ah, ok good. is it possible to use the same name across datasets?
@ascott is it possible to have the dates in a more succinct format? we just need dd/mm/yyyy. otherwise it makes it difficult for editors to track when edits were made
@scottofletcher the reason for using the current date format of 2019-07-24T14:15:11.206Z is that it's an international standard (ISO 8601) for displaying date & time. A couple of issues with dd/mm/yyyy are that it doesn't include a time and it's not understood the same way internationally, which could cause confusion:
https://en.wikipedia.org/wiki/Date_format_by_country
Writers have traditionally written abbreviated dates according to their local custom, creating all-numeric equivalents to dates such as '26 July 2019' (26/07/19) and 'July 26, 2019' (07/26/19). This can result in dates that are impossible to understand correctly without knowing the writer's origin and/or other contextual details, as dates such as "10/11/06" can be interpreted as "10 November 2006" in the DMY format, "October 11, 2006" in MDY, and "2010 November 6" in YMD.
i would recommend sticking with the current international format. it can be sorted chronologically, so i'm unclear on how it makes it difficult for editors to track when changes have been made. what issue are you seeing?
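To make the trade-off concrete: the current value is an ISO 8601 timestamp in UTC, and it can still be shown as dd/mm/yyyy wherever that is more convenient. The sketch below (sample dates are made up) also shows why ISO dates sort chronologically as plain text while dd/mm/yyyy strings don't, which matters when dates are typed into a spreadsheet by hand.

```python
from datetime import datetime

iso_value = "2019-07-24T14:15:11.206Z"

# datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11 on,
# so swap it for an explicit UTC offset to support older versions too.
stamp = datetime.fromisoformat(iso_value.replace("Z", "+00:00"))
print(stamp.strftime("%d/%m/%Y"))  # 24/07/2019

iso_dates = ["2019-07-24", "2019-11-03", "2018-12-31"]
dmy_dates = ["24/07/2019", "03/11/2019", "31/12/2018"]

print(sorted(iso_dates))  # ['2018-12-31', '2019-07-24', '2019-11-03'] (chronological)
print(sorted(dmy_dates))  # ['03/11/2019', '24/07/2019', '31/12/2018'] (not chronological)
```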
@scottofletcher i will update the csv's so the general types of methods field (method_types / type_method) uses the same key across all csv's
@ascott RE date format: when we edit an entry, we manually plug in the day we edited it (see column G in this spreadsheet: https://docs.google.com/spreadsheets/d/1uiSNHVzTWByC9ZgawKLcE7WTVyiKELQFxn_lthQDgFI/edit?usp=sharing). that messes up the ability to sort by edit date. the only alternative I can think of is downloading the CSV to get the edit date to plug in, but that would be crazy labour-intensive...
Is it possible to get a 'count' for all data fields? the media field counts are really helpful, but it would be great if I could see which entries need the most work (i.e. have the fewest fields completed)
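A sketch of a per-entry completeness count, assuming entries are dicts and that a field counts as completed when it is not None, not a blank string and not an empty list. The field names and sample entries are a hypothetical subset; a real export would pass its full column list. Sorting by the count puts the entries that need the most work first.

```python
from typing import Any


def is_filled(value: Any) -> bool:
    """Treat None, blank strings and empty collections as not completed."""
    if value is None:
        return False
    if isinstance(value, str):
        return value.strip() != ""
    if isinstance(value, (list, tuple, set, dict)):
        return len(value) > 0
    return True  # numbers, booleans and other scalars count as completed


def count_completed(entry: dict[str, Any], fields: list[str]) -> int:
    """Count how many of `fields` have a completed value in `entry`."""
    return sum(1 for field in fields if is_filled(entry.get(field)))


# Hypothetical subset of fields and sample entries, for illustration only.
FIELDS = ["title", "general_issues", "body", "start_date"]
entries = [
    {"id": 1, "title": "Case A", "general_issues": [], "body": "", "start_date": None},
    {"id": 2, "title": "Case B", "general_issues": ["health"], "body": "Text", "start_date": "2019-01-01"},
]

# Fewest completed fields first: these entries need the most work.
for entry in sorted(entries, key=lambda e: count_completed(e, FIELDS)):
    print(entry["id"], count_completed(entry, FIELDS), "of", len(FIELDS))
```

The count could be emitted as one extra CSV column, alongside the existing media counts, so editors can sort on it directly in the spreadsheet.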
There is an older issue to track this, but it was based on the old model, so creating this new issue to track our progress/decisions about this.