participedia / api

Website and API for Participedia V3
https://participedia.net
MIT License

Add CSV download url for all cases, methods and orgs #681

Closed ascott closed 4 years ago

ascott commented 5 years ago

There is an older issue tracking this, but it was based on the old model, so I'm creating this new issue to track our progress and decisions.

ascott commented 5 years ago

for reference here is the csv i originally created: https://docs.google.com/spreadsheets/d/1VtyjbjfBYpDiI9v9SSYAEjCWejYx-wzTlL_8JgnwNk4/edit#gid=893083033 and scott's feedback notes: https://docs.google.com/document/d/1JzlhPvDiBU9l3kn5tqh5bJxDBaJf8aQXzCGDhUL85e0/edit#

ascott commented 5 years ago

@scottofletcher @jesicarson @plscully here is the latest csv file with incorporated changes from scott's notes and our meeting: https://docs.google.com/spreadsheets/d/1MZDvZRWCWJDkZqEKW0NFgcG4CNNNZq4-M0ubWgTScDQ/edit?usp=sharing (this is just placeholder data for now from my local database while we get the format nailed down)

changes:

plscully commented 5 years ago

@ascott This looks great! Thank you! I downloaded the file as Excel. Here's a screenshot of the msg I saw when I tried to open it. https://www.dropbox.com/s/6ml1trq5g8ugfem/Screenshot%202019-07-18%2009.32.20.png?dl=0 .... Once the download was complete, I saw this msg https://www.dropbox.com/s/0cpei0f7lszkuf8/Screenshot%202019-07-18%2009.28.49.png?dl=0 . At first glance, the converted Excel file looks OK, but you and @scottofletcher will be able to judge that.

plscully commented 5 years ago

Here's the Excel file
participedia-data-cases-(local placeholder data)-july17.xlsx

scottofletcher commented 5 years ago

couple things:

ascott commented 5 years ago

@plscully thanks for the excel screenshots. could you send me the log file it links to in the last screenshot you sent?

Screenshot 2019-07-18 11 35 19

@scottofletcher for has_components and specific_methods_tools_techniques, these fields can have multiple items, so they are not like is_component_of, which represents a single article that we can convert to 3 fields with id, url and title. for each item in has_components and specific_methods_tools_techniques we would need to create the three fields and label them with numbers as well. could we limit these to the first 3 items for each? that would mean adding the following columns: has_components_1_id, has_components_1_title, has_components_1_url, has_components_2_id, has_components_2_title, has_components_2_url, has_components_3_id, has_components_3_title, has_components_3_url (and the same for specific_methods_tools_techniques). what do you think?
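The numbered-column proposal above could be sketched roughly as follows. This is only an illustration in Python, not the project's actual export code: the function name `flatten_linked_items`, the `MAX_ITEMS` cap, and the sample ids, titles and urls are all hypothetical.

```python
from typing import Dict, List

# Hypothetical cap, following the "first 3 items" proposal in the comment above.
MAX_ITEMS = 3


def flatten_linked_items(
    field: str, items: List[Dict[str, str]], limit: int = MAX_ITEMS
) -> Dict[str, str]:
    """Expand a multi-item field into <field>_<n>_id/title/url columns."""
    columns: Dict[str, str] = {}
    for n in range(1, limit + 1):
        # Unused slots are left empty so every row has the same columns.
        item = items[n - 1] if n <= len(items) else {}
        for key in ("id", "title", "url"):
            columns[f"{field}_{n}_{key}"] = str(item.get(key, ""))
    return columns


# Placeholder data, not real Participedia entries.
row = flatten_linked_items(
    "has_components",
    [
        {"id": "101", "title": "Example component A", "url": "https://example.org/case/101"},
        {"id": "102", "title": "Example component B", "url": "https://example.org/case/102"},
    ],
)
```

Any items beyond the cap are silently dropped in this sketch, which is the trade-off the question about limiting to 3 is really asking about.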

scottofletcher commented 5 years ago

yes, that's what I thought we'd have to engineer, but we can't limit it to 3 b/c the whole point of those fields is to encourage users to link to as many (relevant) cases and methods/tools as possible (thereby creating more robust, interlinking datasets). I'm not sure what to do about that - perhaps Matt and Kate will be able to weigh in. the old .net csv simply listed their titles separated by commas (like we currently do for other fields like gen issues, spec topics, etc.). is it possible to do that for now?

ascott commented 5 years ago

@scottofletcher yes we could list their titles or the urls separated by commas. would title or url be more useful?

plscully commented 5 years ago

@ascott I'm not sure if this is what you are looking for, but I downloaded the file again and then opened the link when the same msg appeared. I then pasted what appeared in my browser into this Word doc
case_Excel data download log 18 July 2019.docx

scottofletcher commented 5 years ago

@ascott probably just title at this stage. the nice thing about methods/tools/techniques is that their titles are pretty short and to-the-point (unlike some of our cases....). components are obviously a bit more lengthy, but I think title is still better than url (at least you can get a sense of what kinds of components the case has). I'll flag these fields as something to bring up with Matt & Kate unless you, @dethe and/or @jesicarson have any ideas?
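The comma-separated-titles approach agreed on here might look something like the sketch below. It uses Python's standard `csv` module purely for illustration; the helper `join_titles` and the sample titles are assumptions, not the actual exporter.

```python
import csv
import io
from typing import Dict, List


def join_titles(items: List[Dict[str, str]]) -> str:
    """Collapse any number of linked articles into one comma-separated cell."""
    return ", ".join(item["title"] for item in items)


buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["id", "has_components"])
# Placeholder titles; the csv writer quotes the cell because it contains commas.
writer.writerow(
    [1, join_titles([{"title": "Example component A"}, {"title": "Example component B"}])]
)
csv_text = buffer.getvalue()
```

One caveat with this design: a title that itself contains a comma cannot be told apart from two titles once the cell is split again, which may matter for the longer component titles.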

ascott commented 5 years ago

@plscully thanks, i think that error has something to do with a cell having too much text in it. i will look into it.

@scottofletcher i'll make that change, and then we can adjust as needed after matt and kate have had a chance to review.

scottofletcher commented 5 years ago

it might be that we have to implement what Dethe first suggested: a system for 'hard-core quants' to request a 'hard-core' dataset with all the bells and whistles. I think this works fine for the average folk :)

jesicarson commented 5 years ago

Go team 👏👏

plscully commented 5 years ago

This is great!! Thank you!!



ascott commented 5 years ago

@scottofletcher these are ready to test with production data now. you can download the csv's with these urls:

https://participedia.net/?selectedCategory=case&returns=csv
https://participedia.net/?selectedCategory=method&returns=csv
https://participedia.net/?selectedCategory=organizations&returns=csv
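The three download urls share one pattern: a `selectedCategory` query parameter plus `returns=csv`. A small Python sketch that builds them is shown below. The helper name `csv_download_url` is hypothetical, and the sketch only constructs the urls listed in this comment; it does not fetch them or assume they are still live.

```python
from urllib.parse import urlencode

BASE_URL = "https://participedia.net/"


def csv_download_url(category: str) -> str:
    """Build a CSV download url for one category, as listed in this thread."""
    query = urlencode({"selectedCategory": category, "returns": "csv"})
    return f"{BASE_URL}?{query}"


# Category values are taken from the urls posted above.
urls = [csv_download_url(c) for c in ("case", "method", "organizations")]
```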

scottofletcher commented 5 years ago

Awesome!!! I'll try it out first thing tomorrow morning!

scottofletcher commented 5 years ago

Changes needed:

CASES

METHODS

ORGANIZATIONS

QUESTION: the field 'General Types of Methods' is reported differently in the spreadsheets (method_types for cases and methods, type_method for orgs). can we confirm that this is the same field across entry types (i.e. when we have all our sidebar data hyperlinked, if I click on a general type of method in, say, a case, it will return all cases, methods, and orgs that have that type entered in the field relevant to that entry type)?

ascott commented 5 years ago

@scottofletcher yes, method_types and type_method are the same across all article types but just have different names

scottofletcher commented 5 years ago

ah, ok good. is it possible to use the same name across datasets?

scottofletcher commented 5 years ago

@ascott is it possible to have the dates in a more succinct format? we just need dd/mm/yyyy. otherwise it makes it difficult for editors to track when edits were made

ascott commented 5 years ago

@scottofletcher the reason for using the current date format of 2019-07-24T14:15:11.206Z is that it's an international standard (ISO 8601) for displaying date & time. A couple of issues with dd/mm/yyyy are that it doesn't include a timestamp and it's not understood the same way internationally, which could cause confusion:

https://en.wikipedia.org/wiki/Date_format_by_country

Writers have traditionally written abbreviated dates according to their local custom, creating all-numeric equivalents to dates such as '26 July 2019' (26/07/19) and 'July 26, 2019' (07/26/19). This can result in dates that are impossible to understand correctly without knowing the writer's origin and/or other contextual details, as dates such as "10/11/06" can be interpreted as "10 November 2006" in the DMY format, "October 11, 2006" in MDY, and "2010 November 6" in YMD.

i would recommend sticking with the current international format. it can be sorted chronologically, so i'm unclear on how it makes it difficult for editors to track when changes have been made. what issue are you seeing?
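Both points can be checked with a short Python sketch. Only the first timestamp comes from this thread; the other two are made-up values in the same format, and the helper `parse_timestamp` is a hypothetical name.

```python
from datetime import datetime, timezone

ISO_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def parse_timestamp(value: str) -> datetime:
    """Parse a timestamp like 2019-07-24T14:15:11.206Z as a UTC datetime."""
    return datetime.strptime(value, ISO_FORMAT).replace(tzinfo=timezone.utc)


timestamps = [
    "2019-07-24T14:15:11.206Z",  # from the thread
    "2019-07-18T09:32:20.000Z",  # illustrative
    "2019-11-02T08:00:00.500Z",  # illustrative
]

# Fixed-width, most-significant-first fields mean plain text sorting
# gives the same order as sorting by the parsed date.
sorted_as_text = sorted(timestamps)
sorted_as_dates = sorted(timestamps, key=parse_timestamp)

# The same all-numeric string read under three regional conventions.
ambiguous = "10/11/06"
as_dmy = datetime.strptime(ambiguous, "%d/%m/%y").date()
as_mdy = datetime.strptime(ambiguous, "%m/%d/%y").date()
as_ymd = datetime.strptime(ambiguous, "%y/%m/%d").date()
```

The text sort only matches the chronological sort when every value uses the same UTC, fixed-width format, which is the case for the format quoted above.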

ascott commented 5 years ago

(quoting scottofletcher) "ah, ok good. is it possible to use the same name across datasets?"

@scottofletcher i will update the csv's so this field uses the same key across csv's
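One way to picture the key unification is a small alias map applied to each row before export. The Python sketch below is only an assumption about how that could work: the thread does not say which of the two names will be kept, so using method_types as the shared key is a guess, and `KEY_ALIASES`, `normalize_keys` and the sample row are hypothetical.

```python
from typing import Dict

# Assumption: method_types is the shared name; the thread does not confirm which is kept.
KEY_ALIASES = {"type_method": "method_types"}


def normalize_keys(row: Dict[str, str]) -> Dict[str, str]:
    """Rename aliased keys so every CSV uses the same column name."""
    return {KEY_ALIASES.get(key, key): value for key, value in row.items()}


# Placeholder organization row using the old key.
org_row = normalize_keys({"id": "7", "type_method": "deliberative"})
```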

scottofletcher commented 5 years ago

@ascott RE date format: when we edit an entry, we manually plug in the day we edited it (see column G in this spreadsheet: https://docs.google.com/spreadsheets/d/1uiSNHVzTWByC9ZgawKLcE7WTVyiKELQFxn_lthQDgFI/edit?usp=sharing). that messes up the ability to sort by edit date. the only alternative I can think of is downloading the CSV to get the edit date to plug in, but that would be crazy labour intensive...
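If editors need a dd/mm/yyyy value to paste into a tracking spreadsheet, the exported timestamp can be converted rather than changed at the source. A minimal Python sketch, with the hypothetical helper name `to_day_month_year`, and assuming the UTC date is the one that should be shown:

```python
from datetime import datetime


def to_day_month_year(iso_timestamp: str) -> str:
    """Convert 2019-07-24T14:15:11.206Z style timestamps to dd/mm/yyyy (UTC date)."""
    parsed = datetime.strptime(iso_timestamp, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.strftime("%d/%m/%Y")


short_date = to_day_month_year("2019-07-24T14:15:11.206Z")
```

Here `short_date` is "24/07/2019". Sorting should still be done on the original timestamp column, since dd/mm/yyyy strings do not sort chronologically as text.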

scottofletcher commented 5 years ago

Is it possible to get a 'count' for all data fields? the media field counts are really helpful, but it would be great if I could see which entries need the most work (ie. have the fewest fields completed)
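A completed-field count like this could also be computed from a downloaded CSV. The Python sketch below is a rough illustration: the column names and rows in `SAMPLE` are invented, and `count_completed` is a hypothetical helper, not an existing export feature.

```python
import csv
import io
from typing import Dict, Optional

# Invented sample data standing in for a downloaded CSV.
SAMPLE = (
    "id,title,description,location\n"
    "1,Case A,Some text,Toronto\n"
    "2,Case B,,\n"
    "3,Case C,Some text,\n"
)


def count_completed(row: Dict[str, Optional[str]]) -> int:
    """Count the cells in a row that are not empty or whitespace-only."""
    return sum(1 for value in row.values() if value and value.strip())


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
# Fewest completed fields first, i.e. the entries needing the most work.
ranked = sorted(rows, key=count_completed)
least_complete_ids = [row["id"] for row in ranked]
```

In this sample the order is entry 2, then 3, then 1. A real count would probably exclude columns that are always filled, such as id.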