hhaccessibility / hhaccessibility.github.io

http://hhaccessibility.github.io/main/

Investigate Jaccede.com #26

Open joshi1983 opened 8 years ago

joshi1983 commented 8 years ago

Jaccede.com is a French app that does something similar to what we want to do. They have Android and iOS versions as well as the web site. They also now support English.

I would appreciate it if you could test it and provide some feedback. Even better, if you know people with disabilities, ask them to quickly check it out and share their feedback too.

Create a program to download the data available on jaccede.com and generate a CSV file from it. For importing from other websites, this has involved techniques like "web scraping", studying the website's API documentation, or reverse engineering to find APIs and understand them through educated guesswork.

The most difficult part of this issue would be the reverse engineering, so I wanted to start that process and share some of my findings with you. I found a few things you could use to get started on an importer. jaccede.com has some location search APIs which return JSON. Here is a screenshot of Chrome with the web developer tools open to the "Network" tab, where I've clicked on one of the HTTP requests to see the response from the API.

[image]

It looks like the google_place_id can be used to look at a page focused on that specific location. One such page is at: https://www.jaccede.com/fr/p/ChIJm4f7BuCVuEwRipVgscyRFv8/0c6-chez-victor-st-paul-quebec?page=1

A simpler version of the URL that redirects to the above is at: https://www.jaccede.com/fr/p/ChIJm4f7BuCVuEwRipVgscyRFv8

Notice that the URL is formatted with the google_place_id at the end.

It looks like formatting the location search URL correctly isn't enough to download useful data, because the following request responds with "Missing api key": https://api.jaccede.com/v4/places/search?boolean_filter=0&google=0&lang=fr&lat=46.8138783&lng=-71.2079809&page=1&per_page=40&total=1

In the developer tools, I see the following in the HTTP request headers sent to the location search API:

X-Api-Key:c61b91136d79e38b127da5851f5895fe9dd40f5bb564f2ad38f8f7bc765be7fb

The importer will likely need to use an API key like this too when writing the HTTP request to their site. You'd have to experiment a bit to see how long an API key is good for or if something else about the HTTP requests from the web browser is needed to get the intended data from the server.
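As a starting point, here is a minimal Python sketch of how the importer might build that search request with the X-Api-Key header attached. The function name `build_search_request` is made up for illustration, the query parameters simply mirror the URL above, and it is an untested assumption that the API key header alone is enough for the server to return JSON.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint taken from the search URL seen in the developer tools.
SEARCH_URL = "https://api.jaccede.com/v4/places/search"


def build_search_request(lat, lng, api_key, page=1, per_page=40, lang="fr"):
    """Build (but do not send) a location search request.

    The parameters mirror the URL observed in the browser. Whether the
    server needs anything beyond X-Api-Key is still an open question.
    """
    params = {
        "boolean_filter": 0,
        "google": 0,
        "lang": lang,
        "lat": lat,
        "lng": lng,
        "page": page,
        "per_page": per_page,
        "total": 1,
    }
    url = SEARCH_URL + "?" + urlencode(params)
    return Request(url, headers={"X-Api-Key": api_key})
```

Passing the returned object to `urllib.request.urlopen` and reading the body with `json.load` would be the next thing to try. Keeping the request building separate from the sending makes it easier to inspect exactly what the program is about to send.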

Based on those findings, you could write an importer that works by first building a large list of locations by making location search requests to their search API. That will give you useful fields for a CSV file like address, name, longitude, and latitude. "accessible_by_conviction" might be useful for the CSV too, especially if it can be translated to rating question answers, even if only very roughly and with some errors.
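The CSV step could be sketched roughly like this in Python. Only "google_place_id" and "accessible_by_conviction" are field names actually seen in this thread; the other keys ("name", "address", "latitude", "longitude") are guesses at how the JSON names them and would need to be checked against a real response. The sketch also assumes each place has already been parsed into a flat dict.

```python
import csv

# Assumed column names. Only google_place_id and accessible_by_conviction
# are confirmed by the thread; the rest are placeholders to verify.
CSV_FIELDS = [
    "google_place_id",
    "name",
    "address",
    "latitude",
    "longitude",
    "accessible_by_conviction",
]


def places_to_csv(places, out):
    """Write place dicts to `out` as CSV and return the number of rows written.

    Places with a google_place_id that was already written are skipped,
    since overlapping search areas are likely to return the same place twice.
    """
    writer = csv.DictWriter(out, fieldnames=CSV_FIELDS)
    writer.writeheader()
    seen = set()
    written = 0
    for place in places:
        place_id = place.get("google_place_id")
        if place_id is not None:
            if place_id in seen:
                continue
            seen.add(place_id)
        # Missing fields become empty cells; unknown fields are dropped.
        writer.writerow({field: place.get(field, "") for field in CSV_FIELDS})
        written += 1
    return written
```

When writing to a real file, open it with `newline=""` as the `csv` module documentation recommends, so that line endings are not doubled on Windows.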

You could get more details on each building by downloading the HTML for each location's page and pulling out things like each location's website URL and phone number. Each location's page URL can be built as "https://www.jaccede.com/fr/p/" + the location's google_place_id.
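Here is a rough Python sketch of that second step. I haven't studied the markup of the location pages, so it is purely an assumption that the phone number shows up as a `tel:` link and the website as an ordinary link pointing away from jaccede.com. The names `place_page_url` and `ContactLinkParser` are made up for illustration.

```python
from html.parser import HTMLParser

PLACE_PAGE_PREFIX = "https://www.jaccede.com/fr/p/"


def place_page_url(google_place_id):
    """Build the short page URL, which redirects to the full page."""
    return PLACE_PAGE_PREFIX + google_place_id


class ContactLinkParser(HTMLParser):
    """Collect tel: links and external http(s) links from <a> tags.

    Assumption: phone numbers are tel: links and the location's website
    is a link that does not point at jaccede.com. Check the real HTML.
    """

    def __init__(self):
        super().__init__()
        self.phones = []
        self.external_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        if href.startswith("tel:"):
            self.phones.append(href[len("tel:"):])
        elif href.startswith(("http://", "https://")) and "jaccede.com" not in href:
            self.external_links.append(href)
```

After downloading a page, call `feed()` on a `ContactLinkParser` with the HTML text and then look at `phones` and `external_links`. The external links will probably include things like social media pages, so some filtering would still be needed to pick out the location's own website.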

Blandine-AA commented 7 years ago

Investigate Jaccede's accessibility criteria list and see if we can improve our list. Also look at the type of data they collect.

joshi1983 commented 7 years ago

@kimficara, I'm ready to help you with this as soon as you are.

In the last Friday meeting, I suggested holding off on this for a week or two in case another issue might come your way. That other issue was completed, so there's no holdup anymore.

Let me know if you have any questions.

joshi1983 commented 7 years ago

@kimficara you started writing some Java code to write an HTTP request to jaccede's API in an attempt to get some JSON data that is visible in developer tools when you use the site. You ran into some problems where the server returned HTML with a French message that roughly translates to "Page not found" or "entity not found".

Let me know when you have Wireshark verifying that your HTTP request sends the X-Api-Key header. The next thing I'd suggest trying is just to copy most of the HTTP headers. The web browser gets the desired data from the server but the Java code doesn't, so there must be some difference between the request sent from the Java application and the one Safari sends that causes one to succeed and the other to fail. Minimizing the differences between the Java program's HTTP request and Safari's HTTP request should maximize the chance of achieving what we want. Remember to use HTTP instead of HTTPS if Wireshark can't sniff your HTTPS requests.
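To make the comparison less manual, a small helper like this could list what the browser sends that the program doesn't. It is only a sketch, written in Python rather than Java for brevity, and the name `diff_headers` is made up. It assumes you have copied both sets of headers into dicts by hand, from the developer tools and from Wireshark.

```python
def diff_headers(browser_headers, program_headers):
    """Compare two header dicts, ignoring the case of header names.

    Returns (missing, different):
      missing   - headers the browser sends that the program does not
      different - headers both send with different values, mapped to
                  a (browser_value, program_value) tuple
    """
    program = {name.lower(): value for name, value in program_headers.items()}
    missing = {}
    different = {}
    for name, value in browser_headers.items():
        key = name.lower()
        if key not in program:
            missing[name] = value
        elif program[key] != value:
            different[name] = (value, program[key])
    return missing, different
```

Header names are compared case-insensitively because HTTP treats them that way, and some HTTP libraries change the capitalization before sending. Adding the missing headers to the program one at a time should show which one the server actually cares about.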

We'll definitely find a way to get that information even though we ran into some unexpected challenges last night. If talking directly with the API doesn't work after some more experimentation, we could write a program that takes control of Safari and pulls the information that way. I don't want to give up on making the requests directly from your program just yet because that can lead to an importer that runs faster and more independently.

joshi1983 commented 7 years ago

I unassigned @kimficara because she has been busy with her new job and hasn't worked on this project for the last few weeks. Her last work on this was experimenting with a Java application that interacts directly with jaccede's API, followed by some attempts to get Selenium working in Java for an application that would crawl jaccede. Due to a lot of problems with JAR dependency incompatibilities, it seemed like Python was the way to go. We just didn't delve much into that before the new job started.

This issue could be taken on by another person now.