hackforla / open-community-survey

GNU General Public License v2.0

NC data scraping task for web technologies #25

Open ebele-oputa opened 3 years ago

ebele-oputa commented 3 years ago

Overview

We need the Data Science team to identify the technologies from each site in our NC comparative/competitive analysis so that we can publish the complete findings.

Action Items

Resources

ebele-oputa commented 3 years ago

An API scraping tutorial was held last Wednesday

ebele-oputa commented 3 years ago

@ExperimentsInHonesty speak to Mark Webster about the feasibility of using the AWS report writer for the data from the NC website survey

ebele-oputa commented 3 years ago

@ebele-oputa come to the data science meeting on Thursday at 8pm PT

ebele-oputa commented 3 years ago

Update: Accessing the API didn’t return the info required for the project, so it now looks like this will require web scraping with Selenium. I believe Sophia is looking for her Docker container that already has the required environment set up for this.

ebele-oputa commented 3 years ago

@ebele-oputa reach out to data team to see if data scraping workshop was successful

ExperimentsInHonesty commented 3 years ago

I added a comment to this issue asking Sophia for an update. https://github.com/hackforla/data-science/issues/44

Next steps: Check the issue on the Data Science board on Thursday to see if they have updated it. If not, please go to the Data Science meeting and get a status. Last time I talked to Sophia about this issue (3 weeks ago) she was going to do it over that weekend. If she promises the same, then follow up with her on Saturday after noon.

ebele-oputa commented 3 years ago

Update: From Sophia - I ran a tutorial on scraping with Selenium and shared some starter code. They now have to parse the output into a usable format and save it as a file
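For reference, the "parse the output into a usable format and save it as a file" step might look roughly like the sketch below. It is not the starter code from the tutorial. The `Category: Technology` line format, the sample data, and the names `parse_scraped_text` and `save_stack` are all assumptions made up for illustration; the real scraper output may look quite different.

```python
import json
from pathlib import Path


def parse_scraped_text(raw_text):
    """Group 'Category: Technology' lines into {category: [technologies]}.

    The 'Category: Technology' line format is an assumption for illustration;
    the real scraper output may be shaped differently.
    """
    stack = {}
    for line in raw_text.splitlines():
        line = line.strip()
        if not line or ":" not in line:
            continue  # skip blank lines and anything without a separator
        category, technology = (part.strip() for part in line.split(":", 1))
        if not category or not technology:
            continue  # skip lines with an empty side
        techs = stack.setdefault(category, [])
        if technology not in techs:  # drop duplicates, keep first-seen order
            techs.append(technology)
    return stack


def save_stack(stack, path):
    """Write the parsed stack to a JSON file and return the path."""
    path = Path(path)
    path.write_text(json.dumps(stack, indent=2, sort_keys=True), encoding="utf-8")
    return path


# Hypothetical sample of raw scraped text for one site.
sample = """
Analytics: Google Analytics
CMS: WordPress
Analytics: Google Analytics
Widgets: Google Translate
not a technology line
"""
parsed = parse_scraped_text(sample)
```

Calling `save_stack(parsed, "stack.json")` would then write the grouped result to disk, which is the kind of file the analysis could be built from.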

Ebele's response: So, who owns the next task and is there an estimate of when it could be completed? I guess I would also want to know if there are any blockers

@ebele-oputa will follow up by Wednesday

ebele-oputa commented 3 years ago

@ebele-oputa attend data science meeting on Thursday. Ask Sophia or Ryan

ebele-oputa commented 3 years ago

@ebele-oputa Just messaged Ryan on the data science issue

ebele-oputa commented 3 years ago

@ebele-oputa reach out to new data project manager as introduced by Bonnie

ebele-oputa commented 3 years ago

I reached out to Abe on Slack and here's his response:

"Hi Ebele, I’m still getting up to speed with everything, but I’m speaking with Bonnie tomorrow and will participate in the Data Science meeting on Thursday and introduce myself to that team so I can get these updates."

ebele-oputa commented 3 years ago

I spoke with Abe on Friday (Sept 10th). The task has been re-assigned to Rajinder. Need to reach out for weekly updates

ebele-oputa commented 3 years ago

Update from Rajinder: "I got started, I managed to write a script to collect the stack for a single site at a time so far. I'll get some feedback from the DS team on Thursday, I'm sure they will have some pointers".

ebele-oputa commented 3 years ago

I have asked Rajinder for updates

ebele-oputa commented 3 years ago

Rajinder now has the tools and will execute the task

ebele-oputa commented 3 years ago

Update from Rajinder: "I got the docker / selenium version working. I can scrape builtwith and output data to a json file. I am now going to try to get all websites from online source using selenium. I'll be at the data science meeting tonight to discuss.

I have now made a pull request with the webscraping for all websites. It includes a dockerfile and script that produces a json file, which I included."
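As a rough illustration of going from one site at a time to all websites, the loop could be shaped like the sketch below. This is not the script from the pull request. `collect_stacks`, `fake_scrape`, and the example URLs are hypothetical, and the stand-in scraper replaces the real Selenium lookup so the sketch runs without a browser.

```python
import json


def collect_stacks(site_urls, scrape_site):
    """Run scrape_site on every URL and collect results and failures.

    scrape_site is a hypothetical callable that takes a URL and returns a list
    of technology names; in the project it would wrap the Selenium lookup.
    """
    results = {}
    failures = {}
    for url in site_urls:
        try:
            # De-duplicate and sort so the output is stable between runs.
            results[url] = sorted(set(scrape_site(url)))
        except Exception as exc:  # keep going so one bad site doesn't stop the run
            failures[url] = str(exc)
    return {"results": results, "failures": failures}


def fake_scrape(url):
    """Stand-in for the real scraper (assumption, for illustration only)."""
    if "broken" in url:
        raise RuntimeError("page did not load")
    return ["WordPress", "Google Analytics", "WordPress"]


report = collect_stacks(
    ["https://example.org/nc-one", "https://example.org/broken"],
    fake_scrape,
)
print(json.dumps(report, indent=2))
```

Recording failures next to the results, instead of stopping at the first error, makes it easier to see which sites still need a manual check before the findings are published.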

ebele-oputa commented 3 years ago

@ebele-oputa recruit data analyst to work with Sonu ASAP

ebele-oputa commented 3 years ago

@ebele-oputa Reach out to Sophia Alice and Abe

ebele-oputa commented 3 years ago

Abe reached out to schedule a meeting to discuss the next steps.

ExperimentsInHonesty commented 3 years ago

@ebele-oputa @akhaleghi please update this issue with notes from our meeting

akhaleghi commented 3 years ago

10-25-2021: Met with @ebele-oputa @jonarcisse @ExperimentsInHonesty and Rajinder. Additional action items:

ExperimentsInHonesty commented 3 years ago

Left a message on the DS issue https://github.com/hackforla/data-science/issues/44#issuecomment-969640494

kalyaniraman commented 2 years ago

@akhaleghi Can we get an update on this issue? Thanks

kalyaniraman commented 2 years ago

This needs to be checked on frequently so the Data Science team does not drop the ball.

ExperimentsInHonesty commented 2 years ago

Currently working with DS group through this issue https://github.com/hackforla/data-science/issues/44