Open ebele-oputa opened 3 years ago
An API scraping tutorial was held last Wednesday
@ExperimentsInHonesty speak to Mark Webster to find out the feasibility of using the AWS report writer for the data from NC website survey
@ebele-oputa come to the data science meeting on Thursday at 8pm PT
Update: Accessing the API didn’t return the info required for the project, so it now looks like it will require web scraping with Selenium. I believe Sophia is looking for her Docker container that already has the required environment set up for this.
@ebele-oputa reach out to data team to see if data scraping workshop was successful
I added a comment to this issue asking Sophia for an update. https://github.com/hackforla/data-science/issues/44
Next steps: Check the issue on the Data Science board on Thursday to see if they have updated it. If not, please go to the Data Science meeting and get the status. Last time I talked to Sophia about this issue (3 weeks ago) she was going to do it over that weekend. If she promises the same, then follow up with her on Saturday after noon.
Update: From Sophia - I ran a tutorial on scraping with Selenium and shared some starter code. They now have to parse the output into a usable format and save it as a file
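For reference, the parsing step Sophia describes might look roughly like the sketch below. It assumes the scraper returns plain text in which each category name sits on an unindented line, followed by one indented line per technology. That layout, the function names, and the output filename are all hypothetical and are not taken from the starter code.

```python
import json
import tempfile
from pathlib import Path


def parse_scraped_text(raw: str) -> dict[str, list[str]]:
    """Group technology names under their category heading.

    Assumed (hypothetical) input layout: a category on an unindented line,
    followed by one indented line per technology.
    """
    result: dict[str, list[str]] = {}
    current = None
    for line in raw.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if line[0].isspace():
            # Indented line: a technology belonging to the current category.
            if current is not None:
                result[current].append(line.strip())
        else:
            # Unindented line: start (or resume) a category.
            current = line.strip()
            result.setdefault(current, [])
    return result


def save_as_json(data: dict, path: Path) -> None:
    """Write the parsed data to disk as indented JSON."""
    path.write_text(json.dumps(data, indent=2), encoding="utf-8")


# Made-up sample standing in for real scraper output.
sample = "Analytics\n  Google Analytics\nCMS\n  WordPress\n"
parsed = parse_scraped_text(sample)

# Hypothetical output location; a temp directory keeps the example tidy.
out_path = Path(tempfile.gettempdir()) / "technologies.json"
save_as_json(parsed, out_path)
```

The real output will almost certainly be shaped differently, so the parsing rules would need to be adapted once someone has looked at what the Selenium script actually returns.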
Ebele's response: So, who owns the next task and is there an estimate of when it could be completed? I guess I would also want to know if there are any blockers
@ebele-oputa will follow up by Wednesday
@ebele-oputa attend data science meeting on Thursday. Ask Sophia or Ryan
@ebele-oputa Just messaged Ryan on the data science issue
@ebele-oputa reach out to new data project manager as introduced by Bonnie
I reached out to Abe on slack and here's his response -
"Hi Ebele, I’m still getting up to speed with everything, but I’m speaking with Bonnie tomorrow and will participate in the Data Science meeting on Thursday and introduce myself to that team so I can get these updates."
I spoke with Abe on Friday (Sept 10th). The task has been re-assigned to Rajinder. Need to reach out for weekly updates.
Update from Rajinder: "I got started, I managed to write a script to collect the stack for a single site at a time so far. I'll get some feedback from the DS team on Thursday, I'm sure they will have some pointers".
I have asked Rajinder for updates
Rajinder now has the tools and will execute the task
Update from Rajinder: "I got the docker / selenium version working. I can scrape builtwith and output data to a json file. I am now going to try to get all websites from online source using selenium. I'll be at the data science meeting tonight to discuss.
I have now made a pull request with the webscraping for all websites. It includes a dockerfile and script that produces a json file, which I included."
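As a rough illustration of the "all websites" step, the sketch below loops over a list of sites and collects one result per site into a single JSON document. It is not the script from the pull request: `scrape_all`, `fake_scraper`, and the example hostnames are hypothetical, and the actual Selenium lookup is replaced by a stand-in function so the example runs without a browser or Docker.

```python
import json
from typing import Callable


def scrape_all(sites: list[str], scrape_one: Callable[[str], list[str]]) -> dict:
    """Run scrape_one for every site and record failures instead of stopping."""
    results = {}
    for site in sites:
        try:
            results[site] = {"technologies": scrape_one(site), "error": None}
        except Exception as exc:
            # Keep going so one bad page does not lose the rest of the run.
            results[site] = {"technologies": [], "error": str(exc)}
    return results


def fake_scraper(site: str) -> list[str]:
    """Stand-in for the real Selenium lookup (hypothetical data)."""
    if site == "broken.example":
        raise RuntimeError("page did not load")
    return ["WordPress", "Google Analytics"]


report = scrape_all(["nc.example", "broken.example"], fake_scraper)
report_json = json.dumps(report, indent=2)
```

Passing the scraper in as a function keeps the looping and error handling separate from the browser automation, which makes it possible to check the JSON shape before running against the full site list.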
@ebele-oputa recruit data analyst to work with Sonu ASAP
@ebele-oputa Reach out to Sophia Alice and Abe
Abe reached out to schedule a meeting to discuss the next steps.
@ebele-oputa @akhaleghi please update this issue with notes from our meeting
10-25-2021: Met with @ebele-oputa @jonarcisse @ExperimentsInHonesty and Rajinder. Additional action items:
Left a message on the DS issue https://github.com/hackforla/data-science/issues/44#issuecomment-969640494
@akhaleghi Can we get an update on this issue? Thanks
This needs to be checked on frequently so the Data Science team does not drop the ball.
Currently working with DS group through this issue https://github.com/hackforla/data-science/issues/44
Overview
We need the Data Science team to identify the technologies from each site in our NC comparative/competitive analysis so that we can publish the complete findings.
Action Items
Resources