Open gavanderhoorn opened 1 year ago
@DLu: does that also contain Q&A content? 170MB seems small for the entirety of ROS Answers?
Edit: looks like it does.
Sorta, the database structure is here: https://github.com/DLu/ros_metrics/blob/main/data/answers.yaml
The question title/summary is included.
The answer text is not.
ah, hm.
So that might still need scraping then.
Would you know of a way to retrieve the answer bodies as well, without scraping? This must exist, right?
https://github.com/ASKBOT/askbot-devel/pull/828
Been there, done that.
Hello everyone,
I had no idea that this API existed, thank you so much @gavanderhoorn and @DLu!
@DLu I was wondering, I noticed in the database structure that it provides a summary of the question content and not the entire content of the question, and also the comments seem to be missing. Is it possible to also obtain this information using the API?
@DLu wrote:
@DLu: when was that .db created/copied/downloaded? Trying some toy SQL queries and I can't get it to return the same nrs answers.ros.org shows.
Either my SQL is incorrect (very much possible) or the .db is not up-to-date?
@DLu I was wondering, I noticed in the database structure that it provides a summary of the question content and not the entire content of the question
I think the field is just named summary, but it's actually the whole text. See https://answers.ros.org/api/v1/questions/408502/
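As a rough illustration of the point above, here is a minimal Python sketch of reading a question through that endpoint. The URL pattern is taken from the link above, and `title` and `summary` are the fields mentioned in this thread. The helper names (`question_url`, `extract_text`, `fetch_question`) are made up for this example, and whether the site still serves this API is an assumption.

```python
import json
from urllib.request import urlopen

# API root inferred from the question URL quoted in this thread.
API_ROOT = "https://answers.ros.org/api/v1"


def question_url(question_id: int) -> str:
    """Build the per-question endpoint URL (hypothetical helper)."""
    return f"{API_ROOT}/questions/{question_id}/"


def extract_text(payload: dict) -> dict:
    """Pick out the title and the question body from a decoded JSON payload.

    Per the thread, the field is named 'summary' but holds the whole text.
    """
    return {"title": payload.get("title"), "body": payload.get("summary")}


def fetch_question(question_id: int, timeout: float = 10.0) -> dict:
    """Fetch one question over HTTP; assumes the endpoint is reachable."""
    with urlopen(question_url(question_id), timeout=timeout) as resp:
        return extract_text(json.load(resp))
```

Calling `fetch_question(408502)` would request the question linked above. Comments are not covered here, in line with the reply below.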
and also the comments seem to be missing. Is it possible to also obtain this information using the API?
Last I checked, no
@DLu: when was that .db created/copied/downloaded? Trying some toy SQL queries and I can't get it to return the same nrs answers.ros.org shows.
I would have guessed the beginning of April. How off are the numbers you're getting?
Somewhat off-topic perhaps, but the following query (5184 is my user id):
select id from answers where user_id == 5184
returns 3479 for me. ROS Answers says (as of today) 3517.
I also can't get the total karma to match what ROS Answers shows, but that's not really important.
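For anyone wanting to repeat the answer-count comparison above, a minimal Python sketch using the standard `sqlite3` module follows. The `answers` table and its `id` and `user_id` columns come from the query above. The in-memory database and its three rows are invented purely so the example runs on its own, and the helper names are hypothetical. To query the real file, pass a connection opened on wherever the .db was saved.

```python
import sqlite3


def count_answers(conn: sqlite3.Connection, user_id: int) -> int:
    """Count the rows in 'answers' written by one user."""
    row = conn.execute(
        "SELECT COUNT(id) FROM answers WHERE user_id = ?", (user_id,)
    ).fetchone()
    return row[0]


def demo_connection() -> sqlite3.Connection:
    """Build a tiny in-memory stand-in for the real database (made-up rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE answers (id INTEGER PRIMARY KEY, user_id INTEGER)")
    conn.executemany(
        "INSERT INTO answers (id, user_id) VALUES (?, ?)",
        [(1, 5184), (2, 5184), (3, 42)],
    )
    return conn


if __name__ == "__main__":
    print(count_answers(demo_connection(), 5184))
```

Using `COUNT(id)` returns the total directly instead of relying on the number of rows the client displays.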
My local copy says 3506, so it doesn't seem that off. I'll believe that you have 11 answers since I updated the database.
Hi guys. Interesting project.
I was curious as to why you're using web scraping to get ROS Answers content? IIRC, there is support for exporting/dumping the database (using a web API) in a relatively usable format. That would seem to allow more convenient processing of it.
The dump / API access was used by @DLu to create the ROS Answers section of metrics.ros.org (source).
Perhaps he could say something as to whether that could also be made available for scientific research purposes.