ADDITIONALLY: In comments.py, there is a section of code that begins with "if back". This part checks whether or not the spider needs to iterate upwards to get the rest of the comments in the replies. However, iterating upwards alone is not always enough. For some comments, for example, the page
https://mbasic.facebook.com/comment/replies/?ctoken=10162169751605725_10162170377070725&p=129&count=168&pc=1&ft_ent_identifier=10162169751605725&gfid=AQBjT1xFFeGcZxyW&refid=52&__tn__=R
IS the first link visited from the main comment page, because FB displays a comment from the middle of the thread on the main page due to its popularity. To avoid missing the entries that come after it, you should change:
back = response.xpath('//div[contains(@id,"comment_replies_more_1")]/a/@href').extract()
to
back = response.xpath('//div[contains(@id,"comment_replies_more_2")]/a/@href').extract()
so that the algorithm iterates forwards as well. Afterwards, you have to merge the two separately generated csv files (one from a run with each version of the line). This ended up being the easiest solution for me, but it should definitely be possible to do this within a single program.
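
The merge step can be done with a few lines of standard-library Python. The following is only a minimal sketch, not part of the scraper itself: the function name merge_csv and the file names in the usage comment are hypothetical, and it assumes both csv files were written with the same header row and with no ragged rows. Because the two runs may both scrape the page they started from, rows that are exactly identical are written only once.

```python
import csv


def merge_csv(paths, out_path):
    """Concatenate csv files that share a header, dropping exact duplicate rows.

    Returns the number of data rows written to out_path.
    """
    seen = set()        # full rows already written, used for de-duplication
    fieldnames = None   # header taken from the first non-empty file
    rows = []
    for path in paths:
        with open(path, newline="", encoding="utf-8") as f:
            reader = csv.DictReader(f)
            if reader.fieldnames is None:
                continue  # empty file, nothing to merge
            if fieldnames is None:
                fieldnames = reader.fieldnames
            elif reader.fieldnames != fieldnames:
                raise ValueError(f"{path} has a different header")
            for row in reader:
                key = tuple(row[name] for name in fieldnames)
                if key not in seen:
                    seen.add(key)
                    rows.append(row)
    if fieldnames is None:
        raise ValueError("no input file contained a header row")
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)


# Hypothetical usage, one file per run of the spider:
# merge_csv(["comments_backward.csv", "comments_forward.csv"], "comments_merged.csv")
```

Rows are kept in the order they are read, so the merged file lists everything from the first file followed by the new rows from the second. If you need the comments in thread order, sort the merged file on whichever column holds the date or position.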