strohne / Facepager

Facepager was made for fetching publicly available data from YouTube, Twitter and other websites on the basis of APIs and webscraping.
https://github.com/strohne/Facepager/releases

Facepager crashes when extracting a large number of nodes to CSV #64

Closed jolesn closed 7 years ago

jolesn commented 7 years ago

Hi, I am trying to extract around 100,000 nodes into a CSV file. I tried going through an SQLite client and extracting the db file into a CSV file, but it comes out as gibberish.

The comments are in Japanese, and I would really like to keep the emojis. When extracting around a thousand nodes and reading the file with UTF-8 encoding, it comes out nice and neat. Is there any way I could achieve that straight from the db file?

strohne commented 7 years ago

Why don't you use the export feature of Facepager?

The data is stored as UTF-8 encoded JSON in the response column of the db file. You need to decode it after exporting with the SQLite client.
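A minimal Python sketch of that decoding step, reading the db file directly instead of going through a separate SQLite client. The table name `Nodes`, the default column name, and the function name `export_responses` are assumptions for illustration; check the actual schema of your db file first. It writes the file with a UTF-8 byte order mark, which usually helps spreadsheet programs display Japanese text and emojis correctly.

```python
import csv
import json
import sqlite3


def export_responses(db_path, csv_path, table="Nodes", column="response"):
    """Write one CSV row per node, with the decoded JSON as readable text.

    `table` and `column` are assumed names; adjust them to your db schema.
    Returns the number of rows written.
    """
    count = 0
    con = sqlite3.connect(db_path)
    try:
        rows = con.execute(f'SELECT rowid, "{column}" FROM "{table}"')
        # utf-8-sig prepends a BOM so that spreadsheet tools detect UTF-8
        with open(csv_path, "w", newline="", encoding="utf-8-sig") as fh:
            writer = csv.writer(fh)
            writer.writerow(["rowid", "response"])
            for rowid, raw in rows:
                if raw is None:
                    continue  # node without a stored response
                if isinstance(raw, bytes):
                    raw = raw.decode("utf-8")
                # json.loads turns \uXXXX escapes into real characters
                data = json.loads(raw)
                # ensure_ascii=False keeps Japanese text and emojis readable
                writer.writerow([rowid, json.dumps(data, ensure_ascii=False)])
                count += 1
    finally:
        con.close()
    return count
```

Calling `export_responses("mydata.db", "responses.csv")` (both file names hypothetical) would leave the full JSON of each node in a single CSV cell; picking individual fields out of `data` is left to the reader.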

Emojis are always troublesome because of different encodings. Since not all clients support all Unicode emojis, providers use customized implementations. You need to dig into the specific encoding of the platform you are working with, e.g. how Facebook deals with it.

What helped me with this issue:

jolesn commented 7 years ago

Thank you for your response. I did try the export feature of Facepager and it works perfectly: I can see all the emojis and the Japanese text, and if possible I would want to use only that. However, it just seems to crash if there are too many nodes to export. Exporting just a few thousand does not seem to be a problem, but with anything over 10 thousand it crashes.

I will try to extract it from the response column of the db file.

Thank you again.

strohne commented 7 years ago

Exporting hundreds of thousands of nodes was never a problem for me. Make sure to have the newest version and select the fast export option.

jolesn commented 7 years ago

https://screencast.com/t/IHFhfv7Z https://screencast.com/t/BFDusPENrp11

Hey, so this is what I get when I try to use the fast export option: it does not save the data at all. Any ideas?

I was able to extract the comments from the db like you told me but I would much rather be able to do that from the program.

Thank you for your help and reply.

dorvak commented 7 years ago

Could you provide your logfile (see the help to find where it's located)?

jolesn commented 7 years ago

I apologize for the late reply, dorvak. I am attaching the logfile you asked for. Does that tell you anything?

Thank you for your help!

[Attached image: facepager fast]

strohne commented 7 years ago

Yes, it does, thank you. Could you please do two things for me:

....and report the result of your tests?

I guess one of the nodes in your data is making trouble. Maybe you could find out which one?
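One possible way to narrow this down outside of Facepager, sketched in Python. It rests on an assumption the thread does not confirm: that the troublesome node is one whose stored response cannot be decoded as UTF-8 JSON or cannot be written back out as UTF-8. The table name `Nodes`, the default column name, and the function name `find_suspect_nodes` are hypothetical and need to be matched to your db file.

```python
import json
import sqlite3


def find_suspect_nodes(db_path, table="Nodes", column="response"):
    """Return (rowid, reason) for each row whose response fails to decode.

    `table` and `column` are assumed names; adjust them to your db schema.
    """
    suspects = []
    con = sqlite3.connect(db_path)
    # Fetch text as raw bytes so invalid UTF-8 is reported per row
    # instead of aborting the whole query
    con.text_factory = bytes
    try:
        query = f'SELECT rowid, "{column}" FROM "{table}"'
        for rowid, raw in con.execute(query):
            if raw is None:
                continue  # node without a stored response
            try:
                text = raw.decode("utf-8")
                data = json.loads(text)
                # A lone surrogate (half an emoji) passes json.loads
                # but cannot be encoded as UTF-8 when writing a file
                json.dumps(data, ensure_ascii=False).encode("utf-8")
            except ValueError as exc:
                # Covers JSONDecodeError and Unicode encode/decode errors
                suspects.append((rowid, f"{type(exc).__name__}: {exc}"))
    finally:
        con.close()
    return suspects
```

An empty result would suggest the problem lies elsewhere, for example in a specific column of the export rather than in the stored responses.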

dorvak commented 7 years ago

You might try to exclude some columns, especially those which contain integers, i.e. numbers. Maybe start with 2 columns and see what happens.

jolesn commented 7 years ago

@strohne @dorvak

I was able to export data thanks to your advice, thank you. The data is pretty dirty but I think I can clean it. Thank you so much!