strohne / Facepager

Facepager was made for fetching publicly available data from YouTube, Twitter and other websites on the basis of APIs and web scraping.
https://github.com/strohne/Facepager/releases
506 stars, 198 forks

Was "Convert to wide format" discontinued? #122

Closed alexgonca closed 3 years ago

alexgonca commented 4 years ago

Was the option to "convert to wide format" discontinued? I was a big fan. If you consider bringing it back at any moment, you have my enthusiastic support. In fact... you have my enthusiastic support anyway. :)

strohne commented 4 years ago

Yeah, me too. I removed it because importing pandas+numpy further blew up the already huge binaries. As compensation, you will find an R script in the wiki: https://github.com/strohne/Facepager/wiki/Data-Analysis#from-long-to-wide-format

Maybe we should add a Python script?

alexgonca commented 4 years ago

That's a good idea! I also wonder if it would be possible to replace pandas+numpy with just vanilla sqlite3 commands.
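To make the sqlite3 idea concrete, here is a minimal sketch using only the standard library. It assumes that "wide format" means one row per node with the Object IDs of its ancestors as extra columns, and it assumes a hypothetical table `nodes(id, parent_id, level, objectid)`; the real Facepager schema and the old export may differ.

```python
import sqlite3


def long_to_wide(conn, max_level):
    """Return one row per node on max_level, with one Object ID column per level.

    Assumes a hypothetical table nodes(id, parent_id, level, objectid).
    """
    max_level = int(max_level)  # only integers are interpolated into the SQL
    selects = [f"n{max_level}.objectid AS level_{max_level}"]
    joins = []
    # Add one self-join per ancestor level, walking from the child up to the root.
    for lvl in range(max_level - 1, -1, -1):
        selects.insert(0, f"n{lvl}.objectid AS level_{lvl}")
        joins.append(
            f"LEFT JOIN nodes AS n{lvl} ON n{lvl}.id = n{lvl + 1}.parent_id"
        )
    sql = (
        f"SELECT {', '.join(selects)} FROM nodes AS n{max_level} "
        + " ".join(joins)
        + f" WHERE n{max_level}.level = ? ORDER BY n{max_level}.id"
    )
    return conn.execute(sql, (max_level,)).fetchall()
```

For a channel > video > comment hierarchy, `long_to_wide(conn, 2)` would return tuples such as `('channelA', 'video1', 'comment1')`. The number of joins grows with the depth, so this only suits shallow trees.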

strohne commented 4 years ago

Using SQLite commands is a great idea. At the moment, this is not on my priority list. If you want to try it out or know someone who can, let me know. The method for "Export all" works with SQLAlchemy and could easily be tweaked: https://github.com/strohne/Facepager/blob/master/src/export.py#L115

(Btw: maybe I could raise some money if someone is interested in joining the development.)

strohne commented 4 years ago

Another idea: over the last few days I have been working on optimizing the iteration over the whole dataset inside Facepager (prefetching records from SQLite). If that works out, the export method can be decoupled from direct access to the database, and parent data can be added from the internal cache at the export stage.
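A rough sketch of that idea, not the actual Facepager code: records are fetched in batches, kept in an in-memory cache, and the parent data is looked up from that cache instead of the database. The table `nodes(id, parent_id, level, objectid)`, the function name, and the batch size are assumptions for illustration.

```python
import sqlite3


def iter_with_ancestors(conn, batch_size=1000):
    """Yield (objectid, [ancestor objectids, root first]) for every node.

    Assumes a hypothetical table nodes(id, parent_id, level, objectid).
    """
    cache = {}  # node id -> (parent id, objectid)
    # Ordering by level guarantees that parents are cached before their children.
    cursor = conn.execute(
        "SELECT id, parent_id, objectid FROM nodes ORDER BY level, id"
    )
    while True:
        batch = cursor.fetchmany(batch_size)  # prefetch a block of records
        if not batch:
            break
        for node_id, parent_id, objectid in batch:
            cache[node_id] = (parent_id, objectid)
            ancestors = []
            current = parent_id
            # Walk up the tree using only the cache, without further queries.
            while current is not None:
                current, ancestor_objectid = cache[current]
                ancestors.append(ancestor_objectid)
            yield objectid, ancestors[::-1]
```

The cache holds every node seen so far, so memory use grows with the dataset; a real implementation would probably need to limit what it keeps.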

strohne commented 4 years ago

I enhanced the export function. When using the "selected nodes" option, the path of a node, including the Object IDs of all parent nodes, is added as the first column. I guess this is the most important data. What do you think, is it a sufficient replacement?
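For illustration only, such a path column could be computed with a recursive query along these lines. This is a sketch and not the export code itself: the table `nodes(id, parent_id, objectid)`, the function name, and the `/` separator are assumptions.

```python
import sqlite3

# Recursive CTE: start at the root nodes and append each child's Object ID
# to the path of its parent.
PATH_SQL = """
WITH RECURSIVE ancestry(id, path) AS (
    SELECT id, objectid FROM nodes WHERE parent_id IS NULL
    UNION ALL
    SELECT n.id, a.path || '/' || n.objectid
    FROM nodes AS n
    JOIN ancestry AS a ON n.parent_id = a.id
)
SELECT id, path FROM ancestry ORDER BY id
"""


def node_paths(conn):
    """Return (id, path) pairs, where path joins the Object IDs from the root down."""
    return conn.execute(PATH_SQL).fetchall()
```

With a channel, one of its videos and a comment on that video, the comment would get a path like `channelA/video1/comment1`.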

gentianbrija2010 commented 4 years ago

@strohne

Can you add some options like GTAS has, for retrieving, viewing and finding a person by first name and last name, by city, name or airport, and by date range (from start to ...)? https://github.com/US-CBP/GTAS

strohne commented 4 years ago

@gentianbrija2010 Can you please open a new issue, as this is a completely different question? As to your question: I don't understand your request. What kind of options, and which API?

(Sorry, I will delete this comment in the next few days because it is unrelated to the original question. Please open a new issue if you have development requests.)