imalsogreg closed 6 years ago
Yes, that's certainly true. It's on my todo list, along with a lot of other things...
(The webapp that serves the database and interacts with the extension doesn't need the models. That's just the classifier task that runs on a cron. So the models absolutely don't need to be in this repo, but I haven't had a chance to excise them.)
@jeremybmerrill I'm happy to be assigned issues from the tracker, rather than putting them on your personal TODO list :) If I were to take the task on, I would open a WIP PR with a plan for where to host the files. But feel free to close the issue if it's not appropriate for an outsider.
Hey Greg --
This particular task is probably one that'd be better for me to do, since it's a question of how to get it integrated into our infrastructure. (And I have a workflow from another project to follow.)
If you're interested in picking up some tasks, I will write up some issues and tag you in the comments.
I did this. It's all set up now. See https://github.com/propublica/facebook-political-ads/blob/master/backend/classifier/classifier/commands/get_models.py
FYI the initial clone is still 1.8 GB due to the git history, but that only bothers new clones.
Hmm, damn. I thought I'd fixed that. I'll take another look at it. Thanks!
Fixed!
[greghale@p51:~/code/facebook-political-ads]$ du -h -d1
34M ./.git
4.2M ./extension
81M ./backend
119M .
The initial clone is 2.4G, due to the tracking of updated models in the same repo as the source code.
Would it be possible to host the models in another repository or on S3, and fetch the most recent ones when appropriate (during the build of a release, or at runtime)? I'm not sure yet which is more appropriate for the project.
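To make the proposal concrete, here is a rough sketch of what "fetch the models when needed" could look like. It is not the project's get_models.py; the function name `fetch_model`, its parameters, and the URL in the usage note are all hypothetical, and it assumes the models are published at a plain HTTP(S) URL (for example a public S3 object) rather than fetched through an S3 client library.

```python
import os
import shutil
import tempfile
import urllib.request


def fetch_model(url, dest_path, force=False):
    """Download `url` to `dest_path` unless a local copy already exists.

    Hypothetical helper: returns True if a download happened and
    False if the existing local copy was kept.
    """
    if os.path.exists(dest_path) and not force:
        return False

    dest_dir = os.path.dirname(os.path.abspath(dest_path))
    os.makedirs(dest_dir, exist_ok=True)

    # Write to a temporary file in the same directory first, so an
    # interrupted transfer never leaves a partial model at dest_path.
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir)
    try:
        with os.fdopen(fd, "wb") as tmp, urllib.request.urlopen(url) as response:
            shutil.copyfileobj(response, tmp)
        os.replace(tmp_path, dest_path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return True
```

A build step or the cron task could then call something like `fetch_model("https://example.com/models/classifier.bin", "models/classifier.bin")` (placeholder URL and path) before loading the classifier. Skipping the download when the file is already present keeps repeated runs cheap, and `force=True` covers the case where a newer model has been published.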