rahulbot opened this issue 4 years ago
We don't have good docs for how someone should deploy their own instance of our front-end apps. I started a deploying.md document here to capture some of what sprang to mind: https://github.com/mitmedialab/MediaCloud-Web-Tools/blob/main/doc/deploying.md

@esirK - this will be relevant for you and your team!
Thanks for this @rahulbot. I will follow the docs and keep you updated in case I run into any trouble.
This helped me set up the Explorer and Source Manager. However, the functionality isn't working, and after some debugging I think it's due to some missing tags. I also found this https://github.com/mitmedialab/MediaCloud-Web-Tools/blob/main/server/views/sources/__init__.py#L13 where we have some predefined tags_id values which aren't in the database. My question, therefore, is: where do these tag ids come from, and am I supposed to create them manually?
Ugh - yeah, those hard-coded tags are all over our system :-( I'll open an issue to think about how to generalize that.
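If you do end up creating them manually in the meantime, here is a minimal sketch of one way to do it, assuming the standard tags/tag_sets tables in the backend Postgres schema; the tags_id, tag_sets_id, and connection string below are placeholders to adapt to your own install:

```python
# Minimal sketch: insert placeholder rows for hard-coded tags_id values that
# the front-end expects but the database doesn't have yet. The tags_id and
# tag_sets_id values, names, and connection details are all placeholders.
import psycopg2

MISSING_TAG_IDS = [8875027]  # example: one hard-coded tags_id from the code

conn = psycopg2.connect("dbname=mediacloud user=mediacloud host=localhost")
with conn, conn.cursor() as cur:
    for tags_id in MISSING_TAG_IDS:
        cur.execute(
            """
            INSERT INTO tags (tags_id, tag_sets_id, tag, label, description)
            VALUES (%s, %s, %s, %s, %s)
            ON CONFLICT (tags_id) DO NOTHING
            """,
            (tags_id, 5, "placeholder_tag_%d" % tags_id,
             "Placeholder", "Created manually to satisfy a hard-coded tags_id"),
        )
conn.close()
```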
I ran into an issue when I tried deploying the app in dev mode on EC2: the UI apps were not being loaded, and I was only able to see the following page.
I traced the issue to https://github.com/mitmedialab/MediaCloud-Web-Tools/blob/main/server/__init__.py#L120 where the manifest.json file had hardcoded values:
{"assets":{"app_css.css":"app_css.16dcf6def49cd99c8346.css","app_css.js":"app_css.16dcf6def49cd99c8346.js","app_js.css":"app_js.16dcf6def49cd99c8346.css","app_js.js":"app_js.16dcf6def49cd99c8346.js","common_css.css":"common_css.16dcf6def49cd99c8346.css","common_css.js":"common_css.16dcf6def49cd99c8346.js"},"publicPath":"http://localhost:2992/"}
I therefore had to update the publicPath to the hostname of my EC2 instance. Is this something we should also add to the setup documentation?
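For anyone following along, here is a sketch of the kind of manifest tweak I mean; the manifest path and hostname below are placeholders, not values from my install:

```python
# Point the webpack manifest's publicPath at this machine's public hostname
# instead of localhost. The manifest path and hostname are placeholders.
import json

MANIFEST = "build/manifest.json"  # adjust to your checkout
NEW_PUBLIC_PATH = "http://ec2-XX-XX-XX-XX.eu-west-1.compute.amazonaws.com:2992/"

with open(MANIFEST) as f:
    manifest = json.load(f)

manifest["publicPath"] = NEW_PUBLIC_PATH

with open(MANIFEST, "w") as f:
    json.dump(manifest, f)
```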
@esirK -- wow, you are powering through this, and we'll weave this feedback into making Media Cloud easier to deploy for others.
I'm a little surprised that you needed to change publicPath though. When running in production, the Flask app should serve up the static compiled assets. These are created by running something like npm run topics-release (the command changes based on which app you're building). Are you running the Flask server in dev mode on your EC2 instance?
Hey there @esirK, what is your app.config SERVER_MODE?
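For reference, the relevant setting looks roughly like this; this is only a sketch, and the accepted values and exact file syntax are assumptions, so check config/app.config.template in your checkout:

```python
# Hypothetical excerpt from config/app.config: switch the Flask app from the
# webpack dev-server setup to serving pre-built static assets. The "dev" /
# "prod" values shown here are an assumption -- verify against the template.
SERVER_MODE = "prod"
```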
Yes @dsjen, @cindyloo -- right now I'm running in dev mode. I'll try running in production mode and report back.
Also, I was able to get the Explorer app to work by making https://github.com/mitmedialab/MediaCloud-Web-Tools/blob/main/server/views/explorer/__init__.py#L77 (only_queries_reddit) always return True, which I would guess makes queries go only to Reddit as a media source. So my question is: what services should I have running in order to be able to run the Explorer against other media sources?
I believe the Explorer is mostly powered by the backend data sources via the API, and the Reddit functionality is pretty limited. I don't believe it's integrated with other sources (e.g. Twitter). Do you have the Media Cloud backend up and running?
For running in dev mode, the way it works best (for the moment, until you get an update) is to create a manifest.json file in your MC/build directory that contains just {"assets": {}, "publicPath": "./build"}. This is because we use npm to generate the JSON file, which flask_webpack uses to find the live-compiled files. If you do that, all of these related issues running in dev should go away.
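A tiny sketch of that workaround, assuming the build directory lives at MC/build relative to where you run it:

```python
# Write the minimal dev-mode manifest described above. The build directory
# path is an assumption -- adjust it to your checkout's MC/build location.
import json
import os

build_dir = "MC/build"
os.makedirs(build_dir, exist_ok=True)

with open(os.path.join(build_dir, "manifest.json"), "w") as f:
    json.dump({"assets": {}, "publicPath": "./build"}, f)
```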
@dsjen yes I do, and I also added some collections manually, but I'm not sure what I should expect. E.g. I have the Solr server running here http://ec2-52-18-167-8.eu-west-1.compute.amazonaws.com:8983/ but executing any query returns no data.
That's great progress! I'm working on that tag issue I mentioned earlier right now.
That Reddit thing is a hack. You want it to return False so that it always queries against your Media Cloud install.
Can you see if Solr has any data in it? Overall, the system runs jobs that check for sources that have RSS feeds associated, ingests and processes those, and stores them in Postgres. Then it grabs stories that aren't already in Solr and imports them from Postgres into Solr. Perhaps one of the links in that back-end chain isn't working?
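One quick way to check, as a sketch only; the Solr host and the collection name "mediacloud" are assumptions about a stock install, so adjust them to yours:

```python
# Ask Solr how many documents it has indexed. Host and collection name are
# placeholders for whatever your backend install uses.
import json
import urllib.request

solr_url = "http://localhost:8983/solr/mediacloud/select?q=*:*&rows=0&wt=json"
with urllib.request.urlopen(solr_url) as resp:
    print("documents in Solr:", json.load(resp)["response"]["numFound"])
```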
@rahulbot Sure, I'll revert the hack so it uses the Media Cloud install.
Right now Solr doesn't have any data. The import-stories-feedly service shows that it's finding some stories, e.g.

MediaWords.ImportStories.Feedly: _get_new_stories_from_feedly chunk: 34999 total stories found

but the apps_import-solr-data_1 service shows 0 stories:

MediaWords.Solr.Dump: added 0 topic stories to the import
MediaWords.Solr.Dump: too few stories (0/1000). sleeping 60 seconds ..
If I remember correctly, we use Feedly to back-fill older content where possible. The main app that regularly fetches RSS feeds from media sources is a different one. Perhaps take that question to the back-end repo so they can help dig into why you're not getting stories into Solr yet? I'd imagine the process would be to add some sources, add some feeds, make sure the feed scraper is running, and then stories should show up in the DB.
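As a rough sketch, you could check how far that chain has gotten from the Postgres side; the table names match the stock Media Cloud schema, but the connection string below is a placeholder:

```python
# Count sources, feeds, and stories in the backend database to see how far
# the ingest chain has gotten. Table names assume the stock Media Cloud
# schema; the connection string is a placeholder for your install.
import psycopg2

conn = psycopg2.connect("dbname=mediacloud user=mediacloud host=localhost")
with conn, conn.cursor() as cur:
    for table in ("media", "feeds", "stories"):
        cur.execute("SELECT count(*) FROM " + table)
        print(table, "rows:", cur.fetchone()[0])
conn.close()
```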
Sure will do that.
Hello.
I was able to do the front-end apps deployment successfully, but it seems like I need to log in to each app independently. I was thinking that setting the COOKIE_DOMAIN config to the domain I'm using would resolve this, but it didn't. So my question is: how do I fix this so that I only need to log into one app?
I'm so glad that you got the apps deployed! I think COOKIE_DOMAIN is the correct config to set, so I'm not exactly sure what might be amiss. Our domain is .mediacloud.org -- I was a little surprised to see the . at the beginning, so maybe that's key? This might be a little bit of trial and error -- sorry!
The first piece is that the session is stored in the external Redis cache so it can be used across all the domains (via the SESSION_REDIS_URL environment variable). The second, as you point out, is that the cookie domain needs to be a valid one so that the cookie which gets set works across any subdomain (via the COOKIE_DOMAIN env var mentioned). I found that prepending a period effectively wildcarded it for all subdomains (but I'm not a cookie expert).
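Something like this in config/app.config; this is a sketch only, the Redis URL and domain are placeholders, and the actual file syntax may differ from this Python-style excerpt:

```python
# Hypothetical excerpt showing the two settings discussed above. The Redis URL
# and domain are placeholders; the leading dot on COOKIE_DOMAIN is what lets
# the session cookie be shared across subdomains.
SESSION_REDIS_URL = "redis://my-redis-host:6379/0"
COOKIE_DOMAIN = ".example.org"
```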
Hello again.
I have one more query concerning the WORD_EMBEDDINGS_SERVER_URL defined here: https://github.com/mediacloud/web-tools/blob/main/config/app.config.template#L35. Which word embedding server are you using, or is there a document covering this?
@esirK -- you'll need to spin up an instance of https://github.com/mediacloud/word-embeddings-server. We can help if you have trouble with the set up. Good luck!
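Once that server is up, the web tools just need to be pointed at it. A sketch; the host and port below are placeholders, not values from the word-embeddings-server docs:

```python
# Hypothetical excerpt from config/app.config pointing the web tools at a
# locally-running word-embeddings-server instance; host and port are
# placeholders for wherever you deploy it.
WORD_EMBEDDINGS_SERVER_URL = "http://localhost:8000/"
```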
Thank you for this @dsjen. I was able to set this up, but I had to change the versions of numpy and scipy to the following:

numpy==1.17.0
scipy==0.18.1

Without that, I was getting the following error:

sklearn import error - ImportError: cannot import name 'comb'

which was originating from

from sklearn.decomposition import PCA
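A quick sanity check after pinning those versions (just a sketch; it only confirms that the import that was failing now succeeds):

```python
# Verify the pinned numpy/scipy versions and that the failing sklearn import
# now works. Purely a sanity check -- nothing here is Media Cloud specific.
import numpy
import scipy
from sklearn.decomposition import PCA  # this was the import that failed

print("numpy", numpy.__version__)
print("scipy", scipy.__version__)
print("PCA imported OK:", PCA is not None)
```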
Oh, good to know! Please consider making a PR to update that for us!
Sure I'll do this.