medialab / hyphe

Websites crawler with built-in exploration and control web interface
http://hyphe.medialab.sciences-po.fr/demo/
GNU Affero General Public License v3.0

WE Statuses counts not refreshed timely [Was: In/Out Selections Not Registering on Frontend] #406

Closed mere13 closed 3 years ago

mere13 commented 3 years ago

Hello again.

Sorry to be a pain. I completely reset Docker and Hyphe and started with a whole new corpus (my old corpora are no longer accessible; lost all that data).

Now, when I use either Prospect or Web Entities to set entities to in/out/undecided, it doesn't register on the front end. In other words, the State of the Corpus still shows my seed sites as in and everything else as discovered.

The backend is registering the changes, as when I export to csv they are there, which is great EXCEPT that I can't crawl the new sites I set to in because it's like they don't exist on the front end.

Any thoughts on how to fix this? I'm really up against a deadline now and am totally panicking. :)

EDIT: If I upgrade my Docker plan to a paid version to avoid auto updates in future and get access to support, do you know if that will impact the Hyphe currently running on it?

Thank you!

boogheta commented 3 years ago

Hello, I apologize but I don't understand the description of your problem: what are you calling the backend and the frontend? The Prospect and WebEntities pages are both part of the frontend, which is the whole web interface; the backend is the API that the frontend communicates with. Can you better describe what you are doing, what you are expecting in return, and how it doesn't do that?

Regarding Docker updates, I don't have much experience with Docker paid plans, so I don't know. But yes, while you work on a corpus, I'd recommend not touching the install too much until the corpus is complete.

mere13 commented 3 years ago

I can see it working in the terminal (backend) as I use the web interface (frontend). I have crawled my seed sites and it's given me 2035 entities. Now, I'm trying to prospect setting them in/out on the web interface but the web interface isn't showing, for example, the 200 some Twitter accounts I set to out as out. They're gone from discovered but also not in out. As I said, they are in the csv file set to out when I checked that.

I'll reach out to Docker. I didn't purposefully mess with it the first time. The free plan auto updates whether you like it or not and that's what broke the 90K link corpus I'd previously built.

boogheta commented 3 years ago

What do you mean exactly by working in the terminal? Please describe in more detail what you did: which commands, how, and so on. Otherwise I have a really hard time understanding what the problem is.

Edit: can you maybe copy-paste the command lines you ran, and post screenshots of the interface pages where things are missing?

mere13 commented 3 years ago

Meaning, I can see the scroll moving in the terminal as I click around the web interface.

So, I have the discovered entities. In the web interface, I click on prospect and I can see the command pop up in the terminal. Then, I click an entity to set it to in or out and I can see the command pop up in the terminal. I'm not actually using commands in the terminal but only the web interface right now.

The only command lines I ran were the ones to boot up Hyphe:

cd hyphe

cp .env.example .env
cp config-backend.env.example config-backend.env
cp config-frontend.env.example config-frontend.env

docker-compose pull

docker compose up (the new Docker update won't use docker-compose up and instructs to use this one)

Then I logged into my corpus on the web interface at the local host.

The error I'm getting now is in the web interface. I have 2031 web entities.

Please see the screenshot. For example, it says I have 1 in entity, but you can see from the list that I have more than that. The same is happening for out and undecided. It isn't moving entities to where I want them to go as I select them.

[Screenshot: Screen Shot 2021-04-14 at 9 19 26 AM]

boogheta commented 3 years ago

OK, this is much clearer now, thank you! This is quite strange indeed. In your screenshot, the total of in/out/discovered does correctly sum up to 2031, so it looks to me like they are all still there and the problem is in updating their status or the count by status. Which total number of IN entities is correct? 1 or 16? Did you set all these in entities to a different status, so that the count is correct but not the list? Can you show a screenshot of the Overview page as well?

mere13 commented 3 years ago

Okay, great. Sorry for the initial confusion!

They are all still there for sure. They just aren't updating their status. In the above, the correct count for in was 16 not 1.

However, if I logout and reboot everything, it forces the update. For example, in the attached, you can see the counts have now updated.

[Screenshot: Screen Shot 2021-04-14 at 11 20 25 AM]

boogheta commented 3 years ago

OK, so yes, in that sense I guess it's just a minor bug in refreshing the cached totals for each status. There is no actual problem in the data itself, just a temporary delay in updating these values (which should not happen, and why it happened would deserve to be investigated), so I don't think you should worry too much. I was indeed going to suggest stopping and restarting the corpus to get this fixed.
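
For anyone wanting to double-check that the data really is intact while the displayed totals lag, here is a minimal Python sketch that tallies statuses from the text of a CSV export and compares them with the numbers shown in the interface. The column name `STATUS` is an assumption about the export's header, and `count_statuses` and `differences` are hypothetical helper names, so adjust them to the actual file.

```python
import csv
import io
from collections import Counter


def count_statuses(csv_text, status_column="STATUS"):
    """Tally web entities per status from the text of a CSV export.

    The column name is an assumption: check the export's header line.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter()
    for row in reader:
        # Normalize so "in", "IN" and " In " all land in the same bucket.
        status = (row.get(status_column) or "").strip().upper()
        counts[status or "UNKNOWN"] += 1
    return counts


def differences(csv_counts, displayed_counts):
    """Return {status: (csv_count, displayed_count)} where the two disagree."""
    statuses = set(csv_counts) | set(displayed_counts)
    return {
        s: (csv_counts.get(s, 0), displayed_counts.get(s, 0))
        for s in statuses
        if csv_counts.get(s, 0) != displayed_counts.get(s, 0)
    }
```

To use it, read the exported file into a string (for example with `open(path, newline="", encoding="utf-8").read()`), pass it to `count_statuses`, and give `differences` a dict of the totals typed in from the interface. An empty result means the two agree.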

mere13 commented 3 years ago

Okay, thank you!

mere13 commented 3 years ago

Wanted to send an update. The issue has gotten worse. I now have to restart Docker each time I set a status, as it hangs saying "error loading status." I'm attaching a log from when this happens. Hope it helps.

Thanks as always!

[Screenshot: Screen Shot 2021-04-17 at 4 46 30 PM]

boogheta commented 3 years ago

I don't think there is any link between the two issues. Here it looks like the MongoDB database is down, so you will probably need to restart its dedicated container. But first, can you copy-paste the logs corresponding to the mongo_1 container? You will probably need to scroll back up a bit.
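
If scrolling back through the combined output is tedious, one option is to save the terminal output to a text file and filter it. The Python sketch below keeps only the lines written by one container. It assumes each line of the combined output looks like `mongo_1     | message` (container name, padding, a pipe, then the message) and that the saved text contains no color codes; `lines_for_container` is a hypothetical helper name.

```python
def lines_for_container(log_text, container="mongo_1"):
    """Keep only the lines that a given container wrote to the combined output."""
    kept = []
    for line in log_text.splitlines():
        # Assumed layout: "<container name><padding>| <message>".
        # partition() splits on the first pipe only, so pipes inside
        # the message itself are preserved.
        name, sep, message = line.partition("|")
        if sep and name.strip() == container:
            kept.append(message.strip())
    return kept
```

Calling `lines_for_container(saved_text)` on the saved output returns the `mongo_1` messages as a list of strings, ready to paste as text.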

mere13 commented 3 years ago

Sorry, I went to grab some dinner and closed out of everything before I left, but I just got home and started working again. This is the log from that (in two screenshots).

It seems like it's the Hyphe backend in Docker that is disconnecting?

[Screenshots: Screen Shot 2021-04-17 at 7 15 44 PM, Screen Shot 2021-04-17 at 7 16 02 PM]

boogheta commented 3 years ago

Sorry to insist, but can you "copy paste"? Screenshots are never a good idea whenever text is involved. And I need many more of the earlier lines, since all these lines are identical and say nothing more than that the connection is closing.
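
When a log is dominated by long runs of the same line, collapsing each run makes the surrounding context easier to find and to paste. This is a minimal Python sketch using `itertools.groupby`; `collapse_repeats` is a hypothetical helper name, and it only merges lines that are strictly identical, so lines differing by a timestamp would not be merged.

```python
from itertools import groupby


def collapse_repeats(lines):
    """Replace each run of identical consecutive lines with one line and a count."""
    collapsed = []
    for line, run in groupby(lines):
        n = sum(1 for _ in run)  # length of this run of identical lines
        collapsed.append(line if n == 1 else f"{line}  [repeated {n} times]")
    return collapsed
```

Feeding it `saved_text.splitlines()` and joining the result with newlines gives a shorter log that still shows how often each message repeated.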

mere13 commented 3 years ago

Sure. Here you go. This is basically everything from today.

Terminal Saved Output txt.zip

boogheta commented 3 years ago

Thanks, that gives way more context!

I can't find any explicit problem, but it looks to me like the server may be under heavy load. How much RAM/CPU do you have on it? From what I understand, there are some difficulties with the mongo database around 5:30pm, then the backend gets killed at 7:15pm: is this the result of you stopping the containers, pressing Ctrl+C or such?

mere13 commented 3 years ago

Yay, I'm glad that was more helpful. :)

You know, it's funny you should say that. I'm using a VPN and I did switch to a different one this time because our dedicated team one was at full capacity on other projects. I'll switch back to that one and see if it helps.

I didn't kill the backend at 7:15. I was off at dinner. It must have done that on its own.

boogheta commented 3 years ago

OK, so if you didn't kill it yourself, I believe this comes from the system's protective measures, which usually kick in when a process is taking too much RAM. So my inclination would be towards a performance issue on the machine running the containers.
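
One rough way to check this hypothesis is to take a snapshot of per-container memory usage with the Docker CLI and flag anything running close to its limit. The Python sketch below relies on `docker stats --no-stream --format` with the `{{.Name}}` and `{{.MemPerc}}` placeholders; the 80% threshold and the helper names are arbitrary assumptions for illustration, not values from Hyphe.

```python
import subprocess


def parse_mem_percent(stats_text):
    """Parse lines of "<name><TAB><percent>%" into a {name: percent} dict."""
    usage = {}
    for line in stats_text.splitlines():
        name, sep, percent = line.partition("\t")
        if not sep:
            continue  # skip blank or malformed lines
        try:
            usage[name.strip()] = float(percent.strip().rstrip("%"))
        except ValueError:
            continue  # value is not a number, e.g. a placeholder like "--"
    return usage


def heavy_containers(usage, threshold=80.0):
    """Names of containers at or above the (arbitrary) memory threshold."""
    return sorted(name for name, pct in usage.items() if pct >= threshold)


def read_docker_stats():
    """Take one snapshot from the Docker CLI (requires Docker to be running)."""
    result = subprocess.run(
        ["docker", "stats", "--no-stream", "--format", "{{.Name}}\t{{.MemPerc}}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

With the containers up, `heavy_containers(parse_mem_percent(read_docker_stats()))` lists the containers whose memory percentage is at or above the threshold. A single snapshot can miss a short spike, so it is worth repeating while the corpus is busy.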

mere13 commented 3 years ago

That makes sense. It is an older machine that's been wiped only for this kind of stuff. I guess a new machine is in order.

Thank you, again, for your help and for taking a look.