thejordanprice closed this issue 6 years ago
The project is very interesting. I was searching for something similar, as I was interested in learning more about how the DHT works and the crawling side of things.
I am running the spider now on a Raspberry Pi 2 and it is working flawlessly (a bit slow, but not a big problem). I am currently running 6 instances of the daemon and it is adding around 2000 magnets per hour, which is kind of slow but expected of a Raspberry Pi.
I can't run multiple instances on my desktop since I am using Windows at the moment, and as I read in your instructions, it is not possible for more than one instance to listen on the same UDP port (I might run the instances on different ports on my Windows machine in the future and see how well it goes).
For now, thank you very much for the time you have spent on this project, and I will post here again if anything new comes up in my tests.
Thanks for the input. I'd try lowering the number of daemons you're running, but maybe you've found a 'happy spot' for the Pi 2. I personally haven't attempted to run this on Pis because it takes a pretty decent amount of power (your Pi may get pretty warm).
If you are running Windows 10, you could always go to the 'Windows Store' and install Ubuntu, or install some type of virtual machine software. If you encounter an issue, hit an error, or want a new feature, feel free to open a new issue.
Have fun and thanks for the positive outlook.
Hi again, and thanks for your response. First of all, I am a C# developer, so I have absolutely no idea whether this is good practice or not.
However, just for fun, I made the magnet counter on the index page update with the new magnet count using WebSockets (socket.io): a loop runs every 5 seconds, checks whether new magnets have been added to the database, and broadcasts a global message through the websocket to all the clients.
Finally, as I said earlier, I am just a beginner when it comes to Node.js and webserver stuff, and I am very unsure whether what I did has any bad consequences. My code was a bit ugly, but I think it was a good feature, at least for me, instead of refreshing the page.
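The loop described above can be sketched as a small "poll the count, broadcast on change" pattern. This is a hypothetical reconstruction, not the actual code from the repo: the database query and socket.io's `io.emit` are stubbed so the snippet is self-contained; in real code `getCount` would be a Mongoose count and `emit` would be `io.emit`.

```javascript
// Sketch of the socket.io counter loop (names are illustrative).
function makeCountBroadcaster(getCount, emit) {
  let lastCount = null;
  // The returned tick is what setInterval(tick, 5000) would drive.
  return async function tick() {
    const count = await getCount();        // e.g. Magnet.countDocuments()
    if (count === lastCount) return false; // nothing new, stay quiet
    lastCount = count;
    emit('magnetCount', count);            // e.g. io.emit(...) to all clients
    return true;                           // a broadcast happened
  };
}

// Wiring it up with stubs standing in for the DB and the socket:
let dbCount = 100;
const sent = [];
const tick = makeCountBroadcaster(
  async () => dbCount,
  (event, payload) => sent.push([event, payload])
);
```

Broadcasting only when the count actually changed keeps the clients from being spammed with identical messages every 5 seconds.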
Do a git pull and then add your websockets implementation again (back it up and restore it), then push it to your repo. If the code is decent, or something I can refactor, I'll throw that feature into the master branch. Currently I don't have enough time to attempt that feature from scratch.
I've done a bit of code updating. This now requires a redis-server (which has been noted in the README.md), and from the benchmarks I've been running it should make things go about twice as fast.
I'm very interested in what you've come up with and will help you with learning if I find bad practices.
Technically, some would say the way I've skipped the whole MVC aspect is bad practice, but it all depends on who you are. Again, I am trying to keep it as minimal as possible, so if the feature seems minimal, I can more than likely add it into the master.
Thank you for your response, the code can be found here.
I ended up using a ton of your code and then ended up at a point where I decided I could get the same result with a lot less code and without websockets. That feature has now been implemented in the master.
Thanks for the idea and work, I enjoy it. 😄
Very interesting implementation using ajax. I was interested in seeing things done with ajax, but I went with Socket.io first because I read that socket.io is superior to ajax according to many online sources, and specifically this comment:
WebSocket replaces HTTP for applications! It was designed by Google with the help of Microsoft and many other leading companies. All browsers support it. There are no cons. SocketIO is built on top of the WebSocket protocol (RFC 6455). It was designed to replace AJAX entirely. It does not have scalability issues what-so-ever. It works faster than AJAX while consuming an order of magnitude fewer resources. The main reasons for this is again, WebSocket was designed for applications, and AJAX is a work-around to enable applications on top of a document protocol. If you dive into the HTTP protocol, and use MVC frameworks, you'll see a single AJAX request will actually transmit 700-900 bytes of protocol load just to AJAX to a URL (without any of your own payload). In striking contrast, WebSocket uses about 10 bytes, or about 70x less data to talk with the server.
However, as I said earlier seeing an ajax approach is very interesting too. 😄
There is only one point I didn't completely understand: the error you got in cluster mode. I have been running the same code I linked in the comment above, but with only one webserver instance, and I didn't encounter any errors. Did the error show up while running more than one instance of the webserver? I never tried that.
Thank you once again. I am happy that my idea was useful. 😄
Yes, I considered many things when contemplating ajax vs sockets... Everything you said was true, but after watching the server load, even while polling the count every 250ms, it was so minimal that ajax was fine.
Yes, I do always run in cluster mode, usually more than 10 daemons, which will consume about 70% of 4GB of RAM and max out the CPUs pretty good. About 25Mbps down and 8Mbps up. It consumes a lot, but I love the idea of having backups. 😅
Maybe in the future, if more features require API calls, I would add sockets again; it just seemed like a lot of code to add for a tiny feature (considering the libraries' code size as well).
Soon I will be adding pagination and lean() to the mongoose queries, along with other things to make page loads much faster.
Any new feature ideas are always welcome. 👍
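The pagination plan above boils down to mapping a page number to skip/limit values for the query. A minimal sketch, assuming a hypothetical `perPage` default and query shape (the Mongoose call is left as a comment since it needs a live database):

```javascript
// Hypothetical pagination helper; names and defaults are assumptions.
function pageToSkipLimit(page, perPage = 25) {
  const p = Math.max(1, Math.floor(page) || 1); // clamp bad input to page 1
  return { skip: (p - 1) * perPage, limit: perPage };
}

// In the webserver this would drive a query along these lines:
//   const { skip, limit } = pageToSkipLimit(req.query.p);
//   const magnets = await Magnet.find(filter)
//     .sort({ _id: -1 })
//     .skip(skip)
//     .limit(limit)
//     .lean(); // plain JS objects instead of full Mongoose documents
```

`.lean()` is the speedup mentioned above: it skips hydrating full Mongoose documents, which matters when rendering pages of results.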
Thank you for your answer, and I'm looking forward to the new improvements. I am also running around 4 daemons, but I only needed one webserver instance.
However, the first problem with my Raspberry Pi 2 setup was that the processor is 32-bit, which limited MongoDB to only 2 GB for the database, so I was forced to move the database to a remote host with a 64-bit processor, which solved the problem 👍
Pagination has been implemented, let me know if you have any issues. The next real thing for me to tackle is search queries on a large database. I know it is slow at times. So that is my main thing for the future.
The daemon seems to be fine at the moment; I don't really see many problems with it. There is an occasional duplicate entry in the DB when running many instances at once, which could be fixed with a simple duplicate-removing function that runs every once in a while.
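The deduplication idea can be sketched by keying on the infohash, which uniquely identifies a torrent. This is shown against a plain array for illustration; against MongoDB the same idea would be an aggregation grouping on the infohash field and deleting the extras. Field names here are assumptions.

```javascript
// Hypothetical duplicate-removing pass over magnet records.
function dedupeByInfohash(magnets) {
  const seen = new Set();
  return magnets.filter((m) => {
    const key = m.infohash.toLowerCase(); // infohashes are case-insensitive hex
    if (seen.has(key)) return false;      // drop the duplicate
    seen.add(key);
    return true;                          // keep the first occurrence
  });
}
```

Run periodically (or enforced up front with a unique index on the infohash field), this would keep concurrent daemon instances from accumulating duplicate rows.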
The webserver and web design, on the other hand, need some work. If you're open to working on it, go ahead and edit away, and I'll accept a pull request as long as the code remains stable. Any performance tweaks are completely encouraged as well. This was written as a spur-of-the-moment type thing, and I would love for it to grow more. :+1:
I'm working full time and doing side jobs constantly, so I may not respond as fast as I would like, but I will keep watching for any activity on the project.