Now we have implemented ensembling. But since I mostly just copied and pasted, it left me homework: to better understand the ensembling method, at least at a very high level.
How can the stacking algorithm make the result better? It would make sense to me if each algorithm were better for a particular variable. Say, AdaBoost is better for the `PClass` variable, or Random Forest is better for the `IsAlone` feature. But we don't use that at all. What I can see from the code is that we just predict using each algorithm, concatenate the results side by side, and in the end predict again using `xgboost`.
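To make the mechanics concrete, here is a minimal sketch of that two-level idea, not the tutorial's exact code: each base model's out-of-fold predictions become columns of a new feature matrix, and a second-level model learns from those columns. I'm assuming scikit-learn here and swapping in `GradientBoostingClassifier` as a stand-in for the xgboost meta-learner; the dataset from `make_classification` is also just illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, RandomForestClassifier,
                              GradientBoostingClassifier)
from sklearn.model_selection import cross_val_predict, train_test_split

# Toy dataset standing in for the Titanic features.
X, y = make_classification(n_samples=400, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

base_models = [AdaBoostClassifier(random_state=0),
               RandomForestClassifier(random_state=0)]

# Level 1: each base model's out-of-fold predictions, stacked side by side,
# become the features for the next level.
train_meta = np.column_stack([
    cross_val_predict(m, X_train, y_train, cv=5) for m in base_models
])
test_meta = np.column_stack([
    m.fit(X_train, y_train).predict(X_test) for m in base_models
])

# Level 2: the meta-learner predicts from the base models' predictions.
meta = GradientBoostingClassifier(random_state=0)
meta.fit(train_meta, y_train)
print(meta.score(test_meta, y_test))
```

The point of the out-of-fold step is that the meta-learner sees predictions the base models made on data they were not trained on; otherwise it would just learn to trust whichever base model memorized the training set best. So stacking can help even when no single model "owns" a feature: the meta-learner learns *when* to trust each base model's vote.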
Why do we need to run `xgboost` at the end? Does it have to be `xgboost`?
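As far as I can tell, no: the meta-learner can be any estimator. As a sketch (assuming scikit-learn, with a toy `make_classification` dataset), sklearn's built-in `StackingClassifier` does the whole wiring and defaults to a plain logistic regression, not xgboost, as the final estimator:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, RandomForestClassifier,
                              StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=400, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

stack = StackingClassifier(
    estimators=[("ada", AdaBoostClassifier(random_state=0)),
                ("rf", RandomForestClassifier(random_state=0))],
    # Any estimator can sit here; xgboost is just one popular choice.
    final_estimator=LogisticRegression(),
)
stack.fit(X_train, y_train)
print(stack.score(X_test, y_test))
```

The tutorial presumably picks xgboost because gradient-boosted trees are a strong general-purpose learner, but a simpler final estimator often works just as well on a small feature matrix of base-model predictions.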
We chose certain algorithms for our ensemble list, but I think the choices are not limited to these. Wikipedia, for example, mentions the Bayes Optimal Classifier, which we don't use at all.
That raises another question: how many algorithms can we stack?