Set up new build/CI system

Daniel15 commented 1 year ago

Description

One of the conditions of GitHub reinstating this repo is that we can't use Actions any more. We need to find a new build/CI system.

We can look at hosted systems like CircleCI and Travis, or host our own Jenkins server.

Daniel15 commented 1 year ago

@BellezaEmporium is setting up a Jenkins server for this.

BellezaEmporium commented 1 year ago

@BellezaEmporium is setting up a Jenkins server for this.

I have a Jenkins CI running and created the IPTV index files successfully (I haven't recreated all of the actions we have). The EPG job is currently running, but I expect it to finish in a whopping 10+ hours for a 2-day EPG covering all of the data we have.

However, it might not stay up forever, as I'm currently running on some DO credits, and they may be completely drained in the next few months.

Daniel15 commented 1 year ago

I expect it to finish in a whopping 10+ hours

This is probably why GitHub disabled the repo lol. We might need to optimize the code, for example by parallelising more things.

We can probably use the Open Collective funds to pay for a server. There's a bunch of providers <$10/month that don't care too much about CPU usage.

BellezaEmporium commented 1 year ago

I expect it to finish in a whopping 10+ hours

This is probably why GitHub disabled the repo lol. We might need to optimize the code.

We can probably use the Open Collective funds to pay for a server. There's a bunch of providers <$10/month that don't care too much about CPU usage.

I had built a tvtv.us scraper in Python that fetched a schedule in less than 2 seconds, so I believe we can optimize @freearhey's epg-grabber. I need to look more deeply through the code.

Daniel15 commented 1 year ago

We can also split it into several Jenkins jobs and run them across multiple servers. You can have multiple workers that connect to one Jenkins server. I may have some spare capacity I can use on my existing servers (I'd have to check).
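A rough illustration of the splitting idea: the site list can be cut into roughly equal shards, one per Jenkins job or worker. This is only a sketch in Python, not the grabber's own code. The helper name `shard_sites` and the worker count are made up for the example, and the site names are simply ones mentioned elsewhere in this thread.

```python
def shard_sites(sites, worker_count):
    """Split sites into worker_count round-robin shards (hypothetical helper)."""
    if worker_count < 1:
        raise ValueError("worker_count must be at least 1")
    # Sort first so every worker computes the same split from the same list.
    ordered = sorted(sites)
    return [ordered[i::worker_count] for i in range(worker_count)]


if __name__ == "__main__":
    # Site names taken from this thread; any list of sites would do.
    sites = ["tvtv.us", "digiturk.com.tr", "dsmart.com.tr", "unifi.com.my"]
    for index, shard in enumerate(shard_sites(sites, 2)):
        print(f"worker {index}: {shard}")
```

Each job would then grab only its own shard, so no two workers fetch the same site.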

freearhey commented 1 year ago

@Daniel15 To clarify, at the moment we only collect $6/month through Open Collective:

https://opencollective.com/iptv-org#category-BUDGET

So even if all @iptv-org/core members decide to donate their portion of the fund to pay for servers, in a year we will run out of money anyway.

zuleyhasultan commented 1 year ago

what is the situation please i'm still waiting admin and big masters :(

( Thanks )

boeder9 commented 1 year ago

what is the situation please i'm still waiting admin and big masters :(

( Thanks )

Everyone is waiting. There is no ETA, the new CI system will be done when it's done. For now, you need to be patient.

Daniel15 commented 1 year ago

So even if all @iptv-org/core members decide to donate their portion of the fund to pay for servers, in a year we will run out of money anyway.

If it's $6/month recurring, that should be enough for a server that can run a CI system (if the rest of the team are OK with that). We can also ask members if they'd be kind enough to do recurring monthly donations specifically to cover server expenses.

freearhey commented 1 year ago

$6/month recurring, that should be enough for a server that can run a CI system

Well, if that's so, we only need to decide who will administer the server (will it be you or @BellezaEmporium?). Then those of @iptv-org/core who want to support this initiative will just have to leave a comment here and I will update iptv-org/ledger to make it clear exactly how much money is available.

MapGuy11 commented 1 year ago

$6/month recurring, that should be enough for a server that can run a CI system

Well, if that's so, we only need to decide who will administer the server (will it be you or @BellezaEmporium?). Then those of @iptv-org/core who want to support this initiative will just have to leave a comment here and I will update iptv-org/ledger to make it clear exactly how much money is available.

I have multiple servers and am willing to host what we may need. Please let me know if that will be ok with all. If not then you can use my earnings as payment for one.

BellezaEmporium commented 1 year ago

I have tried to reduce the run time by modifying a few things in the grabber, and made 2 different versions available. The "fast" version runs requests in parallel using Promises, so everything launches at the same time. However, certain links may not be scraped, as their requests are either never fulfilled or not answered in time.

The other one, the "master / classic" version, is the same implementation with a small optimization test.

Both of them have updated dependencies.

In the latest tests (through CI):

Fast version : 4 EPGs/minute - crashes in the middle of the work due to socket timeout issues. Could be due to parallel requests
Classic version : 0.6-0.7 EPGs/minute - goes quite well but is quite slow. May finish getting all the EPGs in about 2 days on my configuration.

I still wonder how the EPG jobs inside GH Actions managed to perform that well without any EPG issues.

Daniel15 commented 1 year ago

Fast version : 4 EPGs/minute - crashes in the middle of the work due to socket timeout issues. Could be due to parallel requests

Are these all in the same process? Something else you could try is splitting the work across multiple processes. Have a coordinator process that spins up several worker processes and use some form of IPC to communicate between them. The coordinator could split up the work and pass it to the workers. Doing that, plus using parallel requests, should speed it up a lot.
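The coordinator/worker layout described here could look roughly like the following. It is a minimal Python sketch using `multiprocessing` queues as the IPC mechanism. `grab_site` is a placeholder that returns a dummy value instead of doing a real grab, and all function names are assumptions made for the example.

```python
import multiprocessing as mp


def grab_site(site):
    """Placeholder for the real per-site grab; returns (site, dummy_value)."""
    # The length of the name stands in for a real result so this runs offline.
    return site, len(site)


def worker(tasks, results):
    """Worker loop: pull site names until the None sentinel arrives."""
    for site in iter(tasks.get, None):
        results.put(grab_site(site))


def coordinate(sites, worker_count=2):
    """Coordinator: hand sites to worker processes over queues, collect results."""
    tasks, results = mp.Queue(), mp.Queue()
    procs = [
        mp.Process(target=worker, args=(tasks, results))
        for _ in range(worker_count)
    ]
    for proc in procs:
        proc.start()
    for site in sites:
        tasks.put(site)
    for _ in procs:
        tasks.put(None)  # one sentinel per worker so each loop terminates
    # Drain the result queue before joining so no worker blocks on a full pipe.
    collected = dict(results.get() for _ in sites)
    for proc in procs:
        proc.join()
    return collected
```

`coordinate([...])` should be called from a script's `if __name__ == "__main__":` guard, which `multiprocessing` requires on platforms that start workers by re-importing the main module. Each worker could additionally use parallel requests internally, as suggested above.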

BellezaEmporium commented 1 year ago

Fast version : 4 EPGs/minute - crashes in the middle of the work due to socket timeout issues. Could be due to parallel requests

Are these all in the same process? Something else you could try is splitting the work across multiple processes. Have a coordinator process that spins up several worker processes and use some form of IPC to communicate between them. The coordinator could split up the work and pass it to the workers. Doing that, plus using parallel requests, should speed it up a lot.

Separate CI jobs

RevGear commented 1 year ago

I'm not that familiar with the build process, so apologies if I'm talking rubbish here. From what I recall, we had a number of processes to grab the data and update a single database of programme details, then once those had finished, a separate process to read the db and build the xml output. My experience of that sort of flat-file database is that they are ok for smaller projects but become a major resource hog when they get beyond a certain size.

We are no longer grouping outputs by country; every xml output file only contains programmes from a single source. So do we still need that single huge programme db? Would the build be more efficient if each grabber process used its own (smaller) database? Each xml output could then be created as part of that grabber process rather than building all xmls together at the end.

You still have the issue of building the guide list, but that can be done using a smaller db that is updated by each of the grabber/build processes. That db only needs one record for each xml showing source, language, number of channels and URL - essentially the tables we already have on the homepage. I'd probably add last update date/time to each row as I think that would be useful.
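To make the small guide-list db concrete, here is a minimal sketch with one row per xml output, holding the fields listed above plus a last-updated timestamp. It uses SQLite from Python purely as a stand-in, not the flat-file database the project uses. The table name, column names and example URL are assumptions made for illustration.

```python
import sqlite3
from datetime import datetime, timezone


def open_index(path=":memory:"):
    """Open (or create) the small guide index; one row per xml output."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS guides ("
        "source TEXT PRIMARY KEY, language TEXT, channels INTEGER, "
        "url TEXT, updated_at TEXT)"
    )
    return conn


def record_guide(conn, source, language, channels, url):
    """Insert or refresh the single index row for one grabber's xml output."""
    updated_at = datetime.now(timezone.utc).isoformat(timespec="seconds")
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO guides VALUES (?, ?, ?, ?, ?)",
            (source, language, channels, url, updated_at),
        )


if __name__ == "__main__":
    conn = open_index()
    # Hypothetical values; a real grabber would pass its own after writing its xml.
    record_guide(conn, "example.com", "en", 12, "https://example.com/guide.xml")
    print(conn.execute("SELECT * FROM guides").fetchall())
```

Each grabber process would call `record_guide` once after writing its own xml, so the homepage tables can be rebuilt from this index without touching any programme data.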

fachos commented 1 year ago

Hi @RevGear, can you fix the unifi.com.my config? As I remember, my request was handled by you. The tvguide API is still there, but the response now starts with ENTER initializeClient[{"id":"20000009","name":" etc., so the grabber cannot grab the EPG and returns 0 programs. The test data only starts with [{"id":"20000009","name":" with no ENTER initializeClient in front.
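One way to tolerate a stray prefix like the one described above is to skip ahead to the first `[` and decode the JSON from there. The sketch below shows the idea in Python. The real site config is JavaScript and is not reproduced here, `parse_prefixed_json` is a made-up name, and the channel name in the sample is a placeholder because the original value is elided in the report.

```python
import json


def parse_prefixed_json(body):
    """Decode the first JSON array found in body, ignoring any text before it."""
    start = body.find("[")
    if start == -1:
        raise ValueError("no JSON array found in response body")
    # raw_decode parses one JSON value starting at `start` and ignores
    # anything that follows it.
    value, _end = json.JSONDecoder().raw_decode(body, start)
    return value


if __name__ == "__main__":
    # "Example Channel" is a placeholder, not the real channel name.
    sample = 'ENTER initializeClient[{"id":"20000009","name":"Example Channel"}]'
    print(parse_prefixed_json(sample))
```

The same function accepts the old response format unchanged, since a body that already starts with `[` is decoded from position 0.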

BellezaEmporium commented 1 year ago

My CI now runs for specific EPGs; the list is available in my fork, in a file named "test.py".

If the others are ready to collaborate, you can take other sites so that our jobs do not replace already updated guides.

benbelgium commented 1 year ago

I can provide any machine you need with any specs

PopeyeTheSai10r commented 1 year ago

If the others are ready to collaborate, you can take other sites so that our jobs do not replace already updated guides.

Do we want each contributor to grab some sites, update them locally and then send a PR to the main Repo with the updated EPG XML file?

Daniel15 commented 1 year ago

If we're using Jenkins, we should keep everything in the same Jenkins instance. Multiple people can contribute servers as Jenkins allows multiple runners to be connected to a single installation. Ideally we should configure Jenkins to allow logins via GitHub, and configure it to allow any of the core team to log in.

eldepor commented 1 year ago

Some guides are not working even locally

https://github.com/iptv-org/epg/issues/2027#issuecomment-1479106985

BellezaEmporium commented 1 year ago

Some guides are not working even locally

https://github.com/iptv-org/epg/issues/2027#issuecomment-1479106985

Maybe because it needs some kind of fix in the config.

zuleyhasultan commented 1 year ago

Turkish

digiturk.com.tr,dsmart.com.tr,vplus.com.tr

please help me for this, the epg is still not working for 1 month, what is the latest situation ?

BellezaEmporium commented 1 year ago

Turkish

digiturk.com.tr,dsmart.com.tr,vplus.com.tr

please help me for this, the epg is still not working for 1 month, what is the latest situation ?

Not in my CI job for now.

benbelgium commented 1 year ago

@zuleyhasultan

digiturk.com.tr is working just fine.

dsmart.com.tr is working almost fine (some channels in the channels file are not on the site, but the others work).

vplus.com.tr was never even on this site?? I don't know where you got it??

Take Care

Ben

BellezaEmporium commented 1 year ago

Unfortunately this is taking a lot longer than we would've hoped.

🥲

Both the EPG and iptv nodes are up and running on my side, though this won't be a long-term solution.

I will need to check with @MapGuy11 and @Daniel15 on how to parallelise the work across our servers.

I'm working on creating a Jenkinsfile for both nodes.

@freearhey once everything is up, we will need token-based GH authorization to push to both the epg and iptv repos.

freearhey commented 1 year ago

@Daniel15 can also generate tokens.

BellezaEmporium commented 1 year ago

Well, if @Daniel15 is OK with the idea, he can set up the master Jenkins CI node and I'll add mine to the set. Maybe this will make it easier for him to monitor all the pushes to the repos.

PopeyeTheSai10r commented 1 year ago

Any luck setting up those Jenkins nodes? Or getting @BellezaEmporium the authorization tokens?

BellezaEmporium commented 1 year ago

Any luck setting up those Jenkins nodes? Or getting @BellezaEmporium the authorization tokens?

For now I'd prefer to set my node up as a slave node and let someone who's better at using Jenkins set up the master node for this particular project.

seemebreakthis commented 1 year ago

Will be watching this space for any update

Dr3sdan commented 1 year ago

May be watching it for a long time then 😂

lmarceg commented 1 year ago

Is there a way to download the source code of the EPG grabber and run it locally? I don't need all those channels so I guess I could run a couple of JS parsers for the channels I need, while I wait for the system to come up again

BellezaEmporium commented 1 year ago

Is there a way to download the source code of the EPG grabber and run it locally? I don't need all those channels so I guess I could run a couple of JS parsers for the channels I need, while I wait for the system to come up again

https://github.com/freearhey/epg-grabber

BellezaEmporium commented 1 year ago

If the others are ready to collaborate, you can take other sites so that our jobs do not replace already updated guides.

Do we want each contributor to grab some sites, update them locally and then send a PR to the main Repo with the updated EPG XML file?

Seeing this a bit late, sorry.

Not that we need everyone to do so, but we need a few servers that are capable of grabbing the EPGs and pushing the end result to the repo.

Dr3sdan commented 1 year ago

Goodness me this is taking longer than I hoped

xemles commented 1 year ago

Hey, Following the recommendation of @freearhey, here is my current implementation to update the EPG:

It's slightly altered because I actually store the EPG in a database for some data processing, but here's how I do it with the good old ✨PHP✨ and cloud-init: https://pastebin.com/raw/CQX4vpS5

Then I just run a cron job every day

My code is quite recent; I have only been trying it since yesterday, so it doesn't cover every eventuality, and there are some unknowns I'm still trying to debug. For example, when I use multiple clusters, it seems to associate the wrong programs with the wrong channel and lang. I don't know if that's in my parsing or my storing, but I haven't seen the issue with max_clusters=1. There's even a slight chance it doesn't work as of right now, because I've just implemented the separate instances for separate Guides.

Maybe that'll help! :)

EDIT: Edited because it was, in fact, not working due to a wrong parsing of the file name LOL

BellezaEmporium commented 1 year ago

Sounds quite convincing. I'm seeing a webhook too; I suppose it's there to send the guide once it's complete. It would need to be hooked up to the CI afterwards. It sounds like your code would work in GitLab.

BellezaEmporium commented 1 year ago

So far on my side, it's working as expected. The EPG (for a specified number of networks) is running fine, but I will still need to fix some EPGs to make them work again.

I'll progressively add a few others when I have time.

freearhey commented 1 year ago

@iptv-org/core What if we reactivate GitHub Actions in the repository but with self-hosted runners?

GitHub Actions - Self-hosted runners - Installation & Calling

https://youtu.be/SASoUr9X0QA

If I understand correctly, in this case the load on the GitHub servers will be minimal, as they will only perform the load-balancer function. And if someone would like to support the project, they can do so simply by installing this runner on their machine. Did I miss anything?

Or is it better to leave everything as it is?

https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners/about-self-hosted-runners

Daniel15 commented 1 year ago

What if we reactivate GitHub Actions in the repository but with self-hosted runners?

@freearhey GitHub said to me that we can't do this. The same limitations apply for self-hosted runners, for whatever reason. This is the part of the terms of use that they flagged us for:

Additionally, regardless of whether an Action is using self-hosted runners, Actions should not be used for: any activity that places a burden on our servers, where that burden is disproportionate to the benefits provided to users (for example, don't use Actions as a content delivery network or as part of a serverless application

https://docs.github.com/en/site-policy/github-terms/github-terms-for-additional-products-and-features#actions

freearhey commented 1 year ago

@Daniel15 I see, then of course the question is off the table.

iptv-org / iptv-org.github.io

Set up new build/CI system #425
