Open SCHKN opened 5 years ago
Both of these additions are great! To summarize:
I'm so glad that this was useful to you, and I'm greatly looking forward to these next developments! Let me know your preference based on your availability.
@SCHKN I want to double check that you saw the "How do I export data" section - you shouldn't need to save/push every file to some database because the changes are saved in git (and then exportable to a flat, temporal structure). https://vsoch.github.io/watchme/getting-started/index.html#how-do-i-export-data
I checked the export data section before the changes to see if they could be done in that part of the code.
Unfortunately, since the export function provides a "full" export (comparable to a "dump" in some ways), it didn't fit my needs: I needed the data directly as it was recorded by the scraper.
To put things into context, I needed data to be exported directly as it was recorded so that it would be directly visible in Grafana. I hope that makes sense.
If you are still interested in the two changes, I can open two PRs, let me know :)
Yes, I’d definitely be interested to see! I don’t want to waste your time so I’ll offer to take a look and PR if it’s generalizable enough.
Awesome! Sounds good to me.
Going to add some notes here as I work on this:
$ watchme push <watcher> <exporter>
This seems like a needed function, in case a push is desired without (or separate from) running the watcher. I think it would also be intuitive to allow the user to push "all exported" data, something like:
$ watchme push --all <watcher> <exporter>
That way we could have set up:
I can see some future need to push a subset of data, but for now (until someone asks for it) I'll experiment with these ideas.
For the exporters, I'm thinking that we would want to have granularity to match exporters with watcher tasks. Currently, if we add an exporter and it's active, it is (implicitly) active for all tasks. We should be able to turn entire exporters on/off, but have the primary control come directly from the tasks. For example, here we have two pushgateway exporters, one for each task (and you could imagine one task having more than one exporter).
[watcher]
active = false
[task-air-oakland]
url = http://aqicn.org/city/california/alameda/oakland-west
exporters = [exporter-pushgateway]
func = get_url_selection
selection = #aqiwgtvalue
file_name = oakland.txt
get_text = true
active = true
type = urls
[task-air-boulder]
url = http://aqicn.org/city/usa/colorado/boulder-cu/athens/
func = get_url_selection
selection = #aqiwgtvalue
file_name = boulder.txt
get_text = true
active = true
type = urls
[exporter-pushgateway]
url = localhost:9091
type = pushgateway
active = true
[exporter-another-pushgateway]
url = localhost:9091
type = pushgateway
active = true
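For reference, here is a minimal sketch of how the per-task exporters field in a config like the one above could be resolved. It is not watchme's implementation: the helper names parse_exporter_list and exporters_by_task are hypothetical, and the bracketed list syntax is parsed by hand on the assumption that it is a plain comma-separated list of section names.

```python
import configparser

# A trimmed copy of the config above (only the fields this sketch needs).
CONFIG = """
[watcher]
active = false

[task-air-oakland]
url = http://aqicn.org/city/california/alameda/oakland-west
exporters = [exporter-pushgateway]
active = true
type = urls

[task-air-boulder]
url = http://aqicn.org/city/usa/colorado/boulder-cu/athens/
active = true
type = urls

[exporter-pushgateway]
url = localhost:9091
type = pushgateway
active = true
"""


def parse_exporter_list(raw):
    """Turn a value such as "[a, b]" into ["a", "b"] (hypothetical helper)."""
    return [name.strip() for name in raw.strip("[] ").split(",") if name.strip()]


def exporters_by_task(parser):
    """Map each task section to the defined, active exporters it lists."""
    result = {}
    for section in parser.sections():
        if not section.startswith("task-"):
            continue
        listed = parse_exporter_list(parser.get(section, "exporters", fallback=""))
        result[section] = [
            name
            for name in listed
            if parser.has_section(name)
            and parser.getboolean(name, "active", fallback=False)
        ]
    return result


parser = configparser.ConfigParser()
parser.read_string(CONFIG)
mapping = exporters_by_task(parser)
print(mapping)
```

With this approach a task that lists no exporters simply maps to an empty list, so the task keeps primary control over what gets exported.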
@SCHKN could you point me in the right direction to set up the endpoint so I can test as I develop?
@vsoch,
I find the push idea interesting. Do you plan on adding it to the schedule function in order to push data as it comes in?
Sure! To set up a Pushgateway, you can head over to https://github.com/prometheus/pushgateway and read the "Run it" section. Depending on your OS, it should be as easy as launching the binary and letting it run. I'm not sure that you actually need Prometheus on the other end to see the results.
Let me know if I can help.
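Once a Pushgateway is running, a quick way to test against it is a plain HTTP request to its /metrics/job/<job> path with a body in the Prometheus text format. The sketch below only builds the request with the standard library and does not send it. The job name watchme and the metric name air_quality_oakland are made up for illustration, and build_push_request is a hypothetical helper, not part of watchme.

```python
import urllib.request


def build_push_request(gateway, job, metric, value):
    """Build (but do not send) a PUT request carrying one metric sample."""
    url = "http://{}/metrics/job/{}".format(gateway, job)
    # The text exposition format requires the body to end with a newline.
    body = "{} {}\n".format(metric, value).encode("utf-8")
    return urllib.request.Request(url, data=body, method="PUT")


request = build_push_request("localhost:9091", "watchme", "air_quality_oakland", 42)
print(request.get_method(), request.full_url)

# To actually push, with a Pushgateway listening on localhost:9091:
# urllib.request.urlopen(request)
```

The same address as in the exporter sections above (localhost:9091) is used here, so the pushed value should then be visible on the Pushgateway's own web page without Prometheus on the other end.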
@SCHKN yes - if a user has added an exporter and it's active and listed for a watcher task, it will run with schedule (this is how we achieve complete automation as you've done!). If the exporter is defined but not listed with any particular task, then it wouldn't be run with the scheduler. If the exporter is defined but not listed with a task and then manually requested with push, it would be run.
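The rules above can be written down as a small predicate. This is only a sketch of the described behavior, not code from watchme: should_run_exporter is a hypothetical name, and it assumes that an exporter with active = false never runs, even on a manual push (the thread does not say either way).

```python
def should_run_exporter(active, listed_for_task, manual_push=False):
    """Decide whether an exporter runs, following the rules described above."""
    if not active:
        # Assumption: turning an exporter off disables it everywhere.
        return False
    # The scheduler runs exporters listed for the task; a manual push
    # runs a defined exporter even when no task lists it.
    return listed_for_task or manual_push


print(should_run_exporter(True, True))                     # scheduled run, listed
print(should_run_exporter(True, False))                    # scheduled run, not listed
print(should_run_exporter(True, False, manual_push=True))  # manual push
```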
Thanks for the tips! I likely won't get this PR open today, but surely within the week. I'll keep you posted!
@SCHKN do you think it would be more intuitive to have two commands to add each of a watcher task and exporter, for example:
$ watchme add-task watcher task-cpu func@cpu_task type@psutils
$ watchme add-exporter watcher exporter-pushgateway
or have a single add command that determines the addition based on the prefix of what is being added? E.g.,
$ watchme add watcher task-cpu func@cpu_task type@psutils
$ watchme add watcher exporter-pushgateway
I've been implementing the second, but I'm thinking it might be cleaner to (for development down the line) have them as separate entrypoints.
When it comes to the actual usage of the application, I do believe that it is preferable to use the add-exporter method.
Even if it adds an extra step, I find it less confusing and more explicit. What are your thoughts on it?
I totally agree! I'm glad you do too :)
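As an illustration of the separate-entrypoints option, here is a minimal argparse sketch with add-task and add-exporter as distinct subcommands. It mirrors the command lines shown above but is not watchme's actual client: build_parser and parse_params are hypothetical names, and the key@value parsing is an assumption based on the examples.

```python
import argparse


def parse_params(pairs):
    """Turn ["func@cpu_task", "type@psutils"] into a dict."""
    params = {}
    for pair in pairs:
        key, _, value = pair.partition("@")
        params[key] = value
    return params


def build_parser():
    """Create a parser with one subcommand per kind of addition."""
    parser = argparse.ArgumentParser(prog="watchme")
    commands = parser.add_subparsers(dest="command", required=True)
    for name, target in (("add-task", "task"), ("add-exporter", "exporter")):
        sub = commands.add_parser(name)
        sub.add_argument("watcher")
        sub.add_argument(target)  # positional named "task" or "exporter"
        sub.add_argument("params", nargs="*")
    return parser


args = build_parser().parse_args(
    ["add-task", "watcher", "task-cpu", "func@cpu_task", "type@psutils"]
)
print(args.command, args.task, parse_params(args.params))
```

Keeping the two subcommands separate means neither has to inspect the task- or exporter- prefix to decide what is being added, which is the explicitness argued for above.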
Hello @vsoch!
Over the past few days, I used watchme quite a lot to perform web scraping tasks and I found the tool very handy for such tasks. To give some context, I needed a way to export data to external datasources, such as Prometheus (via Pushgateway) in this case.
I decided to develop a new layer on watchme in order to implement exporters. It could be used, for example, to export data to messaging queues or databases. With the recent development, you can now do:
watchme create weather-watcher --exporter pushgateway
This will create an [exporter-pushgateway] section in the watchme configuration, following templates that are specifically designed for exporters.
Note: I am aware that there is already an export function, but I could not iterate on it, as I found that it was used to export all the content available in the repository.
I decided to export data in the run function of the task lifecycle.
I also added the option to specify a regex when scraping with a URL selection. It looks like this:
This option comes in very handy to target only numbers for web scraping.
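To show the idea of targeting only numbers, here is a minimal sketch using Python's re module on text that a selection might return. It is not the option as implemented in the fork: extract_number, the NUMBER pattern, and the sample strings are all hypothetical.

```python
import re

# Matches an optionally negative integer or decimal number.
NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def extract_number(text, pattern=NUMBER):
    """Return the first number found in the scraped text, or None."""
    match = pattern.search(text)
    return float(match.group(0)) if match else None


print(extract_number("AQI: 42 (Good)"))
print(extract_number("n/a"))
```

Returning None when nothing matches lets the caller skip the export instead of pushing a non-numeric value.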
I developed quite a lot of functions in order to enable exporters and regexes, and all the modifications are available on my GitHub in the repo named watchme-prometheus.
In the end, I was able to run scheduled tasks, collecting data every two seconds from a weather website and exporting it to Pushgateway: https://imgur.com/a/MJDuIUA
I am curious to know if you would be interested in such modifications.
In any case, I had a ton of fun developing this, and the way the app was built made it very easy to iterate.
Thank you!